Reply To: large databases


dmap is like mp4 atoms. it’s length encoded serialized xml. If, for example, and ituens client says “give a complete list of all song names, artist names, and album names”, then you have to build a dmap block for each song. That block looks something like this:

0c adbs (daap.databasesongs) -
05 mstt (dmap.status) - 00c8 (200)
01 muty (dmap.updatetype) - 00 (0)
05 mtco (dmap.specifiedtotalcount) - 1743 (5955)
05 mrco (dmap.returnedcount) - 1743 (5955)
0c mlcl (dmap.listing) -
0c mlit (dmap.listingitem) -
01 mikd (dmap.itemkind) - 02 (2)
09 asal (daap.songalbum) - Consolidated
09 asar (daap.songartist) - Consolidated
09 minm (dmap.itemname) - Consolidated (Adorno Strength Version)

See how there are containers there? The databasesongs container contains some status info, plus a container that has the song listing. The song listing is a container that holds multiple listingitems (which are also a container), which contains the info for each song.

Well and good. The crappy thing about this design is that the container also has info about exactly how many bytes long the container is. So the adbs container has the size of the 4 status elements + the size of the mlcl container. The mlcl container is the size of the sum of the mlit containers under it. The mlit container’s size is the sum of the sizes of the elements in it.

So before you even send the dmap stream, you have to know exactly how big it is. That’s why I’m saying it’s a two-pass deal… you have to know the size of the data you are sending before you send it.

Now. Here’s a thought experiment. Imagine the speed necessary to walk though that file system and calculate that stuff, extracting the data out, etc. Then decide whether it would be faster to just pre-compute that info and save it in a database. If it’s saved in a database, you get performance advantages from file cache localization, indexes, etc, etc.

In other words, keeping that same info in a database info is *bound* to be faster. But that’s the situation that exists right now!

So that can’t be the bottleneck. Or maybe it is… but if that’s the case, then changing to a “scan on demand and get tag info from file names” strategy would suffer from the same problems.

No, I think the bottle neck is somewhere else — I think the bottleneck is on some of the browse queries that the roku generates. The roku browses for songs by a particular artist, for example. Those queries look like “select distinct artist from songs” or “select distinct artist from songs where foo”. I’m guessing that by creating an index on the columns that the soundbridge queries on (artists, genres, composers, albums), then the result would be returned much quicker.

Another potential point of slowdown is on playlists. Static playlists particularly seem to be slow… they look something like:

select * from songs where id in (select distinct songid from playlistitems where playlist=)

Now I *know* that’s a painful query when there are lots of items in the playlistitems table.

Thing to do would be to run it with a logfile and “-d9 -Ddatabase” and watch the queries. Use the timestampt to figure out how long each query takes and see what indexes you need to speed it up.

I don’t have any problems, because my tables have only ~6k items. Not enough to be a performance problem!

I’ll probably have to build a bogus db for testing. 🙂