Monday, 03 March 2008

Rants

FreeDB: Broken and irritating


Being the good citizen, and believer in co-operative action that I am, on the (infrequent) occasion when I rip a CD, if FreeDB can't find it I always try to submit its information.  Problem is, the last few times I've sent one in, I’ve gotten this error message back:

Subject: Rejected freedb submission

Your freedb submission was rejected for the following reason:
Discid collision in category ····

Please fix the problem before you resubmit it.
The problem with this error is that the submitter has nothing to do with it:  Its cause is a limitation in the FreeDB protocol.  The result is a broken, user-unfriendly experience that discourages further submissions.

A look at the bloody details below the jump.


So what is this error, in words that non-geeks can understand? The FreeDB FAQ turns up this:
2.7. My freedb-submission was rejected with error-reason "Discid collision in category xy". What's the problem?

The disc ID, which is an important factor in identifying a CD, is not as good as it could be - in fact, it is pretty bad as a unique identifier for a CD.  Therefore completely different CDs (with the same length in seconds and the same track number) can have the same disc ID[1].  A [particular- o.g.] disc ID, however, can only exist once in each of the 11 categories.  This disc ID algorithm and the cddb protocol can unfortunately not be changed without losing backwards compatibility to existing applications - and we definitely want to keep backwards-compatibility.

If you receive a rejection e-mail telling you, that there was a disc ID collision in the category you tried to submit the entry to, that means, that in that category the disc ID of your CD is already used for a different CD.  When querying the server for disc information with your CD, you don't see the existing entry (which is for a different CD), because the server software checks not only the discid, but also the track offsets[2] when trying to determine, if an entry is a match or not.  In order to submit your CD to freedb successfully, you will have to select a different category - even if it does not really fit for the music style of your CD. Since a specific genre can be specified for a database entry in addition to the category, a wrong category choice is not really that problematic.
If I’m reading this correctly, they’re saying that because of some wrong decisions back when the cddb protocol was being dreamed up, different discs can generate the same “disc ID.”  But it appears that the “disc ID” isn’t what’s used to find the disc information FreeDB sends back to you[3], or, more exactly, it only gets used to take a stab at finding the information:  If a match for the “disc ID” is found, the server then checks the “track offsets” to be sure that it has matched the correct “disc ID.”
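
For the curious, here is a minimal Python sketch of how the classic “disc ID” seems to be put together, going by the description in the Wikipedia entry cited in note [1]: a checksum of the digits of each track's start time in seconds, the total playing time in seconds, and the number of tracks. The function names and the two sets of track offsets are made up for illustration; this is not FreeDB's own code.

```python
FRAMES_PER_SECOND = 75  # CD audio addresses are counted in 1/75-second frames


def digit_sum(n: int) -> int:
    """Sum of the decimal digits of n."""
    total = 0
    while n > 0:
        total += n % 10
        n //= 10
    return total


def disc_id(track_offsets: list[int], leadout_offset: int) -> str:
    """CDDB-style disc ID from track start offsets and lead-out (in frames)."""
    starts = [offset // FRAMES_PER_SECOND for offset in track_offsets]
    checksum = sum(digit_sum(s) for s in starts) % 0xFF
    length = leadout_offset // FRAMES_PER_SECOND - starts[0]
    return f"{(checksum << 24) | (length << 8) | len(track_offsets):08x}"


# Two hypothetical three-track discs. Track 2 starts nine seconds later
# on disc B, so they are clearly different discs.
disc_a = [150, 15150, 30150]
disc_b = [150, 15825, 30150]
leadout = 45150

id_a = disc_id(disc_a, leadout)
id_b = disc_id(disc_b, leadout)
print(id_a, id_b)  # 0c025803 0c025803
```

Both discs run 600 seconds and have three tracks, and the digits of their track start times (2, 202, 402 versus 2, 211, 402) happen to add up to the same number, so the IDs come out identical even though the track offsets differ.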

Anyway, there are eleven categories: Blues - Classical - Country - Data - Folk - Jazz - Newage - Reggae - Rock - Soundtrack - Misc. When you submit a disc, you’re supposed to select an appropriate category[4] for it. The catch is that there can be only one disc in each category with a particular “disc ID.”

But now we’re getting collisions:  multiple discs with the same “disc ID” that really belong in the same category.  To fix this, FreeDB is asking its users (that is, the people who are volunteering their time to type in the data) to “select a different category - even if it does not really fit for the music style,” and resubmit the data.  Of course the user has no way of knowing if another collision might occur in the “different category” he selects:  In fact, it’s possible that a particular submission might send the user through the “change the category and resubmit” loop another 9 times, assuming his patience holds out.  (And who knows what happens if there turn out to be collisions in all of the categories?)
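
To make the loop concrete, here is a toy Python model of the behavior as I read it from the FAQ: one slot per category for each “disc ID,” a query that also compares track offsets, and a submitter who keeps changing categories until something sticks. The class, the function, and all the data are hypothetical; the real server surely differs in the details.

```python
CATEGORIES = ["blues", "classical", "country", "data", "folk", "jazz",
              "newage", "reggae", "rock", "soundtrack", "misc"]


class ToyFreedb:
    """Toy model: at most one disc per (category, disc ID) pair."""

    def __init__(self):
        self.entries = {}  # (category, disc_id) -> (track offsets, title)

    def submit(self, category, disc_id, offsets, title):
        key = (category, disc_id)
        held = self.entries.get(key)
        if held is not None and held[0] != tuple(offsets):
            # Slot already taken by a different disc with the same ID.
            return f"Discid collision in category {category}"
        self.entries[key] = (tuple(offsets), title)
        return "accepted"

    def query(self, disc_id, offsets):
        # A hit needs the disc ID AND the track offsets to match.
        return [title for (_, did), (offs, title) in self.entries.items()
                if did == disc_id and offs == tuple(offsets)]


def submit_until_accepted(db, first_choice, disc_id, offsets, title):
    """Try the preferred category, then every other one, counting attempts."""
    order = [first_choice] + [c for c in CATEGORIES if c != first_choice]
    for attempt, category in enumerate(order, start=1):
        if db.submit(category, disc_id, offsets, title) == "accepted":
            return category, attempt
    return None, len(order)


# Worst case short of total failure: ten categories already hold some
# other disc with the same ID, and only "misc" is still free.
db = ToyFreedb()
for i, category in enumerate(CATEGORIES[:-1]):
    db.submit(category, "0c025803", (150, 15150 + i, 30150), f"Other disc {i}")

mine = (150, 15825, 30150)
before = db.query("0c025803", mine)
print(before)  # [] : the offsets differ, so the query finds nothing
result = submit_until_accepted(db, "rock", "0c025803", mine, "My disc")
print(result)  # ('misc', 11)
```

In this made-up worst case the disc goes in on the eleventh try: the original submission plus ten resubmissions. If the last category were occupied too, the function would give up and return `None`, which is about all the user could do as well.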

Clearly, the “disc ID” as a primary (unique) key[5] for data lookup is broken, and [“disc ID”+category] is certainly broken from the standpoint of data entry.  And FreeDB acknowledges it (in the FAQ above):  “Since a specific genre [an additional data field -o.g.] can be specified for a database entry in addition to the category, a wrong category choice is not really that problematic.”

Well, if the wrong category choice is “not really that problematic,” then why is FreeDB bothering the users, that is, bothering the people volunteering their time to type in the data, with it?
Instead of sending back a bogus “You failed, want to play again?” message, FreeDB should silently find a category where there isn’t a conflict, and substitute that one.  In fact, why bother with categories at all anymore?  The “Genre” field sounds like what categories should have been.  How about having the user pick a genre, then letting the server use some kind of lookup list to pick a likely category, assigning the disc to that one if there’s no conflict?
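
Something like the following Python sketch is what I have in mind. The genre-to-category table and the function name are invented for illustration, and nothing here reflects how the FreeDB server is actually written; it only shows that the fallback is a few lines of logic.

```python
CATEGORIES = ["blues", "classical", "country", "data", "folk", "jazz",
              "newage", "reggae", "rock", "soundtrack", "misc"]

# Hypothetical lookup list: free-form genre -> most likely category.
GENRE_TO_CATEGORY = {
    "klezmer": "folk",
    "big band": "jazz",
    "cowboy songs": "country",
    "theater organ": "misc",
}


def pick_category(genre, occupied):
    """Choose a category for a new disc without bothering the submitter.

    `occupied` is the set of categories where this disc ID is already
    used by a different disc. Returns None only if all are taken.
    """
    preferred = GENRE_TO_CATEGORY.get(genre.lower(), "misc")
    candidates = [preferred] + [c for c in CATEGORIES if c != preferred]
    for category in candidates:
        if category not in occupied:
            return category
    return None


print(pick_category("Klezmer", set()))            # folk
print(pick_category("Klezmer", {"folk"}))         # blues
print(pick_category("Klezmer", set(CATEGORIES)))  # None
```

The submitter types in a genre once, the server takes the likely category if it is free and quietly falls through to the next free one if it isn't. Only the all-eleven-taken case would still need an error message.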

Because while I’m willing to type in the data, I’m not prepared to waste my time doing it more than once, especially when the problem is FreeDB’s problem, not mine.  And I’ll bet a lot of users agree with me.

I wonder how many re-submissions they actually do get?




------
[1] Wikipedia's CDDB entry has an explanation for the method of generating the “disc ID.”

[2] The term “track offsets” isn’t defined on the FreeDB site (aside from the FAQ quote above, all search matches are in the forums), but it appears to be a list of the starting frame numbers for all tracks, in track order.

[3] Except possibly for some ancient applications that only transmit the “disc ID,” and don’t include the complete track offset information.

[4] What should be obvious to anyone who has ever had anything to do with organizing a record library is the insufficiency of these eleven categories for the purpose of dividing up the world’s music. While some are (relatively) specific (Blues, Folk, and Reggae) others (Jazz, Classical, and Rock) are hopelessly broad. Then there are the “missing” categories: Where do you put Frank Sinatra singing Cole Porter? Not Jazz and not Soundtrack! How about the Cincinnati Pops playing cowboy songs? (For that matter, how about Gene Autry singing cowboy songs?) “Country”? Don’t think so. And what about klezmer, or band music (with chorus, too!), or theater organ... not to mention all the world genres.
No, the category list has all the deficiencies of something put together by a techie (as indeed it was!), based on a personal record collection. Categories were probably one of those things to be fixed later, but now, in the name of “backward compatibility,” it seems we’re stuck with the eleven existing ones.

[5] “A unique key must uniquely identify all possible rows that exist in a table...” see Wikipedia entry.

Posted by: Old Grouch in Rants at 23:58:45 GMT