After many months of talking and thinking about it, I finally wrote a greasemonkey script to annotate web pages with Rhaplinks. The script scans web pages looking for the names of musicians and when it finds them, links them to Rhapsody.com so you can listen to music by the named artist.
This simple idea is actually tricky to implement properly. Rhapsody has a lot of music and a lot of artists. So many that keeping the entire list in a javascript program is impractical, as is downloading the entire list from the server. So I took the most popular 50-100 artists in each primary genre and combined them into a single manageable list of about 1,000 names.
This idea is made practical by one of my favorite features of Rhapsody.com — human-writable URLs. Assuming your browser is set up properly (install plugin, enable pop-ups), opening http://play.rhapsody.com/Morcheeba causes Morcheeba to start playing. This API (can URL’s be API’s? I think so!) accepts punctuation too — http://play.rhapsody.com/R.E.M. will play R.E.M. And thanks to a generous interpretation of the HTTP spec by just about everybody, http://play.rhapsody.com/The Postal Service actually works too. (Note the technically illegal spaces in the URL.) What this means is that my script just needs a list of the names of the artists, and doesn’t need corresponding ID values to generate the playback URL. In fact, you can browse www.rhapsody.com for quite a while before ever seeing a database ID in your address bar. Which brings me to the interesting part of this post.
Computer Science Interlude
Javascript is a slow, interpreted language. The straightforward way to write this script would be to loop through a list of artist names, replacing each one in the document. Something like this:
var artists = ['The Postal Service', 'Morcheeba', 'Massive Attack', 'Madonna', 'Tosca', 'Underworld' ]; // The actual list is much
longer...
for(var i=0; i<artists.length; i++) {
document.body.innerHTML = document.body.innerHTML.replace( //... some regular expression
);
}
This script would run very slowly. To scan an HTML document with N characters for M artist names this way would take O(N*M) time. Instead I wrote the script in just 2 lines as follows:
var regex = /\b(The Postal Service|Morcheeba|Massive Attack|Madonna|Tosca|Underworld)\b/gi; // The actual list is much longer...
document.body.innerHTML= document.body.innerHTML.replace(regex,"<a href=\"http://play.rhapsody.com/$1\" title=\"Play $1 on Rhapsody\" >$1<img src='http://www.rhapsody.com/favicon.ico' alt=\"Play $1 on Rhapsody\"/></a>");
This might look like a cop-out — a cheezy easy way to do this. But it’s actually much faster. This will run in about O(N) time (assuming N>>M). The single giant regular expression looks for any of the artist-name-keywords and applies it to the whole HTML document at once. Firefox’s highly-optimized C++ regular expression engine compiles the big artist list into a single state-machine which is applied to the HTML much faster than anything I could possibly write in javascript. Regular expression interpreters are brilliantly efficient. Check out Jeffrey Friedl’s excellent Regular Expressions book if you want to know more about this highly practical topic. The result is that the script can parse a document for a large number of artist names in a totally tolerable amount of time. There’s a short delay when the page loads, but it’s still faster than browsing in IE.
Enough Theory. Let’s get down to practice!
The script isn’t perfect, but it’s pretty neat to use it to browse Myspace or Facebook and have a lot of the music people mention be instantly playable.
If you’d like to play with it, install Greasemonkey, and then install the Rhapsody Artist Linker script here.
[Update 5/4/07: a new and improved script is available here. Read about the changes.]