Monday, November 24, 2014

How to import your iTunes library into R

If there's anything that 23andMe, Last.fm, Strava, or any of those countless Facebook apps have shown us, it's that we love analyzing our own data and discovering new things about ourselves. A great source of data is your iTunes library. If you're anything like me, you listen to music constantly: at home, at work, or on the go. With iPods (and iPhones) having been popular for over a decade, iTunes could hold data on a significant portion of your life. Why not poke around in it?

iTunes stores its library data in two separate files: iTunes Library.itl and iTunes Library.xml. According to Apple, the .itl file is the database iTunes uses for itself, while the .xml file is intended for use with external applications. XML is a standard format that is easily readable by both humans and computers, and it is used all over the web for things like RSS feeds and web services.
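For a feel of what that .xml file contains, here's a minimal sketch. The two tracks below are made up, and the file is heavily trimmed compared to a real library, which carries many more keys per track and at the top level. It writes the sample to a temporary file and reads it back with readKeyValueDB from the XML package, the same function used in the steps further down.

```r
library(XML)

# A made-up, heavily trimmed stand-in for iTunes Library.xml.
# Data is stored as alternating <key> and value elements inside <dict>.
sample_plist <- '<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
	<key>Tracks</key>
	<dict>
		<key>101</key>
		<dict>
			<key>Track ID</key><integer>101</integer>
			<key>Name</key><string>Song A</string>
		</dict>
		<key>102</key>
		<dict>
			<key>Track ID</key><integer>102</integer>
			<key>Name</key><string>Song B</string>
		</dict>
	</dict>
</dict>
</plist>'

# Write the sample to a temporary file so it is read the same way
# a real library file would be.
sample_file <- tempfile(fileext = ".xml")
writeLines(sample_plist, sample_file)

sample_lib <- readKeyValueDB(sample_file)
names(sample_lib)         # top-level keys of the sample
names(sample_lib$Tracks)  # one entry per track
```

The key/value layout is why a plain XML tree isn't very convenient here: a track's name is not in a tag called "Name", it's in whatever element follows the key "Name".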

With the use of two packages, XML and the ever-so-useful plyr, importing your iTunes library data into an R data frame is a dead-easy process.

Setup:

  1. Make a copy of your iTunes Library.xml or iTunes Music Library.xml file. (Safety first!)
  2. Install XML and plyr from CRAN.
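If you're not sure whether the packages are already installed, something like this sketch takes care of step 2. It only installs what's missing; the variable names are mine and nothing else depends on them.

```r
# Packages this walkthrough relies on.
needed <- c("XML", "plyr")

# Keep only the ones that are not installed yet.
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
to_install <- needed[!is_installed]

# Install from CRAN only if something is actually missing.
if (length(to_install) > 0) install.packages(to_install)
```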

Instructions:

Load both packages with library(XML) and library(plyr), then just follow along with these four easy steps:

  • ituneslib <- readKeyValueDB("iTunes Music Library.xml")
    This command reads your iTunes library into R as a property list (.plist), Apple's key/value flavor of XML, rather than as a generic XML file. Remember to change "iTunes Music Library.xml" to whatever your file is named. Don't panic if it seems like the program's frozen! For my library (3500 songs on a 2007 iMac), this operation took about a minute and a half.
  • tracksxml <- ituneslib$Tracks
    This command grabs the "Tracks" section of the library and moves it into a separate variable for convenience. tracksxml is currently a list of lists, with each list further wrapped inside a 1-item list, so now we have to restructure this into a saner data format.
  • tracksxml1 <- lapply(tracksxml, data.frame)
    This command transforms the nested lists into data frames so that plyr can do its magic. (That is, the lists-inside-lists-inside-a-list are now data frames-inside-lists-inside-a-list). This command took me 25 seconds.
  • songs <- ldply(tracksxml1)
    And now plyr steps in and makes everything into a nice neat data frame!
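To see what the last two steps do without waiting on a full library, here's a small sketch on a hand-built list. The two tracks and their values are invented, and the flat list-of-lists shape is a simplification of what readKeyValueDB hands back, so treat it as an illustration rather than a replica of your data.

```r
library(plyr)

# Invented stand-in for ituneslib$Tracks: one named entry per track.
# The second track deliberately has no Artist.
toy_tracks <- list(
  "101" = list("Track ID" = 101L, Name = "Song A",
               Artist = "Artist X", "Play Count" = 12L),
  "102" = list("Track ID" = 102L, Name = "Song B",
               "Play Count" = 3L)
)

# Step 3: turn each track's list into a one-row data frame.
toy_frames <- lapply(toy_tracks, data.frame, stringsAsFactors = FALSE)

# Step 4: stack the one-row data frames into a single data frame.
toy_songs <- ldply(toy_frames)
toy_songs
```

Two things are worth noticing in the result. data.frame rewrites keys with spaces into valid column names, so "Play Count" becomes Play.Count. And the track without an Artist doesn't break anything: ldply fills the gap with NA, which is handy because not every track in a real library has every field.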
After you're done, feel free to remove the tracksxml lists; they're no longer necessary once you have your data frame. There is likely a faster way to do all of this, but this way is by far the simplest to type out and debug. For easy copy-pasting, here are the commands again:

library(XML)   # provides readKeyValueDB()
library(plyr)  # provides ldply()
ituneslib <- readKeyValueDB("iTunes Music Library.xml")
tracksxml <- ituneslib$Tracks
tracksxml1 <- lapply(tracksxml, data.frame)
songs <- ldply(tracksxml1)

Or you can do it all in one line:
songs <- ldply(lapply(readKeyValueDB("iTunes Music Library.xml")$Tracks, data.frame))

Have fun and happy analyzing!
