By the way, these topic labels were chosen manually. Unfortunately, I couldn’t devise an automated process to do it for me (if anyone has an inkling of how to do this, let me know), so I had to pick meaningful and, I hope, reasonable titles for each set of words. In most cases, like the aforementioned “Coldness”, the concept behind the topic is evident. There are, however, a few cases where I had to be a bit more creative because the meaning of the topic is not so obvious (“Urban Horror” comes to mind).
As mentioned in part II of this post, one goal of this whole mess is to find out which lyrics “embody” a specific topic. Since the topic model sees the lyrical content of a song as a mixture of topics, we’re interested in lyrics that are composed solely (or almost entirely, say more than 90%) of a single topic. Using the topic inference capabilities of the Stanford Topic Modeling Toolbox, I did just that, selecting at least 3 representative lyrics for 14 of the topics above. They’re displayed in the collapsible tree below.
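The selection rule described above (keep a song only if a single topic accounts for more than 90% of its lyrics) can be sketched in a few lines of Python. This is a minimal illustration, not the code actually used for this post: it assumes the per-song topic proportions have already been inferred and loaded into a dict mapping a song name to a list of topic weights. That layout, the helper `representative_lyrics`, and its `threshold` and `min_songs` parameters are all hypothetical, and dropping topics with fewer than three qualifying songs is one possible reading of “at least 3 representative lyrics”. The toy numbers are made up.

```python
from collections import defaultdict


def representative_lyrics(
    doc_topics: dict[str, list[float]],
    threshold: float = 0.9,
    min_songs: int = 3,
) -> dict[int, list[str]]:
    """Map topic index -> songs dominated by that topic, strongest first.

    Hypothetical helper: `doc_topics` maps a song name to its inferred
    topic proportions (one weight per topic, summing to roughly 1).
    """
    candidates: dict[int, list[tuple[float, str]]] = defaultdict(list)
    for song, weights in doc_topics.items():
        # Index of the dominant topic and its share of the lyrics.
        top = max(range(len(weights)), key=lambda k: weights[k])
        share = weights[top]
        # "More than 90%": strictly greater than the threshold.
        if share > threshold:
            candidates[top].append((share, song))

    result = {}
    for topic, items in candidates.items():
        # Assumption: topics with too few qualifying songs are skipped.
        if len(items) < min_songs:
            continue
        items.sort(key=lambda item: item[0], reverse=True)
        result[topic] = [song for _, song in items]
    return result


# Toy proportions for three topics (made up, not from the lyrics corpus).
toy = {
    "song_a": [0.95, 0.03, 0.02],
    "song_b": [0.10, 0.85, 0.05],
    "song_c": [0.92, 0.05, 0.03],
    "song_d": [0.01, 0.01, 0.98],
}
print(representative_lyrics(toy, min_songs=2))  # {0: ['song_a', 'song_c']}
```

Here “song_b” is left out because its dominant topic only reaches 85%, and topic 2 is dropped because only one song qualifies for it. Raising `threshold` or `min_songs` trades coverage for purity.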
For the most part, the lyrics correlate strongly with the topic assigned to them: for instance, Immortal’s “Mountains of Might” fits the “Coldness” topic fairly well (surprise, surprise…) and Vondur’s cover of an Elvis Presley song obviously falls into the heart stuff category. But there is one intriguing result: after reading Woods of Infinity’s “A Love Story”, I expected it to be assigned the “Dreams & Stuff from the Heart” topic. It falls in the “Fucking” topic instead, so maybe the algorithm detected something (creepy) between the lines.
The zoomable treemap was built from Bill White’s Treemap with Title Headers.