It encompasses information on bands, their albums, demos and other releases, and on record labels.
Most results described in this blog use lyrics that were originally written in English, together with automatic translations of lyrics written in other languages. The automatic translations were carried out using a Java wrapper for the Microsoft Translator API and a language detector.
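To make the detect-then-translate step concrete, here is a minimal sketch in Python. It is not the code used for this blog, which relied on a Java wrapper for the Microsoft Translator API. The names `to_english` and `translated_share` are hypothetical, and `detect_language` and `translate` are placeholder callables standing in for whatever language detector and translation client you have at hand.

```python
from typing import Callable, Iterable, List, Tuple


def to_english(
    lyrics: Iterable[str],
    detect_language: Callable[[str], str],  # assumption: returns a code such as "en"
    translate: Callable[[str], str],        # assumption: returns the English translation
) -> List[Tuple[str, bool]]:
    """Return (english_text, was_translated) pairs, one per input lyric."""
    results = []
    for text in lyrics:
        if detect_language(text) == "en":
            # Already English: keep the original text untouched.
            results.append((text, False))
        else:
            # Any other language: replace the text with its automatic translation.
            results.append((translate(text), True))
    return results


def translated_share(results: List[Tuple[str, bool]]) -> float:
    """Fraction of lyrics that went through automatic translation."""
    if not results:
        return 0.0
    return sum(1 for _, was_translated in results if was_translated) / len(results)
```

Keeping the `was_translated` flag next to each text makes it easy to report what share of the data set is machine-translated, or to restrict an analysis to lyrics that were originally in English.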
The lyrics data set contains 81,974 lyrics, of which 26% are automatic translations. Here is a list of the posts where we explore this data set using topic modelling, entity recognition, keyword extraction and other text mining techniques: