Lexical diversity is typically defined as a measure of the uniqueness of the words used in a text, that is, the proportion of distinct words across the text. This type of measure is indicative of the vocabulary richness present in the text and general writing quality.
There are several ways of measuring this, but I’ll focus on MTLD (Measure of Textual Lexical Diversity) as it’s very sensitive and less prone to be affected by the text length, unlike more traditional metrics. MTLD, by the way, is defined as the mean length of sequential word strings in a text that maintain a given type to token ratio – in other words, sequences that have a high proportion of unique words.
To get an idea of how diverse is the vocabulary used in black metal, I’ve measured MTLD for the song lyrics of 18 bands, which were selected (more or less) randomly.
The text of each band consists of the entirety of their lyrics after removal of text portions in languages other than english. To make things slightly more interesting, I also computed the MTLD values for the three queens of pop (or maybe they’ve been dethroned since I last checked the current pop pulse, it’s been a while).
There’s a handful of caveats to the experiment I’m describing here, the most evident being 1) I’m assuming the traditional parameter values of MTLD are suitable for song lyrics, 2) I’ve removed all lyrics totally or partially written in languages other than English, and 3) there’s a lot of intentional line repetition in songs (choruses and the like), something that is more prevalent in pop music than black metal. With that in mind I removed such duplicates from both data sets, which actually improved (albeit not significantly) the pop artists MTLD value.
That said, it’s not at all surprising (to me, admitting my bias here) to see Dodheimsgard at the top, or that MTLD values for the three pop singers are a lot lower than for all of the selected black-metal bands. However, keep in mind that Lexical Diversity measures don’t explicitly take into account sentence structure or grammar, so we can’t really infer the degree of quality (for lack of a better expression) of how the words are used.
||Lexical Density (%)
|Cultes des Ghoules
|Ride for Revenge
The rightmost column of the table above displays values of lexical density, which should not be confused with lexical diversity. The former is defined as the proportion of content words –
such as nouns, adjectives and verbs – present in the text. Other categories of words are said to be functional (such as determiners). I’ll follow here a rough interpretation of Halliday
‘s definition of lexical density and consider adverbs as content words.
As far as content word classes go, adjectives, nouns and prepositions (eg, “than”, “beyond”, “under”, “into”) are less common in the pop lyrics than in the bm lyrics analysed here. Usage of pronouns however (eg, “me”, “you”, “we”) is a lot more evident in the pop lyrics.
Focusing solely on the black metal lyrics, the same type of distribution is observed for each band, with the exception of nouns which for some reason are a lot more prevalent in Clandestine Blaze and Blacklodge lyrics, amounting to more than one third of all the words used. Another aspect where Blacklodge, along with Craft and Deathspell Omega, deviate considerably from the rest of the bands is the “Other” category. This word class is actually the aggregation of the smaller classes, such as digits, punctuation and symbols. Craft, in particular, use a good amount of punctuation. Another interesting thing is seeing Deathspell Omega at the bottom of the table with regards to lexical diversity (ie, actual content words) values, albeit scoring high in the lexical diversity department.
- MTLD was first suggested, and subsequently developed, by Philip McCarthy while @ the University of Memphis. I strongly suggest reading his and Scott Jarvis article “MTLD, vocd-D, and HD-D: A Validation Study of Sophisticated Approaches to Lexical Diversity Assessment” as it’s a lot more comprehensive than the very brief, limited and ad-hoc assessment I make here (and I’ve probably minsinterpreted some aspects of its correct usage).
- All the MTLD values and part-of-speech tagging were computed in R, using the koRpus package implementation of MTLD-MA and TreeTagger.
- D3 stacked bar chart source.