Death Row Inmates: Last Words

Word clouds aren’t the best of data visualizations. They’re often too simplistic, representing a small sample of words out of context. I felt, however, that a word cloud would be appropriate to convey the most frequent terms present in the last statements of death row inmates.

That’s because most of these statements consist of just a few sentences, in which the inmates say goodbye to their families. Many apologise to their victims, some protest their innocence until the end. A few simply state they’re now ready to die. There’s not much variety here, so showing the top terms in proportion to their frequency will not, I think, be an inaccurate representation.

The following word cloud was generated, after stop word removal, from 518 last statements of Texas death row inmates executed between 1982 and 2014. These statements were harvested from the Texas Department of Criminal Justice website.

Here are the top 15 words and their counts:

  • love: 634
  • family: 290
  • god: 203
  • life: 149
  • hope: 131
  • lord: 130
  • forgive: 127
  • people: 125
  • peace: 96
  • jesus: 96
  • give: 95
  • death: 92
  • pain: 81
  • strong: 81
  • warden: 77
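
For readers who want to reproduce this kind of count, here is a minimal Python sketch of term counting after stop word removal. It is not the code used for the figures above: the `STOP_WORDS` set, the tokenising regex and the sample statements are all placeholders I made up for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop word list (an assumption; the list actually used is not given).
STOP_WORDS = {"i", "you", "my", "to", "and", "the", "a", "am", "for", "of", "is", "all"}

def top_terms(statements, n=15):
    """Count terms across all statements after lowercasing and stop word removal."""
    counts = Counter()
    for text in statements:
        # Keep runs of letters and apostrophes as tokens.
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(n)

# Made-up placeholder statements, not taken from the real data set.
sample = [
    "I love you all. Stay strong, my family.",
    "I love my family and I am ready.",
]
print(top_terms(sample, n=3))
```

The resulting (term, count) pairs are what a word cloud library would scale its font sizes by.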

It’s not at all surprising that “love”, “family” and “god” are at the very top. Here’s a sample of the most common bigrams and trigrams (ie, sequences of two and three consecutive words):

  • “i love you”
  • “i would like”
  • “i am sorry”
  • “i am ready”
  • “i am going”
  • “thank you”
  • “my family”
  • “forgive me”
  • “stay strong”
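
Counting bigrams and trigrams can be sketched along the same lines. Judging by the phrases listed above, stop words were apparently kept for this step, so the sketch below keeps them too; that is my assumption, and the function names and sample data are hypothetical.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined into one string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def top_ngrams(statements, n, k=5):
    """Count n-grams per statement, so that no n-gram spans two statements."""
    counts = Counter()
    for text in statements:
        tokens = text.lower().split()
        counts.update(ngrams(tokens, n))
    return counts.most_common(k)

# Made-up placeholder statements, not taken from the real data set.
sample = ["i love you all", "i love you mom", "thank you warden"]
print(top_ngrams(sample, n=3, k=1))
```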


Lexical Diversity: Black Metal vs. Pop Queens

Lexical diversity is typically defined as a measure of the uniqueness of the words used in a text, that is, the proportion of distinct words across the text. This type of measure is indicative of the vocabulary richness present in the text and general writing quality.
There are several ways of measuring this, but I’ll focus on MTLD (Measure of Textual Lexical Diversity) as it’s very sensitive and less prone to be affected by the text length, unlike more traditional metrics. MTLD, by the way, is defined as the mean length of sequential word strings in a text that maintain a given type to token ratio – in other words, sequences that have a high proportion of unique words.
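
To make that definition more concrete, here is a rough Python sketch of the classic MTLD calculation: walk through the tokens, close a "factor" every time the running type-token ratio drops to the threshold, add a partial factor for the leftover tokens, and average a forward and a backward pass. The 0.72 threshold is the commonly cited default. Note that this is the plain variant, not the MTLD-MA (moving average) variant behind the values in this post, and returning infinity for texts with no repetition at all is my own choice.

```python
def _mtld_one_pass(tokens, threshold):
    """One directional pass: number of tokens divided by the factor count."""
    factors = 0.0
    types = set()
    count = 0
    ttr = 1.0
    for tok in tokens:
        count += 1
        types.add(tok)
        ttr = len(types) / count
        if ttr <= threshold:
            # The running TTR fell to the threshold: close this factor and start over.
            factors += 1.0
            types.clear()
            count = 0
            ttr = 1.0
    # Partial factor for the tokens left over after the last full factor.
    factors += (1.0 - ttr) / (1.0 - threshold)
    # No repetition at all: treated as unbounded diversity (an assumption).
    return len(tokens) / factors if factors > 0 else float("inf")

def mtld(tokens, threshold=0.72):
    """Mean of a forward and a backward pass over the lowercased tokens."""
    tokens = [t.lower() for t in tokens]
    if not tokens:
        return 0.0
    forward = _mtld_one_pass(tokens, threshold)
    backward = _mtld_one_pass(tokens[::-1], threshold)
    return (forward + backward) / 2.0

repetitive = "la la la la la la la la".split()
varied = "the cold winds of the north carry ash and silence over frozen graves".split()
print(mtld(repetitive), mtld(varied))
```

A higher value means the text goes on for longer, on average, before its vocabulary starts repeating.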
To get an idea of how diverse the vocabulary used in black metal is, I’ve measured MTLD for the song lyrics of 18 bands, which were selected (more or less) randomly.
The text of each band consists of the entirety of their lyrics after removal of text portions in languages other than English. To make things slightly more interesting, I also computed the MTLD values for the three queens of pop (or maybe they’ve been dethroned since I last checked the current pop pulse, it’s been a while).
There are a handful of caveats to the experiment I’m describing here, the most evident being: 1) I’m assuming the traditional parameter values of MTLD are suitable for song lyrics, 2) I’ve removed all lyrics totally or partially written in languages other than English, and 3) there’s a lot of intentional line repetition in songs (choruses and the like), something that is more prevalent in pop music than in black metal. With that in mind, I removed such duplicates from both data sets, which actually improved (albeit not significantly) the pop artists’ MTLD values.
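
Removing repeated lines can be as simple as the sketch below. It keeps the first occurrence of each line within whatever text it is given, comparing lines case-insensitively and ignoring extra whitespace. Whether duplicates should be dropped per song or across a band’s whole corpus is left open here; both the function name and the matching rules are assumptions for illustration.

```python
def drop_repeated_lines(lyrics):
    """Keep only the first occurrence of each line (case and spacing insensitive)."""
    seen = set()
    kept = []
    for line in lyrics.splitlines():
        key = " ".join(line.lower().split())
        if not key:
            # Skip blank lines entirely.
            continue
        if key not in seen:
            seen.add(key)
            kept.append(line.strip())
    return "\n".join(kept)

# Made-up placeholder lyrics.
song = "Oh baby\nOh  baby\n\nDance tonight\noh baby"
print(drop_repeated_lines(song))
```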
That said, it’s not at all surprising (to me, admitting my bias here) to see Dodheimsgard at the top, or that MTLD values for the three pop singers are a lot lower than for all of the selected black-metal bands. However, keep in mind that Lexical Diversity measures don’t explicitly take into account sentence structure or grammar, so we can’t really infer the degree of quality (for lack of a better expression) of how the words are used.
Band/Artist          MTLD-MA   Lexical Density (%)
Lady Gaga            56        53.39
Beyonce              56        51.30
Rihanna              58        52.89
Abigor               112       57.35
Blacklodge           91        59.7
Clandestine Blaze    86        63.15
Corpus Christii      94        57.48
Craft                73        51.15
Cultes des Ghoules   114       59.1
Darkthrone           104       57.65
Deathspell Omega     101       51.39
Dodheimsgard         132       58.17
Emperor              81        54.29
Immortal             73        58.94
Inquisition          72        58.02
Mayhem               83        59.7
Mutiilation          103       56.08
Ride for Revenge     102       58.7
Satanic Warmaster    70        55.47
Satyricon            79        54.04
Solefald             84        56.98

The rightmost column of the table above displays values of lexical density, which should not be confused with lexical diversity. Lexical density is defined as the proportion of content words (such as nouns, adjectives and verbs) present in the text. Other categories of words are said to be functional (such as determiners). I’ll follow here a rough interpretation of Halliday’s definition of lexical density and consider adverbs as content words.
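
As a rough illustration of that definition, the Python sketch below computes lexical density from tokens that have already been part-of-speech tagged. The tag names (`NOUN`, `VERB`, `ADJ`, `ADV` and so on) are an assumption borrowed from the Universal POS tag set, not the TreeTagger tags used for the table, and the sample tagging was done by hand. The denominator is simply every token passed in, so the caller decides whether punctuation is counted.

```python
# Word classes treated as content words (adverbs included, as described above).
CONTENT_TAGS = {"NOUN", "VERB", "ADJ", "ADV"}

def lexical_density(tagged_tokens):
    """Percentage of (word, tag) pairs whose tag is a content word class."""
    if not tagged_tokens:
        return 0.0
    content = sum(1 for _, tag in tagged_tokens if tag in CONTENT_TAGS)
    return 100.0 * content / len(tagged_tokens)

# Hand-tagged placeholder sentence, for illustration only.
tagged = [("the", "DET"), ("moon", "NOUN"), ("rises", "VERB"), ("over", "ADP")]
print(lexical_density(tagged))
```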
As far as word classes go, adjectives, nouns and prepositions (eg, “than”, “beyond”, “under”, “into”) are less common in the pop lyrics than in the black metal lyrics analysed here. Usage of pronouns, however (eg, “me”, “you”, “we”), is a lot more evident in the pop lyrics.
Focusing solely on the black metal lyrics, the same type of distribution is observed for each band, with the exception of nouns, which for some reason are a lot more prevalent in Clandestine Blaze and Blacklodge lyrics, amounting to more than one third of all the words used. Another aspect where Blacklodge, along with Craft and Deathspell Omega, deviates considerably from the rest of the bands is the “Other” category. This word class is actually an aggregation of the smaller classes, such as digits, punctuation and symbols. Craft, in particular, use a good amount of punctuation. Another interesting thing is seeing Deathspell Omega near the bottom of the table with regard to lexical density (ie, the share of actual content words), despite scoring high in the lexical diversity department.


  • MTLD was first suggested, and subsequently developed, by Philip McCarthy while at the University of Memphis. I strongly suggest reading his and Scott Jarvis’s article “MTLD, vocd-D, and HD-D: A Validation Study of Sophisticated Approaches to Lexical Diversity Assessment” as it’s a lot more comprehensive than the very brief, limited and ad-hoc assessment I make here (and I’ve probably misinterpreted some aspects of its correct usage).
  • All the MTLD values and part-of-speech tagging were computed in R, using the koRpus package implementation of MTLD-MA and TreeTagger.
  • D3 stacked bar chart source.

Around the World with Satan

The following map displays, for each country, the rank of the word “Satan” in black metal lyrics written between 1980 and 2013. This ranking is calculated as the ratio of the number of times “Satan” occurs to the maximum raw frequency of any term in the country’s lyrics, after stop word removal. The darker the shade of blue, the higher “Satan” sits in that country’s term ranking. Filipino bands throw the S-word around a lot more than the rest of the world, at least in comparison with other frequent terms in their lexicon, followed by bands from a number of countries in Latin America.
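
In code, that ratio could look something like the sketch below. It assumes the tokens for a country have already been lowercased and stripped of stop words; the function name and the sample tokens are made up for illustration.

```python
from collections import Counter

def relative_rank(term, tokens):
    """Count of `term` divided by the count of the most frequent term."""
    counts = Counter(tokens)
    if not counts:
        return 0.0
    return counts[term] / max(counts.values())

# Made-up placeholder tokens for a single country.
tokens = ["satan", "hell", "satan", "night", "hell", "hell", "hell"]
print(relative_rank("satan", tokens))
```

A value of 1.0 means the term is itself the most frequent one in that country’s lyrics, while a term that never occurs scores 0.0.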


As for the most frequent word (as in, the one ranked first) for each country, here’s a selected sample with some amusing entries:

  • Brunei: human
  • Kazakhstan: rape
  • Mongolia: soul
  • Costa Rica: lord
  • Honduras: cold
  • Barbados: ocean
  • Jamaica: pussy
  • Japan: hell

Note that the actual size of each country’s corpus, that is, the total number of terms, has some influence over the computed ratios. Since some countries are a lot more prolific, black-metal-wise, than others, take this analysis with a grain of salt.