Death Row Inmates: Last Words

Word clouds aren’t the best of data visualizations. They’re often too simplistic, representing a small sample of words out of context. I felt, however, that a word cloud would be appropriate to convey the most frequent terms present in the last statements of death row inmates.

That’s because most of these statements consist of just a few sentences, in which the inmates say goodbye to their families. Many apologise to their victims; some protest their innocence until the end. A few simply state that they’re now ready to die. There’s not much variety here, so I think that sizing the top terms in proportion to their frequency gives a fair picture.

The following word cloud was generated, after stop word removal, from 518 last statements of Texas death row inmates executed between 1982 and 2014. These statements were harvested from the Texas Department of Criminal Justice website.

Here are the top 15 words and their counts:

  • love: 634
  • family: 290
  • god: 203
  • life: 149
  • hope: 131
  • lord: 130
  • forgive: 127
  • people: 125
  • peace: 96
  • jesus: 96
  • give: 95
  • death: 92
  • pain: 81
  • strong: 81
  • warden: 77
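Counts like these can be produced with very little code. The following is a minimal sketch, not the script actually used for this post: the stop word list is a tiny illustrative one, the function name `top_terms` is hypothetical, and the sample statements are made up rather than taken from the data set.

```python
import re
from collections import Counter

# Tiny illustrative stop word list (an assumption); a real run would use a much larger one.
STOP_WORDS = {"i", "you", "to", "my", "the", "and", "a", "am", "for", "of", "all", "me"}

def top_terms(statements, n=15):
    """Count non-stop-word tokens across all statements and return the n most common."""
    counts = Counter()
    for text in statements:
        # Lowercase and keep runs of letters/apostrophes as tokens.
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(n)

# Made-up sample statements, not taken from the data set.
sample = [
    "I love you all. Stay strong, my family.",
    "To my family: I love you. Forgive me.",
]
print(top_terms(sample, n=2))  # [('love', 2), ('family', 2)]
```

The resulting (word, count) pairs are exactly what a word cloud needs: the count drives the font size.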

It’s not at all surprising that “love”, “god” and “family” are at the very top. Here’s a sample of the most common bigrams and trigrams (i.e., sequences of two and three consecutive words):

  • “i love you”
  • “i would like”
  • “i am sorry”
  • “i am ready”
  • “i am going”
  • “thank you”
  • “my family”
  • “forgive me”
  • “stay strong”
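For the curious, bigrams and trigrams like the ones above can be counted with a sliding window over the tokens. This is only a sketch under my own assumptions: stop words are kept (otherwise “i love you” could never show up), n-grams are not allowed to cross sentence boundaries, and the helper names and sample statements are invented for illustration.

```python
import re
from collections import Counter

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def top_ngrams(statements, n, k=5):
    """Count n-grams within each sentence and return the k most common."""
    counts = Counter()
    for text in statements:
        # Split on sentence punctuation so n-grams do not span two sentences.
        for sentence in re.split(r"[.!?]+", text.lower()):
            tokens = re.findall(r"[a-z']+", sentence)
            counts.update(" ".join(g) for g in ngrams(tokens, n))
    return counts.most_common(k)

# Made-up sample statements, not taken from the data set.
sample = ["I love you. I am sorry.", "I love you all. Thank you."]
print(top_ngrams(sample, 3, k=1))  # [('i love you', 2)]
```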


Using NLP to build a Black Metal Vocabulary

Black metal is typically linked, since its inception, to Satanic or anti-Christian themes. With the proliferation of bands in the 90s (after the Norwegian boom) and subsequent emergence of sub-genres, other topics such as paganism, metaphysics, depression and even nationalism came to the fore.

In order to discover the terminology used to explore these lyrical themes, I’ve devised a couple of term extraction experiments using the black metal data set. The goal here is to build a black metal vocabulary by discovering salient words and expressions, that is, terms that carry more information when used in BM lyrics than when used in a “normal” setting. For instance, the terms “Nazarene” or “Wotan” have a much higher weight in the black metal domain than in the general-purpose corpus used for comparison. Once again, note that this does not necessarily mean that these two words occur very frequently in BM lyrics (I’d bet that “Satan” or “death” have a higher number of occurrences), but it indicates that, when they do occur, they carry more information within the BM context.
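To make the idea of salience concrete, here is a rough sketch that scores each word by the ratio of its relative frequency in the domain corpus to its relative frequency in a general corpus. To be clear, this plain frequency ratio is a stand-in I am using for illustration; it is not GlossEx or C-value. The function names, the smoothing constant and the two toy corpora are all assumptions.

```python
from collections import Counter

def relative_freqs(tokens):
    """Map each word to its share of all tokens in the corpus."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def salience(domain_tokens, general_tokens, smoothing=1e-6):
    """Domain relative frequency divided by (smoothed) general relative frequency."""
    dom = relative_freqs(domain_tokens)
    gen = relative_freqs(general_tokens)
    # The smoothing term avoids dividing by zero for words absent from the general corpus.
    return {w: f / (gen.get(w, 0.0) + smoothing) for w, f in dom.items()}

# Toy corpora invented for illustration.
domain = "death death death satan satan wotan the the the the".split()
general = "the the the the the the death satan cat dog".split()

scores = salience(domain, general)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # ['wotan', 'death', 'satan', 'the']
```

Note how “wotan” occurs only once in the toy domain corpus, fewer times than “death” or “satan”, yet ranks first because it never appears in the general one.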

This task was carried out with JATE’s implementations of the GlossEx and C-value algorithms. The part of speech of each term (that is, the “type” of term) was determined with the StanfordNLP toolkit. The top 50 of each type (with the exception of adverbs) are listed in the table below. For the sake of visualization, I make a distinction between named entities/locations and the other nouns, since the former are depicted in the word maps at the end of this post.

I’ve also included, in the last column of the table, the top term combinations. It’s noteworthy how many of these combinations are either negations of something (“no hope”, “no god”, “no life” and so on) or concerned with time (“eternal darkness”, “ancient times”). This preoccupation with vast stretches of time is also evident in the top adverbs (“eternally”, “forever”, “evermore”), adjectives (“endless”, “eternal”) and even nouns (“aeon” or “eon”).

Adjectives | Adverbs | Verbs | Nouns | Term combinations
Endless | Nevermore | Desecrate | Forefather | Life and death
Unhallowed | Eternally | Smolder | Armor | Human race
Luciferian | Tomorrow | Travel | Aeon | No light
Infernal | Infernally | Fuel | Splendor | No hope
Necromantic | Forever | Spiral | Pentagram | Eternal night
Paralyzed | Anymore | Dethrone | Perdition | No god
Pestilent | Mighty | Throne | Specter | Full moon
Unholy | Skyward | Envenom | Misanthrope | No life
Illusive | Evermore | Lay | Cross | Black metal
Untrodden | Earthward | Resound | Magick | Cold wind
Astral | Someday | Mesmerize | Nihil | No place
Misanthropic | Astray | Abominate | Ragnarok | No escape
Unmerciful | Onward | Paralyze | Blasphemer | No return
Cruelest | Verily | Blaspheme | Profanation | Eternal life
Blackest | Deathly | Impale | Misanthropy | No fear
Eternal | Forth | Cremate | Malediction | Flesh and blood
Wintry | Unceasingly | Bleed | Revenant | No matter
Bestial | Weightlessly | Procreate | Damnation | Fallen angel
Reborn | Anew | Enslave | Conjuration | Eternal darkness
Putrid | Demonically | Awake | Undead | No man
Darkest |  | Behold | Nothingness | Dark night
Unblessed |  | Intoxicate | Armageddon | Lost soul
Colorless |  | Devour | Lacerate | No end
Diabolic |  | Bury | Wormhole | Ancient time
Demonic |  | Demonize | Eon | No remorse
Wrathful |  | Forsake | Devourer | No reason
Nebular |  | Enshroud | Impaler | No longer
Vampiric |  | Writhe | Sulfur | Black cloud
Unchained |  | Destroy | Betrayer | Dark forest
Armored |  | Entomb | Deceiver | Human flesh
Immortal |  | Raze | Bloodlust | Endless night
Hellish |  | Flagellate | Reaper | Ancient god
Hellbound |  | Unleash | Horde | Mother earth
Unnamable |  | Convoke | Blasphemy | Black wing
Prideful |  | Crucify | Eternity | Night sky
Colorful |  | Fornicate | Defiler | Dark side
Unbaptized |  | Torment | Immolation | Eternal sleep
Unforgotten |  | Venerate | Soul | Black hole
Satanic |  | Beckon | Abomination | Black heart
Morbid |  | Defile | Flame | Flesh and bone
Sempiternal |  | Distill | Hail | No chance
Mortal |  | Immolate | Malignancy | Dark cloud
Honorable |  | Welter | Wrath | Final battle
Glooming |  | Run | Pestilence | Eternal fire
Willful |  | Sanctify | Gallow | No peace
Lustful |  | Eviscerate | Disbeliever | No future
Everlasting |  | Unchain | Witchery | Black soul
Impure |  | Ravage | Satanist | Final breath
Promethean |  | Mutilate | Lust | Black night

Most salient entities: many are drawn from the Sumerian and Nordic mythologies. I’ve also included in this bunch groups of animals (“Beasts”, “Locusts”).

Most salient locations. I’ve also included in this bunch nondescript places (“Northland”). Notice how most are concerned with the afterlife (surprisingly, “hell” is not one of them).

It occurred to me that these results could be the starting point of an automatic lyric generator (like the now defunct Scandinavian Black Metal Lyric Generator). Could be a fun project, if time allows (probably not).


IBM GlossEx

Jason Davies’ D3 Word Cloud

JATE – Java Automatic Text Extraction

StanfordNLP Core

Part I – Topic Discovery in Black Metal Lyrics (French Bands)

Counting occurrences of single words is not the most informative way of discovering the meaning (or a possible meaning) of a text. This is mainly because both the relationship between words and the context in which they occur are ignored. A more significant result would be discovering sets of correlated terms that express ideas or concepts underlying the text. Topic modeling addresses this issue of topic discovery, and more importantly, does so with (almost) no human supervision.

‘Topic’ is defined here as a set of words that frequently occur together. Quoting from Mallet: “using contextual clues, topic models can connect words with similar meanings and distinguish between uses of words with multiple meanings”. This is especially important in a data set like the black metal lyrics, given that there are a number of words (such as death, life and blood) that appear in different contexts.

So, how does a topic modeling tool work? According to this excellent introduction to the subject, its main advantage is that it doesn’t need to know anything about the text except the words that are in it. It assumes that any piece of text is composed of words taken from “baskets” of words, where each basket corresponds to a topic. It then becomes possible to mathematically decompose a text into the probable baskets from which the words first came. The tool goes through this process repeatedly until it settles on the most likely distribution of words into topics.
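One common way to implement this “repeat until it settles” procedure is LDA with collapsed Gibbs sampling. The sketch below is a bare-bones, pure-Python version written for illustration only. It is my assumption about the kind of procedure being described, not Mallet’s actual code, and the function name `lda_gibbs`, the hyperparameter values and the toy documents are all made up.

```python
import random

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized docs.

    Returns (per-document topic weights, per-topic word counts).
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]      # tokens of doc d in topic k
    topic_word = [dict() for _ in range(num_topics)]  # count of word w in topic k
    topic_total = [0] * num_topics                    # total tokens in topic k
    assign = []                                       # current topic of every token

    # Start by throwing every word into a random basket.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] = topic_word[k].get(w, 0) + 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]
                # Take the word out of its current basket...
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # ...score each basket by how much this document uses it
                # and how much the basket likes this word...
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t].get(w, 0) + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                # ...and put it back into a basket drawn from those scores.
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] = topic_word[k].get(w, 0) + 1
                topic_total[k] += 1

    mixtures = [
        [(n + alpha) / (len(doc) + alpha * num_topics) for n in row]
        for row, doc in zip(doc_topic, docs)
    ]
    return mixtures, topic_word

# Toy documents invented for illustration.
docs = [
    "space stars universe space".split(),
    "stars universe cosmic space".split(),
    "business budget funding business".split(),
    "funding budget financial business".split(),
]
mixtures, topic_word = lda_gibbs(docs, num_topics=2)
for weights in mixtures:
    print([round(x, 2) for x in weights])
```

Each printed row is one document’s mixture of the two topics and sums to 1. The sampler only ever looks at which words occur in which documents, never at what the words mean.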

What results from this approach is that a piece of text is seen as a mixture of different topics, each with an associated weight. The higher the weight of a topic, the more important it is in characterizing the text. For the sake of a practical example, let’s say that we have the following topics:

  1. cosmic, space, universe, millions, stars …
  2. dna, genetic, evolution, millions, years …
  3. business, financial, millions, funding, budget …

Notice how the word “millions” shows up in different contexts: you can have a business text talking about millions of dollars or a science text mentioning evolution over millions of years. Taking the following text as a test case for our simple example topic model…

“The Hubble Space Telescope (HST) is a space telescope that was carried into orbit by a Space Shuttle in 1990. Hubble’s Deep Field has recorded some of the most detailed visible-light images ever, allowing a deep view into space and time. Many Hubble observations have led to breakthroughs in astrophysics, such as accurately determining the rate of expansion of the universe […] ESA agreed to provide funding and supply one of the first generation instruments for the telescope […] From its original total cost estimate of about US$400 million, the telescope had by now cost over $2.5 billion to construct.”

…it does seem reasonable that it can be seen as a mixture of topics 1 and 3 (with topic 1 having a higher weight than topic 3).
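As a crude sanity check of that intuition, one can simply count how many words of the text fall into each of the three example baskets. This is a naive stand-in and not what a topic model actually does (a real model infers the weights probabilistically); the function name is hypothetical and the excerpt is an abridged version of the quote above.

```python
import re

# The three example topics from the list above.
TOPICS = {
    1: {"cosmic", "space", "universe", "millions", "stars"},
    2: {"dna", "genetic", "evolution", "millions", "years"},
    3: {"business", "financial", "millions", "funding", "budget"},
}

def naive_mixture(text, topics=TOPICS):
    """Share of topic-word hits per topic; a crude stand-in for real inference."""
    tokens = re.findall(r"[a-z]+", text.lower())
    hits = {k: sum(1 for t in tokens if t in words) for k, words in topics.items()}
    total = sum(hits.values())
    return {k: (h / total if total else 0.0) for k, h in hits.items()}

# Abridged from the Hubble text quoted above.
excerpt = (
    "The Hubble Space Telescope is a space telescope that was carried into orbit "
    "by a Space Shuttle in 1990. ESA agreed to provide funding for the telescope."
)
print(naive_mixture(excerpt))  # {1: 0.75, 2: 0.0, 3: 0.25}
```

Even this blunt count puts topic 1 well ahead of topic 3 and gives topic 2 nothing.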

What would a black metal topic model look like? To find out, I’ve run a couple of preliminary experiments using lyrics from French black metal bands (future experiments will explore other subsets of the lyrics corpus and, hopefully, build a topic model for the entire data set, if time allows). The model described in this post was generated with Mallet, setting the number of topics to look for to 20 and using its most basic processing techniques: stop word removal, removal of non-alphanumeric characters, feature sequences with bigrams, and little else.

For reasons of economy (and also not to bore you to tears) I’ll just list the top 10, that is, the 10 topics with the highest “weight” in characterizing the French lyrics subset (the remaining 10 have very small weights). Each is represented by 9 terms:

  1. life, time, death, eyes, pain, soul, feel, mind, world
  2. night, dark, light, cold, black, darkness, moon, sky, eternal
  3. world, life, human, death, earth, humanity, end, hatred, chaos
  4. blood, body, black, tears, flesh, heart, eyes, love, wind
  5. satan, blood, black, god, hell, evil, lord, christ, master
  6. war, blood, death, fight, fire, black, kill, rise, hell
  7. land, gods, blood, people, proud, men, great, king, ancestors
  8. god, time, void, light, death, reality, stars, matter, infinite
  9. god, lord, fire, divine, holy, light, flesh, man, great
  10. fucking, shit, fuck, make, time, trust, love, suck, dirty

The first one seems to be all over the place: life, time and death can be applied to a ton of subjects, and indeed they seem to characterize to some extent about half of the data set. Also, some terms appear quite often in different contexts (blood, black, death and even god). But there are a couple of interesting ones, such as topics 2, 7 and 10. And because looking at lists of words is tedious, here’s a word cloud that represents them using a horrid sexy color palette. Each topic has a different color, and the larger the font, the more preponderant it is in the subset.

One practical application of a topic model is using it to describe a text. Let’s take, for example, the lyrics for Blut Aus Nord’s “Fathers of the Icy Age” and “ask” our topic model what’s the composition of this particular piece of text. The outcome is:

  • Topic 7 (54.25%): land, gods, blood, people, proud, men, great, king, ancestors
  • Topic 2 (42.34%): night, dark, light, cold, black, darkness, moon, sky, eternal
  • Other topics – less than 3.41%

We can interpret this song as a mixture of two topics and, in my opinion, the first one (let’s call it “ancient pagan stuff of yore”) seems to be pretty accurate. What about more personal lyrics, such as T.A.o.S.’s “For Psychiatry”? Here’s what we get:

  • Topic 10 (40.95%): fucking, shit, fuck, make, time, trust, love, suck, dirty
  • Topic 1 (26.13%): life, time, death, eyes, pain, soul, feel, mind, world
  • Topic 3 (12.05%): world, life, human, death, earth, humanity, end, hatred, chaos

It’s a bit too generic for my liking, but we’re not that far off the mark. All in all, topic modeling appears to be quite useful for the discovery of concepts in our data set. There are, however, a few drawbacks to this approach. One of them is that the number of topics has to be set manually – in an ideal case the algorithm should figure out the appropriate number by itself. The other is the simplicity of the features: future experiments should focus on improving the lyrics representation with richer features. At any rate, these are promising results that can be further improved.