Dolly Parton: a musical career expressed through language statistics
Here at Oxford Dictionaries we often refer to the Oxford English Corpus in our work. By consulting statistical analysis of a vast database of billions of words of English captured from the wild, our lexicographer colleagues can spot new words and usage patterns that they might not otherwise have encountered. It is a fascinating resource in which unexpected truths about the language can be discovered at every turn.
We can talk about corpus linguistics to our hearts’ content in these pages, but sadly it’s a little more difficult to show it to you in action. Full-size corpora are not suited to bite-size blog posts. We therefore sometimes need to pick smaller real-world examples of corpora to demonstrate some of the principles involved.
Today’s opportunity for a bit of statistical analysis of language comes from an unlikely source: 19 January is Dolly Parton’s birthday. For a long-time fan of country and western music this was too good an opportunity to miss!
Since the release of her first studio album Hello, I’m Dolly in 1967, she has released a constant stream of 42 albums, her work mirroring the changing fashions of her genre as well as the important events of the era. Our challenge was to distil some of that output through statistical analysis into a picture of a career in music.
Interrogating a corpus is not for the faint-hearted. We decided the story we wanted to tell could come from statistical analysis of the language used in each album, building a picture of how that language evolved over time. This meant a lot of research and rather laborious work in Excel to create a group of ten word lists, one for each five-year epoch from 1965 to 2015. Noise words (such as la) were removed, as were words common to all epochs, and then the most interesting words in each epoch’s top 20 were picked out in order of their frequency. The result is below: a deceptively simple-looking table for such a body of work.
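For readers who would rather script this than wrestle with a spreadsheet, the steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the process we actually followed: the function names, the short noise-word list, and the (year, lyrics) input format are all hypothetical, and the final step of picking out the "most interesting" words was a human judgement that the sketch does not attempt.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative noise list; a real one would be longer.
NOISE_WORDS = {"la", "oh", "ooh"}


def epoch_start(year, first=1965, width=5):
    """Return the first year of the five-year epoch containing `year`."""
    return first + ((year - first) // width) * width


def tokenize(text):
    """Lower-case the text and split it into simple alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def epoch_word_lists(albums, top_n=20):
    """Build a frequency-ranked word list for each epoch.

    `albums` is an iterable of (release_year, lyrics_text) pairs
    (a hypothetical input format chosen for this sketch).
    """
    counts = {}
    for year, lyrics in albums:
        counter = counts.setdefault(epoch_start(year), Counter())
        counter.update(w for w in tokenize(lyrics) if w not in NOISE_WORDS)

    # Words that occur in every epoch say nothing about change over time.
    common = set.intersection(*(set(c) for c in counts.values())) if counts else set()

    result = {}
    for epoch, counter in sorted(counts.items()):
        for word in common:
            del counter[word]
        result[epoch] = [word for word, _ in counter.most_common(top_n)]
    return result
```

Calling `epoch_word_lists` with a list of (year, lyrics) pairs returns a dictionary keyed by the first year of each epoch, each value being that epoch's most frequent remaining words in descending order of frequency.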
Here we see Dolly’s career in words: from her early collaboration with Porter Wagoner, through her massive country success in the 1970s and forays into the 1980s mainstream, to a return to her country and bluegrass roots from the 1990s onwards.
Words like mountain, mama, and daddy appear during the first decade due to her references to her rural upbringing in that period. You won’t be surprised to find that one of the high frequency words we removed from this period was Jolene; we did not want a particular song to skew an epoch over others, even a song as noteworthy as that one.
The first half of the 1980s shows a spike in festive words from her 1984 Christmas album collaboration with Kenny Rogers. It’s noticeable that the lyrics tend to contain shorter words when they are aimed at the pop charts; during her return to bluegrass from the very end of the 1990s onwards, the word length increases significantly (not least because of the word bluegrass itself).
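The word-length observation is easy to check numerically. As a rough sketch, and again an assumption rather than a record of how we measured it, the mean word length of a block of lyrics could be computed like this (the function name is hypothetical, and the simple tokenizer splits contractions such as don’t into two pieces):

```python
import re
from statistics import mean


def mean_word_length(lyrics):
    """Return the average number of letters per word in `lyrics`."""
    # Simplification: only runs of the letters a-z count as words.
    words = re.findall(r"[a-z]+", lyrics.lower())
    return mean(len(word) for word in words) if words else 0.0
```

Running this over the combined lyrics of each epoch would give one figure per epoch to compare across the decades.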
Words associated with American patriotism (soldier, America, marching) stand out in the early 2000s. This is a world event reflected in music; the 2001 World Trade Center attack generated an outpouring of music on that theme. Dolly’s 2003 album For God and Country was one of many in that period.
But perhaps most significantly, the tone and topic of her songs haven’t changed all that much over the decades. At the beginning of her career we see she is singing about being hurt and tired; in recent years she’s sung about sacrifice and missing. As a couple of words at the bottom right of the table serendipitously put it, nothing changes. Dolly remains recognizably Dolly to her fans.
For a fan of Dolly Parton’s music this exercise brought a new perspective to a familiar theme. A simple description of the purpose of corpus analysis is “It tells you things you didn’t know about stuff you thought you knew very well”, and in this case it certainly didn’t disappoint.