What can we learn from words that don’t exist?
Lately, I have been wondering what we can learn from what linguists call ‘word-formation patterns’. The term refers to the ways in which speakers of a language create new words from old ones by applying morphological ‘formants’ such as prefixes and suffixes. (For example, the rule that forms abstract nouns with the suffix -hood lets us generate, from the words sister and brother, the words sisterhood and brotherhood.) The linguist Geoffrey Pullum shows this principle in action in a table of parallel words, which contains both words that you can find in the dictionary and words that you cannot:
|-or|-ify|-ific|-id|-ible|Earliest word (date)|
|---|---|---|---|---|---|
|candor|candify|candific|candid|candible|candor, 1634|
|fervor|fervify|fervific|fervid|fervible|fervor, 1340|
|horror|horrify|horrific|horrid|horrible|horror, 1382|
|liquor|liquefy|liquific|liquid|liquible|liquor, c1225|
|livor|livify|livific|livid|livable|livid, 1622|
|lucor|lucify|lucific|lucid|lucible|lucid, 1591|
|pallor|pallify|pallific|pallid|pallible|pallor, c1400|
|rigor|rigify|rigific|rigid|rigible|rigor, 1398|
|stupor|stupefy|stupific|stupid|stupible|stupor, 1390|
|terror|terrify|terrific|terrid|terrible|terror, c1480|
|torpor|torpify|torpific|torpid|torpible|torpor, 1607|
|vigor|vigify|vigific|vigid|vigible|vigor, c1386|
|tepor|tepify|tepific|tepid|tepible|tepid, c1400|
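Read row by row, the table is what you get by attaching one fixed set of suffixes to each root. A minimal Python sketch of that pattern follows. The names `SUFFIXES`, `IRREGULAR`, and `candidate_forms` are illustrative assumptions, not anything from Pullum's presentation, and the -efy spellings are handled as simple overrides because that is how they appear in the table.

```python
# Suffixes in the order of the table's columns.
SUFFIXES = ("or", "ify", "ific", "id", "ible")

# Spellings in the table that depart from the regular root + suffix pattern.
IRREGULAR = {"liquify": "liquefy", "stupify": "stupefy"}


def candidate_forms(root):
    """Return every possible form for a root, whether or not it is extant."""
    forms = [root + suffix for suffix in SUFFIXES]
    return [IRREGULAR.get(form, form) for form in forms]


# Print three rows of the table from their roots alone.
for root in ("horr", "vig", "liqu"):
    print(" | ".join(candidate_forms(root)))
```

The sketch produces possible words only; it has no way of knowing which of them a dictionary actually lists, and that gap is where the interesting questions begin.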
What does this tell us?
To begin with, we can examine the ratio of extant words to possible words. All of the words on this list could exist; they conform to the morphological and phonological rules of the English language. (Some of the ‘nonexistent’ words might in fact exist, depending on which dictionary you consult. Take the list mutatis mutandis.)
For some words, all or nearly all of the possible cognates exist. (I use the term cognate here simply to mean words that share a common root.) Terror aligns with the existing words terrify, terrible, and terrific (which used to mean ‘terrifying’); horror aligns with the existing words horrify, horrific, horrid, and horrible. For other words, few or none of the possible cognates exist. Pallor aligns only with pallid; vigor (or vigour) aligns with nothing. What is the difference between words whose possible cognates enter the English lexicon and words whose possible cognates do not?
I expanded the list with a column that shows the earliest word in each row and the approximate year in which it entered English (see the rightmost column). This shows that cognates did not enter the English lexicon merely as a function of the passage of time; the oldest words on the table are not necessarily the words with the most extant cognates. (Terror, borrowed in the 15th century, has more extant cognates on the table than rigor/rigour, borrowed in the 14th century; vigor/vigour, one of the oldest words on the list, has no extant cognates on the table.)
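The comparison can be tallied in a few lines of Python. This is only a sketch under stated assumptions: it covers just the four rows whose extant cognates are spelled out in the prose above, it treats the circa dates as exact years, and the names `rows`, `DERIVED_COLUMNS`, and `extant_ratio` are hypothetical.

```python
# Derived columns of the table (everything except the -or noun itself).
DERIVED_COLUMNS = ("-ify", "-ific", "-id", "-ible")

# Only rows whose extant cognates are stated in the prose are included.
# Years come from the table's rightmost column; "c" dates are treated as exact.
rows = {
    "terror": {"year": 1480, "extant": {"terrify", "terrific", "terrible"}},
    "horror": {"year": 1382, "extant": {"horrify", "horrific", "horrid", "horrible"}},
    "pallor": {"year": 1400, "extant": {"pallid"}},
    "vigor": {"year": 1386, "extant": set()},
}


def extant_ratio(word):
    """Share of the four possible derived forms that are extant."""
    return len(rows[word]["extant"]) / len(DERIVED_COLUMNS)


# Oldest first: if age alone drove derivation, the ratios would fall steadily.
by_age = sorted(rows, key=lambda w: rows[w]["year"])
for word in by_age:
    print(f"{word:<7} {rows[word]['year']}  {extant_ratio(word):.2f}")
```

Sorted from oldest to youngest, the ratios do not decline in step with the dates, which is the pattern the paragraph describes.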
Rather than the age of words, tables of this kind likely reflect the frequency of word use. Over the history of the word candor/candour, for example, a few speakers have likely used the cognate candific as a nonce word. However, they did not do so frequently enough for the cognate to enter the lexicon. By contrast, over the history of the word horror, a large enough group of speakers has used the cognate horrific for it to enter the lexicon.
Looking at the oldest words
This hypothesis is consistent with a common approach that linguists use to track the most frequently used words in a given language: examining the language’s oldest words. In English, the oldest words include kinship words (mother, daughter, father, son), words for the body (hand, foot), and words for basic activities (eat, sleep). The theory is that speakers have used these words so regularly that no space was available for alternatives (that is, neologisms or borrowings from other languages) to intrude. (Indeed, the words mother and father, among others, appear to descend with relatively little change from the ancient language Proto-Indo-European.) The same phenomenon, regular usage keeping words alive on the lips of speakers, may account for the large number of extant cognates for a word like horrible; we simply say horrible, and words relating to it, more often than we say candid.
This would suggest a window into frequency of usage that does not rely on computational corpus analysis. One limitation of computational analysis is that it requires written texts; the history of spoken language, a history written on water with no hard record to use as input, is beyond its reach. Another limitation is that computational analysis often entails stemming words, or removing suffixes in order to focus on word roots. Stemming can merge related forms such as horrify and horrific under a single root, which hides precisely the differences between cognates that a table of this kind records. So we find ourselves with the possibility of gaining new insights into the history of English from the pool of English words that do not exist. The possibility is vigific.