Should ‘tweeps’ be in the dictionary?

“NO” and even “NOOOOOO!” were among the more emphatic reactions from Library Journal’s and Oxford University Press’s (OUP) Twitter followers, who were recently asked, “Should ‘tweeps’ be in the dictionary?” OUP posed the question ahead of the publisher’s June 18 webcast, hosted by Library Journal, which explored how social media affects “our view of dictionaries and the development of the English language.”

As reference editor for Library Journal, I already knew that librarians felt strongly about maintaining the quality of print dictionaries, but I didn’t quite understand that many users felt that the dictionary should catalog a kind of “ideal” English, instead of representing all the language that is in use. Many other thought-provoking ideas emerged from the approximately one-hour discussion that was moderated by Library Journal’s Josh Hadro.

Katherine Connor Martin, Oxford University Press
The first presenter was Katherine Connor Martin, Head of US Dictionaries, Oxford University Press. Martin explained that the media takes the inclusion of a word in the Oxford English Dictionary (OED) as “the ultimate seal of approval,” and leans toward the prescriptive view mentioned above. She explained that when, in March 2011, OUP added terms such as “OMG” and “LOL” to its flagship product (the OED), it became clear that the inclusion of language from texting and Twitter is viewed by some as a “sign of the linguistic apocalypse.” However, Martin was careful to note that OUP doesn’t see itself as a kind of language police; rather, the publisher aims to compile a guide to words that the public will come across. (And by the way, the editor also noted that “OMG” was first used in a letter to Winston Churchill in 1917.)

‘Crowdsourcing’ the OED

Next Martin revealed how lexicographers discover new words. In the past, she explained, they would gather usage examples, documenting them on index cards. Over time, if they gathered enough such examples, the word would be deemed to have enough currency for inclusion in the dictionary. The first edition of the OED was based upon more than five million usage-illustrating “slips” submitted by volunteers—“we like to say that we were crowdsourcing before there was a word for it,” said Martin.

Things have changed. Nowadays, explained Martin, the work of human readers is supplemented by databases and search engines, with which “you can get millions of examples of a word in seconds.” The same technological changes, she went on to say, mean that Oxford’s two main online dictionaries can now be updated quarterly. Space constraints have also eased, almost to the point of disappearing, and “including more words just means providing more information to our readers, and it’s hard to see a downside to that.”

Martin then extrapolated to the most extreme repercussions of the trend toward expansion of dictionaries. Should anything be excluded? Online, she said, “there are some resources that programmatically aggregate lists of words, and others, famously Urban Dictionary, that allow users to add their own words.” The editorial purpose of a dictionary matters, too, with Martin explaining that, “the historical OED, which represents the history of usage over more than a thousand years, tends to wait a bit longer than our current dictionaries do to add new words.” As a permanent record of a word’s place in the language, the OED requires a word or meaning to demonstrate that it has achieved a level of general currency over a reasonable length of time before it is included.

Easier than A-Z

Then came a reference bombshell. “We have overcome the alphabet,” claimed Martin, explaining that it was never an ideal way of arranging dictionary information, as “it works best if you know what you’re looking for.” Using advanced online searching, she explained, it’s possible to find answers to questions such as, “what words for foods came into English via French, or which slang terms first came into use in the 1930s.” Hyperlinks, she continued, integrate the functions of a dictionary and a thesaurus. These many changes, along with Oxford’s greatly increased ability to be in touch with dictionary users, have made it, Martin concluded, “an exciting time to be a lexicographer.”
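For readers curious about what searching "beyond the alphabet" might look like in practice, here is a minimal sketch in Python. It assumes each entry carries a few structured fields (origin language, subject, register, year of first recorded use). The field names, the records, and the dates are all invented for illustration; this is not the OED's data model or its search interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    word: str
    origin: str      # language the word is assumed to come from
    subject: str     # broad subject label, e.g. "food"
    register: str    # "standard" or "slang"
    first_used: int  # year of first recorded use (placeholder values)


# Toy records invented for this sketch; origins and dates are NOT OED data.
ENTRIES = [
    Entry("croissant", "French", "food", "standard", 1899),
    Entry("baguette", "French", "food", "standard", 1958),
    Entry("pasta", "Italian", "food", "standard", 1874),
    Entry("jitterbug", "unknown", "dance", "slang", 1934),
    Entry("gizmo", "unknown", "gadget", "slang", 1938),
    Entry("cool", "English", "approval", "slang", 1948),
]


def foods_from(entries, origin):
    """Words for foods whose recorded origin matches `origin`."""
    return sorted(
        e.word for e in entries
        if e.subject == "food" and e.origin == origin
    )


def slang_from_decade(entries, decade_start):
    """Slang terms first recorded in the ten years from `decade_start`."""
    return sorted(
        e.word for e in entries
        if e.register == "slang"
        and decade_start <= e.first_used < decade_start + 10
    )


print(foods_from(ENTRIES, "French"))
print(slang_from_decade(ENTRIES, 1930))
```

The point of the sketch is that neither question depends on knowing the headword in advance: the entries are filtered by their attributes, not located by their spelling.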

Henrietta Thornton-Verma, Library Journal
Next it was my turn, and I addressed the phenomenon of crowdsourcing, admitting a bias up front: I think it’s a legitimate, desirable way of gathering data, as long as it’s only one part of a range of information-collection methods. “Just as there is a place in libraries for books of varying qualities and types,” I said, “a crowdsourced dictionary has a place in a spectrum of resources, some of which are extremely formal and reliable…[while others] are less reliable but also usually far more current.”

Currency is valuable, I explained, because people need definitions of all words that are in use, and the need is especially great for neologisms such as “tweeps.” Initially users can get by without a formal, historical definition of a word; rather they just need to understand the word when they hear it and be able to incorporate it into their own lexicon. Crowdsourced dictionaries are also cheaper. “The challenge to us as librarians,” I went on, “is in informing the public which kind of resource is which, as we can’t, nor should we, I believe, try to make patrons quit using crowdsourced materials.”

Every language is crowdsourced

I continued by reminding the audience that every language is crowdsourced, so it seems obvious that dictionaries of those languages can be, too. While some countries try to impose rules and barriers on which words are allowed, those don’t work. As an example I cited the infamous French academy, the body that aims to regulate the French language. Even this illustrious body has now set up crowdsourcing of sorts. On the academy’s website, visitors are invited to exchange views on points of language and even campaign to “rehabilitate” French words fallen out of common usage.

I also cited Urban Dictionary as one example of where things are heading. A recent New York Times article offered a case study of how such a resource can be used as a valuable complement to conventional dictionaries, the situation I endorsed in my introduction. Urban Dictionary is now used in the New York court system, which often has occasion to look up slang such as “iron,” which means “handgun.”

Librarians chime in

Finally, I offered the opinions of several librarians; these were decidedly mixed. Marianne Orme, Des Plaines PL, IL, and a Library Journal reviewer, said, “My basic view is that crowdsourcing is a great way to gather data. Still, detailed analysis of data is always needed, not just stepped-up versions of gathering it. The OED solicited data long ago, after all, so the idea of gathering data from the public is not new at all.” Neatly summing up the views against crowdsourcing, she commented, “I do not believe anyone is well-served by a dictionary version of voting for words as if on Survivor or American Idol.”

Gary Price, founder of Library Journal’s expressed concern about the upkeep of crowdsourced dictionaries, posing several questions: “Is there a plan in place to keep the dictionary updated over time with new words, changes to definitions, etc.? Will links to any outside sources be checked? What is the sustainability given that people tend to move on to new and cooler projects? If you’re building it for a group larger than those DIRECTLY involved, is there a plan to market and promote it?” And finally, “Who will determine what gets into the dictionary?”

The last librarian I quoted was Christina Connolly of Worcester Public Library in Massachusetts, whose scorn was withering. “Crowd-sourced dictionaries are an abomination to word nerds the world over,” she said, lamenting that, “Samuel Johnson is surely scratching furiously at the dirt ceiling of his grave.” She admitted that such resources can be a fun read, but urged the public to “Let the lexicographers and linguists among us keep their rightful place, creating dictionaries, shining healing light on our malapropisms, garbled grammar and bad, bad spelling.”

Ben Zimmer, Executive Producer, Visual Thesaurus and
Ben Zimmer, the final presenter, is, among many other things to do with words, the language columnist for The Boston Globe. Opening his portion of the webcast, he agreed with Katherine Connor Martin that it is an exciting time, but admitted that “it’s a scary time as well.”

Electronic Lexicography

Zimmer also discussed Electronic Lexicography, which concentrates on several “frontiers of development,” including corpus integration, which offers a larger trove from which lexicographers can work. As an example, Zimmer showed an image from a “word toolkit” from the Oxford American Writer’s Thesaurus, which shows the kinds of words that co-occur with the word “mighty.” Drawing on the Oxford English Corpus, the image shows in large font the words that appear most often with “mighty” (“power” and “man,” for example), with a smaller font showing words such as “act” that are also used along with “mighty,” but less frequently. “It’s really online where this corpus-based data shines,” explained Zimmer, while noting that humans still have to do much of the work. While online tools can allow lexicographers to list an almost unlimited number of usages of a certain word, choosing the usages that best illustrate the meaning of a term is a task that still cannot be automated.

Zimmer’s own product came up too, illustrating novel uses of online capabilities in helping users learn new words. Lexicographers are no longer limited to providing simple definitions and example sentences, Zimmer was pleased to note, offering as an example a multiple-choice quiz from his website that asks users to identify the correct meaning of “delineate.”
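The counting behind a display like the “mighty” word toolkit can be sketched roughly as follows. This is only an illustration under stated assumptions: the sentences are invented, the one-word window and the short stopword list are arbitrary choices, and the font-size scaling is a guess at how frequency might map to type size. It is not how the Oxford English Corpus or the thesaurus is actually processed.

```python
import re
from collections import Counter

# Invented example sentences standing in for a real corpus.
SENTENCES = [
    "The mighty power of the river",
    "A mighty power rose",
    "He was a mighty man",
    "Such a mighty man",
    "One mighty act",
]

# A tiny, arbitrary stopword list for this sketch.
STOPWORDS = {"a", "the", "one", "of", "was", "he", "such"}


def collocates(sentences, target, window=1):
    """Count non-stopword tokens within `window` words of `target`."""
    counts = Counter()
    for sentence in sentences:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        for i, token in enumerate(tokens):
            if token != target:
                continue
            start = max(0, i - window)
            neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
            counts.update(t for t in neighbours if t not in STOPWORDS)
    return counts


def font_sizes(counts, smallest=10, largest=40):
    """Scale each count linearly so the most frequent word gets `largest`."""
    if not counts:
        return {}
    top = max(counts.values())
    return {
        word: smallest + (largest - smallest) * count // top
        for word, count in counts.items()
    }


counts = collocates(SENTENCES, "mighty")
print(counts.most_common())
print(font_sizes(counts))
```

In this toy corpus “power” and “man” each appear twice next to “mighty” and “act” once, so the first two would be drawn larger. Deciding which of those usages best illustrates the word is the part Zimmer says still needs a human.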

Other more unusual possibilities are now available, too. Wordnik, as well as offering example sentences, provides images illustrating the word in question and tweets that use it. Online resources, Zimmer went on to say, also offer “much greater efficiency of access.” He showed how Wordnik and Zimmer’s own site offer a kind of advanced auto-complete; as users type letters in the search box, they are offered a list of words that start with that letter combination, accompanied by their definitions. The choices offered are based upon what others look up as well as “what we think you’re looking for.” Zimmer maintains that these new kinds of possibilities, as well as the application of technology to previously print-only works such as The Dictionary of American Regional English (DARE), offer “new kinds of serendipity” rather than a loss of it, as some claim.
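The auto-complete behavior described above can be approximated in a few lines. The sketch below assumes a small in-memory word list with one-line glosses and a table of lookup counts, all invented for illustration, and it ranks suggestions only by how often others have looked a word up. Whatever the real sites use to judge “what we think you’re looking for” is not modeled here.

```python
# Toy glosses written for this sketch, not taken from any dictionary.
DEFINITIONS = {
    "delicate": "easily damaged; fine in texture",
    "delight": "great pleasure",
    "delineate": "to describe or outline precisely",
    "mighty": "very strong or powerful",
}

# Hypothetical lookup counts standing in for "what others look up".
LOOKUPS = {"delight": 300, "delineate": 120, "delicate": 45}


def suggest(prefix, definitions, lookups, limit=3):
    """Return up to `limit` (word, definition) pairs starting with `prefix`,
    most-looked-up first, with ties broken alphabetically."""
    prefix = prefix.lower()
    matches = [word for word in definitions if word.startswith(prefix)]
    matches.sort(key=lambda word: (-lookups.get(word, 0), word))
    return [(word, definitions[word]) for word in matches[:limit]]


for word, gloss in suggest("del", DEFINITIONS, LOOKUPS):
    print(f"{word}: {gloss}")
```

Typing more letters narrows the list: with these toy records, “delin” leaves only “delineate.”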

The webcast left much food for thought, and answered one important question: what will happen if “tweeps” is in fact added to the dictionary? The satisfying answer, provided by Katherine Connor Martin in the opening minutes of the presentation, is that it’s already there.