Corpus Linguistics and Legal Interpretation: A (Very Brief) Introduction


Rachel Martin ’23
Staff Editor


There are many, sometimes overlapping, theories of how judges should interpret laws. One is textualism, which favors applying the ordinary meaning of the law as written.[1] However, this position naturally raises the question of what that ordinary meaning is, which is itself often highly debatable. “I think we are . . . falling short on some of the promises of textualism,” Justice Thomas R. Lee of the Utah Supreme Court, a pioneer of corpus linguistics in judicial interpretation, once explained.[2] The problems are partly theoretical (what is meant by “ordinary meaning”) and partly operational (how to determine that meaning objectively).


One possible definition of ordinary meaning might be the meaning, or “sense,” which is most frequent in a given context.[3] However, this can be surprisingly complicated to ascertain. A judge’s (or any other person’s) intuition is not necessarily representative. And dictionaries are not much better, as they merely tabulate potential meanings. For example, in Muscarello v. United States, the defendant was charged with carrying a firearm during drug trafficking when he had a firearm in the locked glove compartment of his vehicle.[4] The defendant argued that “carry” only applied to having a firearm on one’s person, but the Supreme Court determined that the “primary” meaning of “carry” was to transport by vehicle because that is the sense of the word that comes first in the dictionary.[5] However, that sense was only first because the earliest recorded example of the word was used in this way.[6] “Somewhere in the thirteenth century, somebody who was speaking something that looks a little bit like the English that we speak today started using the verb ‘carries’ to describe somebody conveying something in a vehicle,” Justice Lee commented. “Of course, our languages evolve in ways that don’t have anything to do with etymology.”[7]


This is where corpus linguistics comes in. Put simply, a linguistic corpus is a large, often searchable body of texts taken from real-life use. While some are limited to narrow domains, such as the self-explanatory Wikipedia Corpus, others, such as the Corpus of Contemporary American English (COCA), are taken from a wide variety of spoken and written sources and are designed to be representative of overall language use.[8] By searching corpora, one can determine how frequent the given senses of a word are in different contexts. For example, if one wanted to analyze the issue in Muscarello, one could run a search for all instances of “carry” and its conjugations within a certain distance of words such as “firearm” or “gun.”[9] One would then go through the sentences the search returned and compare how frequent each sense of “carry” is. In the context of firearms, it turns out that it is overwhelmingly more common for “carry” to mean “have on one’s person” than “transport in a vehicle.”[10]
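The search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the five sentences are invented stand-ins for corpus results, the four-word window is arbitrary, and the sense labels are hypothetical hand codings of the kind a researcher would assign after reading each hit. It does not reproduce COCA's interface or any figures reported in the literature.

```python
import re
from collections import Counter

# Invented example sentences (an assumption); a real study would draw
# its sentences from a corpus such as COCA.
SENTENCES = [
    "He carried a gun in his waistband.",
    "She carries a firearm for protection.",
    "They were carrying the guns in the trunk of the car.",
    "The truck carried lumber across the state.",
    "He had a gun, but the bag he carried was empty.",
]

NODE_FORMS = {"carry", "carries", "carried", "carrying"}  # "carry" and its conjugations
COLLOCATES = {"gun", "guns", "firearm", "firearms"}       # nearby words of interest
WINDOW = 4  # assumed maximum distance, counted in words


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def find_hits(sentences, window=WINDOW):
    """Return sentences where a form of "carry" is within `window` words of a collocate."""
    hits = []
    for sentence in sentences:
        tokens = tokenize(sentence)
        nodes = [i for i, tok in enumerate(tokens) if tok in NODE_FORMS]
        collocs = [i for i, tok in enumerate(tokens) if tok in COLLOCATES]
        if any(abs(n - c) <= window for n in nodes for c in collocs):
            hits.append(sentence)
    return hits


def sense_shares(labels):
    """Convert a list of hand-assigned sense labels into proportions."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {sense: count / total for sense, count in counts.items()}


hits = find_hits(SENTENCES)
for sentence in hits:
    print(sentence)

# Hypothetical labels a human reader might assign, one per hit, in order.
labels = ["on person", "on person", "vehicle"]
print(sense_shares(labels))
```

The last toy sentence mentions a gun, but its verb falls outside the window, so it is excluded. This shows how the choice of distance shapes the sample a researcher ends up reading. The counting is mechanical, but deciding which sense each hit reflects remains a human judgment.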

Pictured: Full of options, corpus linguistics presents a myriad of ways to address complex legal language and approaches. Photo Courtesy of thoughtco.com


Of course, even if a most common meaning can be identified,[11] there may be valid reasons for using a different interpretation in any given instance. However, “textualists are trying to find out what the statute actually means . . . and the idea that we would ignore the well-developed systematic tools that have been developed in linguistics . . . is like trying to do antitrust law with no economics,” argues UVA Law Professor Lawrence Solum.[12] While corpora and other linguistic tools may not be a panacea for the issues of legal interpretation, they provide one way to introduce more transparency and methodological rigor into decision-making processes.

---

rdm9yn@virginia.edu


[1] Because languages change over time, this philosophy can be further split by whether original or contemporary ordinary meaning is prioritized. For a brief overview of this issue, see Thomas R. Lee & Stephen C. Mouritsen, Judging Ordinary Meaning, 127 Yale L.J. 788, 824–26 (2018).

[2] Presentation by Justice Thomas R. Lee and UVA Law Professor Lawrence Solum to The Federalist Society at UVA Law (November 4, 2020) (recording accessible at https://www.youtube.com/watch?v=GetPdHEjSQQ).

[3] Another possible definition is a linguistic “prototype”: what first comes to mind when a word is mentioned. For example, if you hear “bird” without additional context, you are more likely to picture something akin to a sparrow than a penguin or flamingo. The “ordinary” bird you picture is a prototype.

[4] Muscarello v. United States, 524 U.S. 125, 127 (1998).

[5] Id. at 128.

[6] Dictionary ranking may be chronological, random, or based on dictionary makers’ intuitions. For a more detailed discussion of the misuse of dictionaries in legal interpretation, see generally Stephen C. Mouritsen, The Dictionary is Not a Fortress: Definitional Fallacies and a Corpus-Based Approach to Plain Meaning, 2010 B.Y.U. L. Rev. 1915 (2010).

[7] Lee & Solum, supra note 2.

[8] See generally Mark Davies, https://www.english-corpora.org/pdf/english-corpora.pdf (last visited Jan. 24, 2021).

[9] Words that often occur together are called “collocates.” Collocates are useful for understanding how a word is used in practice, and most modern corpora allow users to search for them specifically or to narrow results by them.

[10] Mouritsen, supra note 6, at 1964–65 (using COCA to find that in this context, approximately 64% of instances involved the “on person” sense, 1% involved the “transport” sense, 32% were ambiguous between the two, and 3% fell under neither category).

[11] Which will not always be the case.

[12] Lee & Solum, supra note 2.