Response to a Letter to the Editor: Corpus Linguistics and Legal Interpretation, Part 2: An Imperfect Tool


Rachel Martin ‘23
Staff Editor

(The following was written as a reply to Professor John Setear’s Letter to the Editor, appearing February 3, 2021, responding to the author’s January 27, 2021 article “Corpus Linguistics and Legal Interpretation: A (Very Brief) Introduction.”)


Dear Professor Setear,


            Thank you for taking the time to engage with my article. I actually agree with a lot of what you said—as I mentioned at the end of my article, corpora and other linguistic tools are not going to be a panacea for all the woes of legal interpretation. Due to space constraints, I admittedly had to simplify things quite a bit and gloss over a lot of the complications and qualifications (hence the “(Very Brief)” in the title). I cannot give a full account of all said complications here, but I would like to take the opportunity you have provided to make a few more general comments.

            As a general matter, I think that anything purporting to be an interpretation of a written text should probably at least start with said text, even if in many or most cases the process cannot end there. (It often cannot end there because language is, after all, inherently ambiguous to some extent. However, I would say a finding of irreducible linguistic ambiguity in a given situation is itself valuable.) And inasmuch as one starts with the text, I think it better to take account of all the tools at one’s disposal and their strengths and flaws, rather than to blindly bow down before the divining rod of the dictionary and the whims and caprices of its compilers.  

            Corpora and other linguistic tools are merely that—tools. Corpora provide a way to compile and search naturally occurring language and were built to aid in the study and teaching of languages.[1] Tools such as corpora are neither inherently good nor bad, neither inherently conservative nor progressive. As one illustration, data from the Corpus of Founding Era American English (COFEA) has recently been used to argue that the phrase “bear arms” was, at the time of the founding, used overwhelmingly in a military or collective, not individual, sense.[2]

            As you rightly pointed out, choosing a relevant and representative database can be a thorny issue that depends at least in part on what question you are asking. If one wants to know more about the terms of art used in the diamond trade, then a general, “balanced” corpus designed to be representative of the English language as a whole[3] would probably not be very helpful. However, even in specialized trades, much of the language used might be termed “ordinary” (e.g., “dog” in a veterinary manual or an animal control statute), and in these cases a general corpus may suffice.

            Philosophical differences on what to prioritize also come into play. Languages change over time.[4] One may favor contemporary ordinary meaning on the principle that legal language should be understandable to the contemporary ordinary people whose behavior it is aimed at. Another may favor a variant of original meaning on the principle that judges should follow what was actually enacted by the voice of the people through the legislature and leave any changes to the same. Which side of this debate one falls on affects what questions one asks and what corpus or parts of a corpus[5] are relevant to answering them. I do not intend to weigh in on the debate between originalism, living constitutionalism, and other such “-isms,” as my initial impressions[6] are that they all have some degree of merit and fault, and I do not pretend to have the wisdom or experience to proclaim some sort of ideal mix. However, whatever side one takes, I would encourage transparency about what one is doing and care in avoiding the many methodological pitfalls that could lead to confirmation bias.[7]

            Of course, quantifying how common words are in a given context will only take one so far. A less common meaning is still, by definition, a meaning that people sometimes use. Beyond doing frequency analyses using corpora, there are whole subfields of linguistics, such as syntax, semantics, and pragmatics, that relate to how words link together and have their meanings altered by their specific contexts. And even taking all of these subfields into account, one could never say with absolute certainty that there is some inherently “right” meaning in any given instance, especially with something like a statute that was written and approved by multiple people to be applied to multiple contexts. Language, like people, is messy, which is part of its beauty. Corpus linguistics is just one tool, empirically based but still imperfect, for testing some of our assumptions about what “ordinary meaning” is, and I would not suggest that it should be the only or final word on who wins or loses in court.


Sincerely,

Rachel Martin

---

rdm9yn@virginia.edu


[1] While some recent corpora, such as COFEA, were at least inspired by potential legal applications, and others compile legal texts, these are recent developments and not the norm.

[2] See, e.g., Neal Goldfarb, Corpora and the Second Amendment: “bear arms” (part 1), plus a look at “the people,” LAWnLinguistics (Apr. 29, 2019).

[3] I would be remiss in my duty if I did not mention that how to properly balance general corpora is also a matter of debate.

[4] Consider the word “gay” or, as you pointed out, “federalist.”

[5] One can either choose between historical and contemporary corpora or use a corpus like the Corpus of Historical American English (COHA), which spans 1810 to 2009, and sort and compare results by decade.

[6] Being in only my third week of Constitutional Law.

[7] Such as searching for “firearm,” “carry,” and “vehicle,” if one wants to know whether “carry a firearm” more commonly means “transport by vehicle” or “have on one’s person.” See Stephen C. Mouritsen, The Dictionary Is Not a Fortress: Definitional Fallacies and a Corpus-Based Approach to Plain Meaning, 2010 B.Y.U. L. Rev. 1915, 1957–58 (2010).