Rickman and Rudanko on Corpus Linguistics

In this article Paul Rickman and Juhani Rudanko, authors of Corpus-Based Studies on Non-Finite Complements in Recent English, discuss the value of openly accessible corpora in conducting linguistic research.

It is not usually difficult to get language undergraduates interested in linguistic research using corpora. Many are immediately drawn to the fascinating combination of computer science, a bit of maths, and human language that is modern corpus linguistics. The approach might be described as using authentic examples of language usage as the empirical basis for determining how language works, changes, and varies. Corpus linguistics is certainly nothing new – many nineteenth- and early twentieth-century linguists are well known for producing important research in this way, often working from thousands upon thousands of paper slips bearing handwritten or typed examples of language. Corpus linguistics in the modern era follows the same basic principles as it did in those early days, but the piles of paper slips have given way to electronic databases, and as a result it now involves larger and ever-increasing word counts, a far wider reach, and a great deal more efficiency.

The most widely used online corpora are the family compiled and generously made available by Prof. Mark Davies and colleagues at Brigham Young University: a wide array of freely available linguistic databases, the largest of which contain billions of words. Several of the BYU corpora are regularly updated and continually expanding, so you can find examples of language use that appeared, for instance, in newspapers around the English-speaking world just yesterday. Because of the position of English in the world today, it is probably true that the majority of corpora available right now represent the English language, but corpus linguistics is most definitely not an exclusively English-language phenomenon – BYU, for example, also offers large corpora in Spanish and Portuguese. Furthermore, freely available data retrieval and corpus design software makes it relatively easy to create corpora of any language for individual research purposes.

The type of research we carry out in the linguistic field known as complementation is almost entirely dependent on these electronic resources, and the work we have included in our recent volume makes good use of some of the more commonly used corpora: the Corpus of Contemporary American English, the Corpus of Historical American English, the British parliamentary corpus Hansard, and the aging but still relevant and very popular British National Corpus. One’s choice of corpora is essentially determined by the issues under investigation. In our case these are two clausal complement patterns, the to-infinitive and the -ing clause. Because the two are similar enough in meaning, they came to be used side by side in many contexts in the last few centuries, and, because a language often cannot accommodate too many structures that do roughly the same job, they began to compete with one another. The pair of sentences “She was scared to admit her mistake” / “She was scared of admitting her mistake” shows the two clauses in use. Our research tracks the fortunes of these structures as they are used in certain key contexts over this period, and, we believe, succeeds in adding to our knowledge of this relatively underexplored area of language use. The Corpus of Historical American English is particularly valuable for this kind of diachronic work, as it comprises 400 million words of American English taken from a variety of genres and covers the last two centuries, the period 1810-2009.
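As a rough illustration of what tracking two competing patterns over time can involve, the sketch below counts the to-infinitive and the of -ing complement after “scared” per decade and converts a raw count to a frequency per million words. It is only a minimal sketch and not the procedure used in the book: the sample sentences, the decade labels, the regular expressions, and the function names are all invented for illustration, and a real study would query a corpus such as the Corpus of Historical American English and check the hits by hand.

```python
import re
from collections import Counter

# Hypothetical toy sample of (decade, text) pairs, invented for illustration only.
SAMPLE = [
    (1850, "She was scared to admit her mistake."),
    (1850, "He was scared to speak."),
    (1990, "She was scared of admitting her mistake."),
    (1990, "They were scared to leave."),
]

# Deliberately crude patterns (assumptions): they ignore intervening material
# and would also catch false positives such as "scared to death" or
# "scared of nothing", which a real study would have to weed out manually.
TO_INF = re.compile(r"\bscared to \w+", re.IGNORECASE)
OF_ING = re.compile(r"\bscared of \w+ing\b", re.IGNORECASE)


def count_patterns(sample):
    """Return {decade: Counter} with raw hits for each complement pattern."""
    counts = {}
    for decade, text in sample:
        decade_counts = counts.setdefault(decade, Counter())
        decade_counts["to_inf"] += len(TO_INF.findall(text))
        decade_counts["of_ing"] += len(OF_ING.findall(text))
    return counts


def per_million(hits, total_words):
    """Normalise a raw count so that decades of different sizes are comparable."""
    return hits / total_words * 1_000_000


if __name__ == "__main__":
    for decade, decade_counts in sorted(count_patterns(SAMPLE).items()):
        print(decade, dict(decade_counts))
```

Normalising per million words matters because the amount of text available differs from decade to decade, so raw counts alone can be misleading.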

The creation and availability of the BYU corpora have given a new perspective to work on complementation, in that it is now possible to trace and substantiate language change in a more verifiable way. In a very real sense, it is the compilers of these corpora who have made our book possible; they have provided the tools for the job, without which the job could not have been done.

Paul Rickman is a University Instructor of English at the University of Tampere, Finland. His research interests include complementation, New Zealand English, World Englishes and new-dialect formation. His recent work has addressed the issue of variation in the predicate complementation system of New Zealand English.

Juhani Rudanko is Emeritus Professor of English at the University of Tampere, Finland. His recent work has focused on the system of English predicate complementation in recent centuries and on the pragmatic analysis of political discourse in the early American Republic.
