Building a corpus to represent a variety of a language

Brian Clancy*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Chapter > peer-review

2 Citations (Scopus)

Abstract

In order to build a corpus to represent a variety of a language, we must first acknowledge the complexity of what is comprehended in the term language varieties. This chapter reviews the concept of language variety and applies its ubiquity of reference to the practical challenges of designing and building corpora. To this end, the fundamental distinction between user-related and use-related varieties is explored. The choice to build a use- or user-related corpus (or a blend of both) has an obvious impact on a corpus designer’s decision-making processes, in particular in relation to core concerns such as size, representativeness and balance. This chapter explores and discusses these concerns, invoking both well-established, more traditional corpora such as BNC1994 or the ICE suite of corpora and more modern corpora such as BNC2014, the GLOWBE project or the ENTENTEN family of corpora. The benefits of designing and building corpora which capture multiple varieties of a language – both use- and user-based – are illustrated using the 1-million-word Limerick Corpus of Irish English to explore pragmatic characteristics of Irish English. This approach complements (relatively) recent disciplinary paradigms such as variational pragmatics and the strongly emergent field of corpus pragmatics.

Original language: English
Title of host publication: The Routledge Handbook of Corpus Linguistics, Second edition
Editors: Anne O'Keeffe, Michael McCarthy
Publisher: Taylor and Francis
Pages: 62-74
ISBN (Electronic): 9780429634130
ISBN (Print): 9780367076382
Publication status: Published - 1 Jan 2022
