
CORPUS LINGUISTICS


What are the advantages and disadvantages of using research corpora, rather than the researcher’s own intuition, to obtain data for language research?

 

Corpus linguistics, a field of language study based on real-life language use as represented in corpora, has become a fast-growing approach to language studies. Using research corpora, rather than intuition, to obtain data for language research has many advantages, but it also has some disadvantages.

Using corpora has several significant advantages for linguistic research and analysis. Corpora that span long periods, often called diachronic corpora, provide authentic evidence of how language changes over time, offering insights into the evolution of words, phrases, and grammatical structures. Because they incorporate texts from different eras and social classes, corpora allow researchers to observe and analyze linguistic variation and change. Some corpora include texts that date back centuries alongside texts written in the last few years. This range lets researchers track change over extended periods and identify both new meanings for existing words and the emergence of new words.

Another notable benefit of corpora is their ability to reveal which words and meanings are used most frequently. This information is invaluable for understanding current language trends and usage patterns. When corpus evidence shows that a new word or meaning has become established, lexicographers can add it to dictionaries, keeping them relevant and up to date. This process highlights the dynamic nature of language and the importance of corpus data in documenting and reflecting these changes.
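To make the idea of a frequency list concrete, here is a minimal Python sketch. It assumes a tiny invented two-sentence "corpus" and a naive regular-expression tokenizer; the function name `frequency_list` is hypothetical and does not come from any particular corpus tool. Real corpus software handles tokenization, lemmatization, and much larger data, but the principle of counting word forms across texts is the same.

```python
import re
from collections import Counter

def frequency_list(texts, top_n=5):
    """Return the top_n most frequent lowercase word forms across all texts."""
    counts = Counter()
    for text in texts:
        # Naive tokenization (an assumption): runs of letters, optionally
        # followed by an internal apostrophe and more letters.
        tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
        counts.update(tokens)
    # Ties are returned in the order the words were first encountered.
    return counts.most_common(top_n)

# Invented sample texts, for illustration only.
sample = [
    "The corpus shows how the word is used.",
    "A corpus is a collection of texts.",
]
print(frequency_list(sample, top_n=4))
```

Scaled up to millions of words, a list like this is what shows which forms are common and which are rare.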

The corpus-based approach to linguistic research begins with a hypothesis, or a set of hypotheses, that is tested against corpus evidence. It typically examines the frequency of specific language features in order to support or refute a theory. The corpus-driven approach, by contrast, does not start with a hypothesis. Instead, it works directly from the data, using frequency information such as word lists to uncover regularities and exceptions in language use. Both approaches offer valuable methods for exploring linguistic phenomena and contribute to a deeper understanding of how language behaves.
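The difference between the two approaches can be sketched in a few lines of Python. This is an illustrative toy, not a research method: the three-sentence corpus, the hypothesis that "big" is more frequent than "large", and the function names `tokenize`, `corpus_based_check`, and `corpus_driven_bigrams` are all assumptions made for this example. The corpus-based function starts from a stated hypothesis and checks it; the corpus-driven function starts from no hypothesis and simply reports which word pairs recur.

```python
import re
from collections import Counter

def tokenize(text):
    # Naive tokenizer (an assumption): lowercase alphabetic word forms only.
    return re.findall(r"[a-z]+", text.lower())

def corpus_based_check(tokens, form_a, form_b):
    """Corpus-based: test the prior hypothesis that form_a is more frequent than form_b."""
    counts = Counter(tokens)
    return counts[form_a], counts[form_b], counts[form_a] > counts[form_b]

def corpus_driven_bigrams(tokens, top_n=3):
    """Corpus-driven: report the most frequent adjacent word pairs, with no hypothesis.

    Note: this naive version lets pairs cross sentence boundaries.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    return bigrams.most_common(top_n)

# Invented toy corpus, for illustration only.
corpus = "It is a big house. It is a large garden. It is a big dog."
tokens = tokenize(corpus)
print(corpus_based_check(tokens, "big", "large"))
print(corpus_driven_bigrams(tokens))
```

In practice a corpus-based study would also test whether a difference is statistically meaningful, and a corpus-driven study would look at far larger data and longer patterns.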

Another advantage of using corpora is that computers can process the data consistently and accurately. A program applies exactly the same search and counting procedure to every text, which greatly reduces the errors that creep in when humans search and count by hand, as they had to with paper-based corpora. Computer processing does not make the evidence entirely error-free, but its consistency gives researchers greater confidence in the validity of their findings and makes corpora a valuable tool for linguistic analysis.
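A keyword-in-context (KWIC) concordance, the basic display used by most corpus tools, illustrates this consistency: the same matching rule is applied to every token, so no occurrence is skipped the way one might be when scanning pages by eye. The sketch below is a minimal Python version under stated assumptions: the function name `kwic`, the regex tokenizer, the context window of three words, and the sample text are all hypothetical choices for illustration, not the behaviour of any specific concordancer.

```python
import re

def kwic(text, keyword, width=3):
    """Return keyword-in-context lines with `width` tokens on each side of every hit."""
    # Naive tokenization (an assumption): letters and apostrophes only.
    tokens = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, tok in enumerate(tokens):
        # The same case-insensitive comparison is applied to every token.
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left context so the keywords line up in a column.
            lines.append(f"{left:>30} [{tok}] {right}")
    return lines

# Invented sample text, for illustration only.
text = ("Corpora record real usage. A corpus can be searched quickly, "
        "and every corpus search applies the same rule.")
for line in kwic(text, "corpus"):
    print(line)
```

Note that "Corpora" is not matched, because the comparison is an exact word-form match; a real tool would typically offer lemma or wildcard searches for that.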

In our everyday lives, we interact with others primarily through spoken language. Unfortunately, one disadvantage of many corpora is that spoken data make up only a small share of them; in the British National Corpus, for example, about 10% of the material is spoken and the remaining 90% is written. This imbalance makes such corpora less reliable as a source of information about spoken language. Written texts tend to follow Standard English, whereas speech includes dialects, colloquialisms, and registers that are hard to record and transcribe, so researchers have limited access to them. Since most daily language use is spoken, our corpus-based understanding of spoken language remains incomplete.
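One practical consequence of this imbalance is that raw counts from spoken and written sections cannot be compared directly, because the sections differ so much in size. Corpus linguists therefore commonly normalize counts, for example to occurrences per million words. The Python sketch below shows the arithmetic; the function name `per_million` is hypothetical, and the section sizes (a 100-million-word corpus split 10% spoken and 90% written) and hit counts are invented figures chosen only to illustrate the calculation.

```python
def per_million(raw_count, corpus_size):
    """Normalize a raw frequency to occurrences per million words."""
    return raw_count * 1_000_000 / corpus_size

# Invented figures: a 100-million-word corpus, 10% spoken and 90% written.
spoken_size, written_size = 10_000_000, 90_000_000
spoken_hits, written_hits = 500, 3_000  # hypothetical raw counts of some feature

print(per_million(spoken_hits, spoken_size))  # 50.0
print(round(per_million(written_hits, written_size), 2))
```

Here the feature has six times as many raw hits in the written section, yet it is actually more frequent in speech once the difference in section size is taken into account.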

Another disadvantage of corpora is that they cannot tell a researcher whether a sentence is grammatical. Texts included in a corpus can be written by anyone, and once they enter the corpus, nothing marks which of them contain errors. Although most words and sentences in a corpus are acceptable, the lack of an easy way to identify mistakes poses a challenge for researchers who rely on corpus data for accurate linguistic analysis.

Despite these disadvantages, the benefits of using corpora in linguistic research are substantial. Corpora provide a rich and diverse source of language data, enabling researchers to conduct in-depth analyses of language use, variation, and change, and specialized software lets them process large amounts of that data quickly and consistently. As corpus linguistics continues to grow, more comprehensive and balanced corpora, with a larger share of spoken data, should help address some of the current limitations and further advance our understanding of language.

 

-The end-