Introduction to Corpus Linguistics

Marko Tadić, University of Zagreb

Topics:
short history of corpus-oriented linguistic research;
"early" corpus linguistics;
Chomsky and corpus criticism / defence;
the first computer corpora; the corpus as a methodological construct;
text collection vs. corpus; corpus design and compilation;
sampling procedures and corpus parameters;
types of corpora;
results of corpus querying;
corpus annotation:
pre-linguistic (text structure, metadata) and linguistic (POS/MSD tagging and lemmatization; syntactic, semantic, pragmatic, prosodic, etc. annotation);
statistical methods in corpus linguistics;
human language technologies and the role of corpora.

Recommended literature:
McEnery, Tony & Wilson, Andrew (1996; 2nd ed. 2002) Corpus Linguistics, Edinburgh University Press, Edinburgh.
Kennedy, Graeme (1998) Introduction to Corpus Linguistics, Longman, London.
Sampson, Geoffrey & McCarthy, Diana (2004) Corpus Linguistics: Readings in a Widening Discipline, Continuum, London-New York.
Sinclair, John McH. (1991) Corpus, Concordance, Collocation, Oxford University Press, Oxford.
Tadić, Marko (2003) Jezične tehnologije i hrvatski jezik [Language Technologies and the Croatian Language], Exlibris, Zagreb.