Corpus Linguistics

LibreOffice and TEI Stylesheets for file conversion

If you want to batch convert a large number of files to a more accessible format (for example, ODT or DOCX to HTML or TEI XML), LibreOffice is a good place to start.

Here is a brief introduction to batch converting files to a LibreOffice output format or to TEI XML.
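
As a rough illustration of the LibreOffice half of this, here is a minimal Python sketch that collects ODT and DOCX files from a folder and hands them to LibreOffice in headless mode. It assumes the `soffice` executable is on your PATH; the helper names (`find_sources`, `build_command`, `batch_convert`) and the default output folder `converted` are made up for this example and are not part of LibreOffice or the TEI Stylesheets. The TEI XML step is not covered here.

```python
import subprocess
from pathlib import Path


def find_sources(indir, extensions=(".odt", ".docx")):
    """Return the files in indir whose suffix matches one of the extensions, sorted."""
    wanted = {ext.lower() for ext in extensions}
    return sorted(
        p for p in Path(indir).iterdir()
        if p.is_file() and p.suffix.lower() in wanted
    )


def build_command(files, target="html", outdir="converted", soffice="soffice"):
    """Build the headless LibreOffice command line for one batch of files.

    Assumption: the LibreOffice binary is reachable as `soffice`.
    """
    return [
        soffice,
        "--headless",            # run without opening the GUI
        "--convert-to", target,  # output format, e.g. "html" or "pdf"
        "--outdir", str(outdir),
        *map(str, files),
    ]


def batch_convert(indir, target="html", outdir="converted", soffice="soffice"):
    """Convert every ODT/DOCX file in indir; return the list of source files."""
    files = find_sources(indir)
    if not files:
        return []
    Path(outdir).mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if LibreOffice exits with an error
    subprocess.run(build_command(files, target, outdir, soffice), check=True)
    return files
```

Calling `batch_convert("my_documents", target="html")` would then write one HTML file per input document into `converted`. Keeping the command construction in its own function makes it easy to print and check the command before running it on a large folder.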

The LINGUIST List corpus

The LINGUIST List corpora can be found here:

http://ltl.emich.edu/llc/

There you can find the LINGUIST List mailings converted to TEI P5 XML. The linguistically annotated version will be available in an extended interface.

See the previous blog post for instructions on how to use Philologic.


Working with the Philologic interface on the LTL corpora

Here is a brief first introduction to the Philologic interface for the LTL corpora and the LINGUIST List corpus.


The LTL corpus

The first version of the small LTL corpus, with a couple of million tokens, is online. It contains public-domain books encoded in TEI P5 XML. See here.

TEI online converter: OxGarage Converter

The online OxGarage Converter on the TEI pages converts almost anything to something else, and in particular to TEI XML. It apparently uses the OpenOffice filters and converters as batch processors in the backend, as described here for manual conversion.