2012
Moving projects and code to GitHub
11/25/12 12:14 Filed in: Info
I am moving code and project folders to GitHub. I don't know whether this is a good idea; it just turns out to be easier to use… 
This move includes the SNLTK code, all kinds of Python 3 projects, Java code, some of the C(++) code for FSTs and some NLP tasks, and the corpus and TEI XML utilities. Some of these repositories are pull-only, with push access reserved for collaborators. If you were involved in any of them, let me know and send me your GitHub ID, and I will add you to the collaborators of the respective repositories.
In particular, all of my course material will be migrated to GitHub. For example, the material for the LSA Summer Institute course in summer 2013 will be placed there:
Python 3 for Linguists
Some old files about the Linguistics Program at the University of Zadar
10/25/12 15:29 Filed in: Info
Since I have been asked many times about this MA program and the original text that went to the accreditation committee in Croatia (where we got one very nasty and absolutely irrelevant review, which I'll post here if I find it, but also a very good and constructive one), here are the files: the Croatian and English texts of the MA program in Linguistics that we submitted for accreditation within the Bologna system at the University of Zadar back in 2008. I think this is the corrected version. It was not the best possible program; it was developed under time pressure and in a very difficult situation, and it built on the growing wave of computational linguistics, speech and language technology, as well as theoretical linguistics. We would do a lot of things differently nowadays. If you can use any of this as inspiration or for your own attempts to apply for a program or other support, let us know. I can forward you the editable version in an office-suite format.
- English version
- Croatian version
LibreOffice and TEI Stylesheets for file conversion
10/17/12 23:47 Filed in: Corpus Linguistics
If you want to batch convert a lot of files to some more accessible format (for example ODT or DOCX to HTML or TEI XML), LibreOffice is the first tool to reach for.
Here is a brief introduction to batch converting files to a LibreOffice output format or to TEI XML.
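A minimal sketch of the LibreOffice part in Python 3, assuming the `soffice` executable is on your PATH (on a Mac it usually sits inside the application bundle, e.g. `/Applications/LibreOffice.app/Contents/MacOS/soffice`, so pass that path instead). It relies only on LibreOffice's `--headless`, `--convert-to` and `--outdir` command-line options; the function names and the default `converted` output folder are illustrative choices, and the TEI stylesheet step is not covered.

```python
import subprocess
import sys
from pathlib import Path


def build_convert_command(files, target="html", outdir="converted", soffice="soffice"):
    """Assemble a single headless LibreOffice call that converts all given files."""
    return [soffice, "--headless", "--convert-to", target,
            "--outdir", str(outdir), *map(str, files)]


def batch_convert(indir, pattern="*.odt", target="html", outdir="converted", soffice="soffice"):
    """Convert every file in indir matching pattern; return the list of input files."""
    files = sorted(Path(indir).glob(pattern))
    if not files:
        return []  # nothing to do, so LibreOffice is never started
    Path(outdir).mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if LibreOffice reports a failure
    subprocess.run(build_convert_command(files, target, outdir, soffice), check=True)
    return files


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python3 convert.py INDIR [PATTERN] [TARGET]
    converted = batch_convert(sys.argv[1], *sys.argv[2:4])
    print(f"converted {len(converted)} file(s)")
```

For example, `python3 convert.py ~/papers "*.docx" html` (the script name is hypothetical) would turn every DOCX file in `~/papers` into HTML in `converted/`. Passing all files to one `soffice` call avoids starting LibreOffice once per document.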
XFST: Python 3 script to convert prolog file to DOT-graph
10/16/12 11:38 Filed in: Computational Linguistics
If you write out a stack (or network) in XFST to a prolog file:
write prolog > mymorph.plg
and you want to convert it to DOT and visualize it in Graphviz, here is a Python 3.x script to do so:
Download zipped Python source
View Python code
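This is not the script linked above, only a minimal sketch of the same idea in Python 3. It assumes the prolog file contains lines shaped like `arc(net, from, to, "sym").`, `arc(net, from, to, "upper":"lower").` and `final(net, state).`; these line shapes are an assumption, so check them against your own `mymorph.plg`. Lines of any other shape are ignored, and final states are drawn as double circles.

```python
import re
import sys

# Assumed line shapes in an XFST "write prolog" file, e.g.:
#   arc(net_1, 0, 1, "c").
#   arc(net_1, 1, 2, "a":"b").
#   final(net_1, 2).
ARC_RE = re.compile(r'^arc\(([^,]+),\s*(\d+),\s*(\d+),\s*(.+)\)\.\s*$')
FINAL_RE = re.compile(r'^final\(([^,]+),\s*(\d+)\)\.\s*$')
QUOTED_RE = re.compile(r'"((?:[^"\\]|\\.)*)"')


def arc_label(raw):
    """Turn '"a"' into 'a' and '"a":"b"' into 'a:b'."""
    parts = QUOTED_RE.findall(raw)
    return ":".join(parts) if parts else raw.strip()


def prolog_to_dot(lines, name="fst"):
    """Convert the arc/final facts of a prolog dump into a Graphviz DOT digraph."""
    arcs, finals = [], set()
    for line in lines:
        line = line.strip()
        m = ARC_RE.match(line)
        if m:
            arcs.append((int(m.group(2)), int(m.group(3)), arc_label(m.group(4))))
            continue
        m = FINAL_RE.match(line)
        if m:
            finals.add(int(m.group(2)))
    states = sorted({s for src, dst, _ in arcs for s in (src, dst)} | finals)
    out = [f"digraph {name} {{", "  rankdir=LR;"]
    for s in states:
        shape = "doublecircle" if s in finals else "circle"
        out.append(f"  {s} [shape={shape}];")
    for src, dst, label in arcs:
        # escape backslashes and quotes for a DOT string literal
        label = label.replace("\\", "\\\\").replace('"', '\\"')
        out.append(f'  {src} -> {dst} [label="{label}"];')
    out.append("}")
    return "\n".join(out)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as plg:
        print(prolog_to_dot(plg))
```

Running `python3 plg2dot.py mymorph.plg > mymorph.dot` and then `dot -Tpdf mymorph.dot -o mymorph.pdf` would render the network with Graphviz (`plg2dot.py` is a hypothetical file name).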
WSU talk: info on corpora and tech that will be discussed
10/07/12 09:26 Filed in: Info
I'll give a talk on corpora and relevant technologies at Wayne State University in Detroit on October 19 at 11 AM. Here are some links, papers and slides that colleagues and students might find useful for following along and following up:
Java programming sessions for the ILIT group
10/02/12 17:14 Filed in: Info
We are meeting Fridays at 9 AM in the Cooper building for Java programming.
You might want to prepare your machine by installing:
1. the Java SE 7u7 JDK:
http://www.oracle.com/technetwork/java/javase/downloads/index.html
2. the NetBeans 7.2 IDE:
http://netbeans.org/downloads/index.html
and perhaps working through parts of the Java Tutorial:
http://docs.oracle.com/javase/tutorial/index.html
Endangered Languages site is up
06/21/12 12:49 Filed in: Info
The Endangered Languages site has been launched today:
http://www.endangeredlanguages.com/
Clozure CL on Mac App Store
05/25/12 02:46 Filed in: Info
Clozure CL, a free, open-source implementation of Common Lisp for the Mac, is available on the App Store:
http://itunes.apple.com/us/app/clozure-cl/id489900618?mt=12
Talk at the IDS 8th of May
05/07/12 05:55 Filed in: Info
Tomorrow, May 8, 2012, I will be presenting at the Institute for German Language (IDS) in Mannheim, and it is also the last day of the Maimarkt… Maybe I'll see you there?
Course at LSA Institute 2013: Python 3 for Linguists
04/19/12 13:38 Filed in: Info
Malgosia and I will be teaching a course at the LSA Institute 2013 at the University of Michigan in Ann Arbor: Python 3 for Linguists.
Thanks to the Institute Steering Committee for accepting our proposal!
Talk: Piotr Banski "TEI XML for Linguists"
04/18/12 21:20 Filed in: Info
Please join us for a talk by:
Dr. Piotr Banski (Institute for German Language/Institut für Deutsche Sprache, Mannheim, Germany)
Title: "TEI XML for Linguists"
Time: Friday, April 20, 2012 at 2:00 pm
Location: Suite 104, Cooper Building, on the Eastern Michigan University campus (see Google maps)
Talk: M. Cavar "On the influence of L1 on the L2 perception: The case of tenseness contrast in American vowels"
04/11/12 11:26 Filed in: Info
Date: April 13th, 2012
Time: 1:30 PM
Location: Cooper Building, Suite 104, EMU, 2000 Huron River Drive, Ypsilanti
Directions: Take Washtenaw heading east from Ann Arbor toward Ypsilanti. Go past Hwy 23, turn left on Golfside, then turn right on Huron River Drive. The Cooper Building will be on the left, across from Rynearson Stadium, and there is free parking right out front. If you reach Superior St. you have gone too far.
Title: On the influence of L1 on the L2 perception: The case of tenseness contrast in American vowels
Author: Malgorzata E. Cavar
Abstract:
One obvious difficulty in foreign language learning is the production of foreign sounds. What is less obvious is the fact that the perception of foreign categories by L2 learners differs from that of the native speakers and in itself might be and often is a hurdle in the acquisition of the phonetic/phonological system of the foreign language. In this talk, I will present the results of a series of experiments pertaining to the perception of the English vocalic contrast in high vowels by learners with different L1 backgrounds. The goal of this and similar studies is to determine how perceptual strategies of L2 learners differ from those of English native speakers and what these differences depend on. In the long run, the aim is to predict “customized” areas of difficulty for learners with different backgrounds and to help develop curricula and teaching aids that would actually respond to learners’ needs.
Tokenization, frequency profiles and N-gram models in Python 3
04/03/12 11:01 Filed in: Info
This is a brief description of how to use the Python 3 scripts to generate N-gram models for word tokens and characters from text. I assume you have a Python 3 interpreter installed on your system.
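The scripts themselves are not reproduced here; what follows is an independent minimal sketch of the three steps (tokenization, frequency profiles, N-gram probabilities) in Python 3 with the standard library only. The regex tokenizer is deliberately naive and all function names are made up for illustration. `ngrams` works on a token list for word N-grams and directly on a string for character N-grams, and `bigram_probability` gives the plain maximum-likelihood estimate without smoothing.

```python
import re
import sys
from collections import Counter


def tokenize(text):
    """Naive tokenizer: runs of word characters (optionally with one internal
    apostrophe, as in "don't") or single punctuation marks."""
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)


def ngrams(seq, n):
    """All contiguous n-grams of a token list or a string, as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def frequency_profile(seq, n=1):
    """Absolute n-gram frequencies."""
    return Counter(ngrams(seq, n))


def relative_frequencies(counts):
    """Turn absolute counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}


def bigram_probability(tokens, w1, w2):
    """Maximum-likelihood estimate P(w2 | w1) = C(w1 w2) / C(w1)."""
    unigrams = frequency_profile(tokens, 1)
    bigrams = frequency_profile(tokens, 2)
    if unigrams[(w1,)] == 0:
        return 0.0
    return bigrams[(w1, w2)] / unigrams[(w1,)]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        text = f.read()
    tokens = tokenize(text.lower())
    for gram, count in frequency_profile(tokens, 2).most_common(10):
        print(count, " ".join(gram))          # word bigrams
    for gram, count in frequency_profile(text, 3).most_common(10):
        print(count, repr("".join(gram)))     # character trigrams
```

Run as `python3 ngrams.py mytext.txt` (hypothetical file name) to print the ten most frequent word bigrams and character trigrams.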
The LINGUIST List corpus
04/03/12 06:06 Filed in: Corpus Linguistics
The LINGUIST List corpora can be found here:
http://ltl.emich.edu/llc/
You can find in there the LINGUIST List mailings converted to TEI P5 XML. The linguistically annotated version will be available in an extended interface.
See the previous blog post for instructions on how to use Philologic…
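If you download some of the TEI P5 files, Python's standard `xml.etree.ElementTree` module is enough for a first look. A minimal sketch, assuming each file is a single `<TEI>` document in the standard TEI namespace `http://www.tei-c.org/ns/1.0` with its running text in `<p>` elements under `<text>/<body>`; the actual element layout of the LINGUIST List files may differ, and the function names are hypothetical.

```python
import sys
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def tei_title(root):
    """Return the first <title> in the header's titleStmt, or None."""
    title = root.find("tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title", TEI_NS)
    return None if title is None else " ".join("".join(title.itertext()).split())


def tei_paragraphs(root):
    """Return the whitespace-normalized text of every <p> inside <text>/<body>."""
    body = root.find("tei:text/tei:body", TEI_NS)
    if body is None:
        return []
    # itertext() also collects text inside inline markup such as <hi>
    return [" ".join("".join(p.itertext()).split())
            for p in body.iterfind(".//tei:p", TEI_NS)]


if __name__ == "__main__" and len(sys.argv) > 1:
    root = ET.parse(sys.argv[1]).getroot()
    print(tei_title(root))
    for para in tei_paragraphs(root):
        print(para)
```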
Working with the Philologic interface on the LTL corpora
03/26/12 21:21 Filed in: Corpus Linguistics
Here is a brief first introduction to the Philologic interface for the LTL corpora and the LINGUIST List corpus:
Drawing syntactic trees...
03/21/12 14:39 Filed in: Syntax
Many students and colleagues have asked me how to generate nice-looking trees for presentations, assignments, papers, etc. Here is a small summary of tools I have tried or seen.
If you want to generate a graph of a syntactic relation or a syntactic tree, there are various ways to do that without drawing it by hand on paper and scanning it... here is an overview of ways and tools for generating syntactic trees...
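One route that needs nothing but LaTeX is to type the tree as a labeled bracketing and convert it to the `[.Label ... ]` notation of the `qtree`/`tikz-qtree` packages. A minimal Python 3 sketch, assuming labels and leaves contain no spaces or parentheses; LaTeX special characters such as `_` or `$` are not escaped. The function names are hypothetical, and this is not necessarily one of the tools in the summary.

```python
import re
import sys


def parse_bracketed(s):
    """Parse '(S (NP the cat) (VP sleeps))' into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def node():
        nonlocal pos
        if pos + 1 >= len(tokens) or tokens[pos] != "(" or tokens[pos + 1] in ("(", ")"):
            raise ValueError(f"expected '(' followed by a label at token {pos}")
        label = tokens[pos + 1]
        pos += 2
        children = []
        while pos < len(tokens) and tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())       # subtree
            else:
                children.append(tokens[pos])  # leaf word
                pos += 1
        if pos >= len(tokens):
            raise ValueError("unbalanced brackets")
        pos += 1  # consume ')'
        return (label, children)

    tree = node()
    if pos != len(tokens):
        raise ValueError("trailing material after the tree")
    return tree


def to_qtree(tree):
    """Render a parsed tree in qtree bracket notation (note the space before ']')."""
    if isinstance(tree, str):
        return tree
    label, children = tree
    return "[." + label + " " + " ".join(to_qtree(c) for c in children) + " ]"


def qtree(s):
    return r"\Tree " + to_qtree(parse_bracketed(s))


if __name__ == "__main__" and len(sys.argv) > 1:
    print(qtree(sys.argv[1]))
```

`python3 brackets2qtree.py "(S (NP the cat) (VP sleeps))"` (hypothetical script name) prints `\Tree [.S [.NP the cat ] [.VP sleeps ] ]`, which can be pasted into a document that loads `tikz-qtree`.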
The LTL corpus
03/08/12 12:39 Filed in: Corpus Linguistics
The first version of the small LTL corpus with a couple of million tokens is online. It contains TEI P5 XML encoded books from the public domain. See here…
TEI online converter: OxGarage Converter
03/08/12 12:35 Filed in: Corpus Linguistics
The online OxGarage Converter on the TEI pages converts almost anything to something else, in particular to TEI XML. It apparently uses the OpenOffice filters and converters in the backend as batch processors, as described here for manual conversion.
Lithuanian Morphology and LFG-Grammar...
03/05/12 19:17 Filed in: Info
The poster on a Lithuanian morphology and LFG grammar for the DGfS 2012 annual meeting is done. It is the result of a graduate course at the University of Konstanz on rule-based natural language processing (using XFST and XLE). I am proud of all the participants!
Here is the poster. You can test the morphology online. Coverage will improve: the current version is based only on the morpheme numbers from the poster, without generic morphological rules. The generator will be made available there too.
LINGUIST List has a store on amazon.com
03/05/12 19:04 Filed in: Info
LINGUIST List Fund Drive 2012 has started
02/27/12 10:58 Filed in: Info
Please consider supporting LINGUIST List, just go to the Fund Drive 2012 pages and donate!
Text analyzed and parsed to TEI XML wrapper
02/24/12 21:23 Filed in: Info
I set up a simple testing page for a wrapper that turns raw text into TEI XML. This version uses only the Stanford CoreNLP tools to tokenize the input, detect sentences, tag parts of speech and lemmatize. Just paste a paragraph of text in there. The next version will add NLP tools for a few more languages, as well as other analysis components and tools for English.
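The output side of such a wrapper can be sketched with the Python standard library alone. This sketch does not call CoreNLP: it assumes the text has already been tokenized, split into sentences, tagged and lemmatized (for example by CoreNLP) and arrives as `(form, pos, lemma)` triples. It writes them as TEI P5 `<s>`, `<w pos="..." lemma="...">` and `<pc>` elements under a minimal header; the markup actually produced by the testing page may differ, and all names here are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI)  # serialize TEI as the default namespace


def q(tag):
    """Qualify a tag name with the TEI namespace."""
    return f"{{{TEI}}}{tag}"


def sentences_to_tei(sentences, title="Untitled"):
    """sentences: list of sentences, each a list of (form, pos, lemma) triples."""
    tei = ET.Element(q("TEI"))
    file_desc = ET.SubElement(ET.SubElement(tei, q("teiHeader")), q("fileDesc"))
    ET.SubElement(ET.SubElement(file_desc, q("titleStmt")), q("title")).text = title
    ET.SubElement(ET.SubElement(file_desc, q("publicationStmt")), q("p")).text = "Unpublished."
    ET.SubElement(ET.SubElement(file_desc, q("sourceDesc")), q("p")).text = "Raw text input."
    para = ET.SubElement(ET.SubElement(ET.SubElement(tei, q("text")), q("body")), q("p"))
    for n, sentence in enumerate(sentences, start=1):
        s = ET.SubElement(para, q("s"), n=str(n))
        for form, pos, lemma in sentence:
            if any(ch.isalnum() for ch in form):
                ET.SubElement(s, q("w"), pos=pos, lemma=lemma).text = form
            else:
                # tokens without letters or digits are treated as punctuation
                ET.SubElement(s, q("pc")).text = form
    return ET.tostring(tei, encoding="unicode")


if __name__ == "__main__":
    demo = [[("Cats", "NNS", "cat"), ("sleep", "VBP", "sleep"), (".", ".", ".")]]
    print(sentences_to_tei(demo, title="Demo"))
```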
Charty in JavaScript...
02/23/12 11:02 Filed in: Info
Ben Cool ported Charty (a CFG-based chart parser) to JavaScript for a class project, and one version adds feature augmentation and unification. You can test it online. It runs without any server-based component, so it also works on mobile devices: in Safari on the iPad or iPhone, and on Android in any browser with JavaScript support. See the documentation and test site here…
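Charty's own code is not shown here. To illustrate what a CFG chart parser does, here is a minimal CKY recognizer in Python 3; CKY is one particular chart algorithm and may well differ from the one Charty uses. It requires the grammar in Chomsky normal form (binary rules `A -> B C` plus lexical rules `A -> w`) and only answers whether a sentence is derivable, without building trees. The toy grammar and all names are hypothetical.

```python
def cky_recognize(words, lexical, binary, start="S"):
    """Return True if `words` can be derived from `start`.

    lexical: word -> set of categories A with a rule A -> word
    binary:  (B, C) -> set of categories A with a rule A -> B C
    """
    n = len(words)
    # chart[i][j] holds every category that spans words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1].update(lexical.get(w, ()))
    for span in range(2, n + 1):          # build longer spans from shorter ones
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):     # every split point inside [i, j)
                for b in chart[i][k]:
                    for c in chart[k][j]:
                        chart[i][j].update(binary.get((b, c), ()))
    return n > 0 and start in chart[0][n]


# Toy grammar: S -> NP VP, NP -> Det N, VP -> V NP
TOY_LEXICAL = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "saw": {"V"}}
TOY_BINARY = {("Det", "N"): {"NP"}, ("V", "NP"): {"VP"}, ("NP", "VP"): {"S"}}

if __name__ == "__main__":
    for sentence in ("the dog saw the cat", "saw the dog the"):
        print(sentence, "->", cky_recognize(sentence.split(), TOY_LEXICAL, TOY_BINARY))
```

Filling the chart bottom-up by span length guarantees that both halves of every split are complete before a longer span is built, which is what makes the chart reusable instead of re-parsing substrings.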
Stanford-CoreNLP corenlp.sh script on Mac OS X Lion
02/13/12 19:09 Filed in: Info
To make the Stanford CoreNLP tools work with the included bash script (corenlp.sh) on Mac OS X 10.7.x (Lion), do this...
LREC 2012 workshop on Challenges in the management of large corpora
02/13/12 15:15 Filed in: Call
You should really consider joining this LREC 2012 workshop on Challenges in the management of large corpora!
Changed Privacy Policy
02/13/12 14:02 Filed in: Info
Since privacy policy changes seem to be all around now, here is one by me for the pages here:
If you want to make your web experience somewhat more private and keep me from reading anything about you out of the Apache log files, here are some hints on how to configure your browser to reduce the personal traces you leave on this page or anywhere else on the web:
Language Technology Lab (LTL) up
02/07/12 13:54 Filed in: Info
The Language Technology Lab (LTL) (ILIT and EMU) is up, check it out:
http://ltl.emich.edu/
More content to come in the next days and weeks… stay tuned!
Using AntConc: Notes 1
02/02/12 20:44 Filed in: Info
Dictionaries for Mac OS X
01/28/12 15:31 Filed in: Info
Here are some of the dictionaries for the OS X Dictionary.app:
- The dict.cc dictionary plugin English-German, German-English
- Tekl.de German Thesaurus and English-German dictionary
TikZ-dependency graph LaTeX library
01/23/12 13:18 Filed in: Info
Online tool for IPA transcription
01/23/12 11:27 Filed in: Info
Just restored the pages from backups...
01/18/12 14:14 Filed in: Info
I just restored a bunch of web pages of summer schools and workshops. Some had interesting material on them, in particular pictures. Check out the JSSECL 2006 event…
- Fourth Annual Meeting of the Slavic Linguistic Society SLS 2009
- Student Conference on Empirical and Computational Linguistics (JSSECL WS CECL 2006), Zadar, Croatia
- Workshop on Computational Modeling of Lexical Acquisition (CPALA 2005), Split, Croatia
- BOOT-LA workshop at Indiana University
- Computational Linguistics Summer School 2010 at the University of Zadar (CLS2010) (see on Facebook)
- Jadertina Summer School in Empirical and Computational Linguistics (JSSECL 2006)
Linguistic demonstrations in the Wolfram Demonstrations Project
01/16/12 16:09 Filed in: Info
Check out these demonstrations from the Wolfram Demonstrations Project:
- and all the other Linguistic Demonstrations there...
C-FASL 2012, you should join it...
01/15/12 01:51 Filed in: Info
You should submit a paper to Computational Formal Approaches to Slavic Languages (C-FASL) 2012:
http://cl.indiana.edu/~cfasl/