Selkie

Selkie is a set of applications and a software library that support language digitization, that is, computational documentary linguistics. By documentary linguistics I mean the combination of language documentation and language description. Conventionally, the product of the former is a corpus, and the product of the latter is a grammar and lexicon. The products of language digitization are electronic versions of the corpus, lexicon, and grammar, integrated with each other and supporting additional computational functionality, such as automated interpretation.

Selkie is experimental code, not a finished product. Much of it is under active development and is likely to change. Similarly, the documentation is still in draft form. I am making it publicly available to give easier access to students and anyone else who may be interested.

The two major pieces of Selkie are a corpus editor (an application for documentary linguistics) and a natural-language processing (NLP) pipeline. The pipeline is not yet integrated into the editor, though integration is planned.

Cass

Cass is a partial parser that used to be available from this page. Unfortunately, it no longer compiles under current versions of gcc, and fixing the problem would require a fairly substantial rewrite. If you want the old tarfile, it is here.