The projects I am currently working on include:
- Inductive general grammar - Rethinking linguistics from the perspective of computation and data science. A language is viewed as a computable bidirectional transduction between sound and meaning, and general linguistics as the machine learning of grammars that completely define such transductions.
- CLD - To support work on inductive general grammar, we require a "Universal Corpus," or at least a diverse multilingual corpus. The CLD software is designed to let language speakers contribute directly and to provide a platform for computational-linguistic research on low-resource languages.
- Bidirectional grammars - With a few exceptions, the subfields of language understanding and language generation have developed in isolation from one another. There was a burst of interest in bidirectional grammars 20 years ago, but it largely involved hacking logic-programming compilers and had limited success. I have been exploring a simpler solution.
- Semantics of improper anaphora - Ezra Keshet and I have been collaborating for several years on the semantic phenomena that motivate dynamic semantics, with the aim of capturing the same phenomena in a simpler classical system.
- Musicolinguistics - Applying parsing technology to music scores in order to perform automatic harmonic and contrapuntal analysis.
I am also interested in or have worked on the following topics:
- automated phonetic transcription
- conversational agents
- grammatical inference
- spoken language systems
- semisupervised learning and spectral methods
- information extraction and NLP services for social science research
- partial parsing and deterministic parsing