Education
In high school, I took classes in German, Spanish, and Latin, and dabbled in Greek and Russian on my own. An interest in Tolkien and constructed languages led me to linguistics.
As an undergraduate, I studied classics and linguistics at Indiana University, completing the B.A. in 1983. I majored in Greek, and also took classes in Latin, German, French, and Hebrew. Our field methods course was on Soninke (a Mande language), and I got research funding to keep working with the consultant on my own during the summer [1]. My senior thesis was on comparative mythology.
I received my doctorate in linguistics from MIT in 1987. My dissertation [6] proposed the "DP-hypothesis," which treats functional elements uniformly as syntactic heads.
Bellcore (1987-1993)
As a student, I also began working on parsing [3,4,7], which led first to a summer internship and then to a full-time position at Bell Communications Research (Bellcore). I was interested in emulating human parsing, and my approach was to factor parse trees into "chunks and dependencies." One branch of that work concerned the connection between syntactic structure and prosody.
At Bellcore, I began studying stochastic models. I had the good fortune to collaborate with Kevin Mark and Michael Miller one summer [22]. My contributions at the time were entirely linguistic rather than mathematical, but I did absorb the idea of random fields from them.
Tübingen, Germany (1993-1997)
At Tübingen, my work on chunk parsing culminated in the parser called Cass. Its major advantage was speed: in contrast to standard chart parsers, which ran at 1-10 words per second, Cass processed 10,000 words per second, making it practical to parse large corpora. What Cass lacked was the dependencies half of "chunks and dependencies," and in collaboration with Mats Rooth and Marc Light I began working on induction methods to acquire dependencies, using Cass itself to bootstrap them from corpora.
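To give a feel for the chunking idea, here is a minimal sketch in Python. It illustrates non-recursive chunking in general, not Cass itself (which was built as a cascade of finite-state pattern matchers); the tag set and the single NP pattern are simplified assumptions.

    import re

    # A minimal illustration of chunking: group a POS-tagged sentence
    # into non-recursive noun chunks by pattern-matching over the tag
    # sequence. The tag set and the single NP pattern are simplified
    # assumptions for illustration.

    NP_PATTERN = re.compile(r"(DT\s)?(JJ\s)*(NNS?\s?)+")

    def chunk_nps(tagged):
        """Return (start, end) token spans of NP chunks."""
        tags = " ".join(tag for _, tag in tagged) + " "
        spans = []
        for m in NP_PATTERN.finditer(tags):
            start = tags[:m.start()].count(" ")              # tokens before the match
            end = start + m.group(0).strip().count(" ") + 1  # tokens inside it
            spans.append((start, end))
        return spans

    sentence = [("the", "DT"), ("woman", "NN"), ("in", "IN"),
                ("the", "DT"), ("lab", "NN"), ("coat", "NN"),
                ("thought", "VBD"), ("hard", "RB")]

    for start, end in chunk_nps(sentence):
        print([w for w, _ in sentence[start:end]])
    # prints ['the', 'woman'] and then ['the', 'lab', 'coat']

Pattern matching of this kind is what made chunking fast: it requires no global search over complete parse trees.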
I also spent time studying random fields and used them to formulate a probabilistic version of attribute-value grammars.
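Schematically, a random field assigns each candidate analysis x a probability of the standard Gibbs (log-linear) form, where each feature function f_i counts occurrences of some property of the analysis and each weight λ_i is estimated from data; the notation below is the generic formulation, not a quotation from the paper:

    p_\lambda(x) = \frac{1}{Z_\lambda}\,\exp\Big(\sum_i \lambda_i f_i(x)\Big),
    \qquad
    Z_\lambda = \sum_{x'} \exp\Big(\sum_i \lambda_i f_i(x')\Big)

The appeal for attribute-value grammars is that the features may overlap and interact freely, where simple context-free probability models assume independence.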
AT&T Laboratories (1997-2002)
I continued working on bootstrapping at AT&T Labs. I became especially interested in semisupervised learning and boosting. I also revived my earlier work on prosody.
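Bootstrapping in this sense means growing a labeled training set from a small seed: train a classifier, label the unlabeled data with it, keep only its most confident predictions, and retrain. A minimal self-training loop along those lines is sketched below; the synthetic data, the classifier, and the confidence threshold are illustrative placeholders, not a reconstruction of any particular system of mine.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # A minimal self-training loop, one classical form of bootstrapping
    # in semisupervised learning: train on a labeled seed set, label the
    # unlabeled pool, promote only high-confidence predictions, retrain.
    # The synthetic data, classifier, and 0.95 threshold are placeholders.

    rng = np.random.default_rng(0)
    X_lab = rng.normal(size=(20, 5))
    y_lab = (X_lab[:, 0] > 0).astype(int)      # seed labels
    X_unlab = rng.normal(size=(200, 5))        # unlabeled pool

    THRESHOLD = 0.95
    for _ in range(10):
        clf = LogisticRegression().fit(X_lab, y_lab)
        if len(X_unlab) == 0:
            break
        probs = clf.predict_proba(X_unlab)
        sure = probs.max(axis=1) >= THRESHOLD
        if not sure.any():
            break                              # nothing confident left
        # Promote confident examples, with predicted labels, to the seed set.
        X_lab = np.vstack([X_lab, X_unlab[sure]])
        y_lab = np.concatenate([y_lab, probs[sure].argmax(axis=1)])
        X_unlab = X_unlab[~sure]

    print(f"labeled set grew from 20 to {len(X_lab)} examples")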
My main projects, though, involved building systems:
- Ionaut, built in collaboration with Michael Collins and Amit Singhal, combined web search with entity recognition and question answering.
- Mage was a spoken dialogue system for phone-based email access in which I integrated several technologies developed by other groups at the Labs (speech recognition, speech synthesis, and telephony control) and added natural language processing and dialogue management.
- PreTTS, built in collaboration with Don Hindle, was a system for parsing and preprocessing complex email messages in order to drive speech synthesis and "read" them comprehensibly.
University of Michigan (since 2002)
Since coming to the University of Michigan, my major projects have been:
- Information extraction, especially in the biomedical domain.
- Writing a book on semisupervised learning [62].
- Language digitization, which is to say, language documentation and description that supports automated processing across languages.