The Language Technologies Institute at Carnegie Mellon University

Jonathan Clark
Graduate Student
School of Computer Science

jhclark@cs.cmu.edu

 

Research

Broadly, I am interested in how we can use linguistics, cognition, and statistics to improve computational models of human language. Currently, I work with Alon Lavie on the AVENUE project, which leverages linguistically motivated syntax to improve machine translation for both resource-rich and resource-poor languages. My research focuses on developing discriminant syntactic features that help the system choose better translations. This includes both phrase structure and dependency structure, and how best to model these structures statistically so that we can capture the behavior of the language pair being translated.

Previously, I worked with Lori Levin and Robert Frederking on a year-long pilot project (also a part of AVENUE) investigating active learning techniques for presenting a bilingual person with examples from a linguistically structured corpus. The goal is to tap such people as an efficient and cost-effective resource for improving the quality of machine translation for languages that have few alternatives for acquiring the data needed to train modern machine translation systems.

Contact

Jonathan Clark
CMU Language Technologies Inst.
Newell-Simon Hall 4502
5000 Forbes Avenue
Pittsburgh, PA 15213
Office: Newell-Simon Hall 3612
Phone: (501) 351-1061

Publications

J. Clark, R. Frederking, L. Levin, "Inductive Detection of Language Features via Clustering Minimal Pairs: Toward Feature-Rich Grammars in Machine Translation", The Second Workshop on Syntax and Structure in Translation (SSST) at the Association for Computational Linguistics (ACL), 2008. Columbus, Ohio. [PDF] [Slides]

J. Clark, R. Frederking, L. Levin, "Toward Active Learning in Corpus Creation: Automatic Discovery of Language Features During Elicitation", The Sixth Language Resources and Evaluation Conference (LREC), 2008. Marrakech, Morocco. [PDF] [Slides]

J. Clark, C. Hannon, "A Classifier System for Author Recognition Using Synonym-Based Features", Sixth Mexican International Conference on Artificial Intelligence, November 2007. Aguascalientes, Mexico. [PDF]

J. Clark, C. Hannon, "An Algorithm for Identifying Authors Using Synonyms", ENC 2007, September 2007. Morelia, Mexico.

M. Bowden, M. Olteanu, P. Suriyentrakorn, J. Clark, D. Moldovan, "LCC's PowerAnswer at QA@CLEF 2006," CLEF 2006 Working Notes, September, 2006. Alicante, Spain. [PDF]

C. Hannon, J. Clark, "A Cognitive-Based Approach to Learning Integrated Language Components", The Third International Workshop on Natural Language Understanding and Cognitive Science, May 2006. Paphos, Cyprus.

Reports

J. Clark, "Treegraft: A Stochastic Transduction Chart Parser", NLP Lab Self-Defined Project Final Report, Spring 2008. [PDF] [Google Code Project page]

The Initial

With apologies to Noah A. Smith, I also feel the need to explain the pretentious middle initial on all my publications: The name Jon Clark is only slightly less common than John Smith. Other Jonathan Clarks include the 2007 CMU MBA class co-president, the songwriter, the photographer, the woodworker, the journalist, the comedian, the cameraman, the actor, the teacher, the pilot, the athlete, the golfer, the biker, the boxing champion, the lighting designer, the British artist, the sculptor, the architect, the health technologist, the computational biology professor, the personal trainer, the wellness professional, the history professor, the chief counsel for Morgan Stanley, the finance professor, the attorney, the founder of Thinstall (virtualization software), the senior VP at Sallie Mae, the real estate agent, the university president, the music professor, the post-hardcore band singer, the founder of Business Writing Solutions, the 18th century general, the basketball player, the NLP trainer (NeuroLinguistic Programming), the telecommunications consultant, the IT professional, the search marketing specialist, the computer engineering student, the polymer research engineer, the physician, another physician, another still, the surgeon, the zoology professor, the biomedical robotics professor (who, incidentally, published a paper with Jorge Cham), and the former CTO of LionBridge (a large language engineering company that made this translation software... talk about hard to be unique). Even the initial doesn't always work; the other Jonathan H. Clark is a Texas lawyer.

Courses

Spring 2009

Fall 2008

Spring 2008

Fall 2007

Personal

When I'm not knee-deep in code, I enjoy going to Pittsburgh Pirates baseball games with my fiancée Libby (while eating nachos topped with obscene amounts of jalapeños), playing drums (jazz, hand percussion, metal, it's all good stuff), and learning bits of random languages. And of course, reading Jorge Cham's wonderful PhD Comics (follow the link for more laughs).

Links

Simple, but Brilliant Java Programming Advice

Choosing a Ph.D. Program in Computer Science (Berkeley)
Advice on Applying (and whether to apply) for a Ph.D. in Computer Science (CMU)
Advice on Applying for Ph.D., Fellowships, and Other Such (Stanford)
Advice for Writing Personal Statements

Favorite Web 2.0

Remember the Milk - Advanced Todo List
Pros: Implements Getting Things Done and most of Randy Pausch's Time Management lecture
Cons: Doesn't integrate with Paymo

Google Calendar - Tells me when to be places
Pros: Easy to use interface and support for sharing calendars

Paymo - Time Tracker
Pros: Easily tracks estimated and actual time spent on tasks
Cons: Doesn't integrate with Remember the Milk

Mint - Personal Finance Tracker and Budgeting
Pros: Automatically syncs transactions with your bank and learns categories for them
Cons: No yearly budget categories

LiveStrong (TheDailyPlate) - Nutrition Tracker
Pros: Has huge database of food products
Cons: Doesn't integrate with a barcode scanner in a PDA to allow me to scan items as I prepare a meal