Language Understanding Using Local Patterns


The representation of semantics is likely the most important open problem in Natural Language Processing. There are two broad areas of semantics: lexical semantics (the meaning of words and phrases) and text semantics (the meaning of word sequences). In lexical semantics there are two leading approaches: vector space methods and pattern-based methods. Pattern-based methods use local patterns and are thus much faster and more amenable to a hardware implementation. For text meaning, the leading approach is that of semantic roles, and novel compositional methods are starting to emerge.

The research team under Prof. Ari Rappoport (HUJI) will develop new algorithms for understanding natural language semantics using local patterns. These algorithms will be evaluated on a set of practical applications such as question answering and natural human-computer interfaces.

The project proposes to use local pattern methods for the acquisition of lexical semantics and for text understanding. In past work, the researchers have shown that high-quality lexical semantics can be captured using local patterns; however, nobody has yet shown how to unify those different representations into a single higher-dimensional representation capable of representing text semantics. Since these new algorithms are based on local operations, it should be possible to implement them very efficiently in hardware. This is in contrast to vector-based methods, in which high-dimensional vectors are present at each computing step.
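To make the notion of a local pattern concrete, the following is a minimal illustrative sketch and not the project's actual algorithm. It assumes a toy pattern set ("X and Y", "X or Y") and whitespace tokenization; the names `CONNECTORS`, `extract_pairs`, `count_pairs` and `neighbours` are hypothetical. Each match inspects only a three-token window, which is the kind of local operation the paragraph above refers to.

```python
from collections import Counter

# Assumed toy pattern set: "X and Y" and "X or Y".
CONNECTORS = {"and", "or"}


def extract_pairs(sentence):
    """Return word pairs that fill the X and Y slots of a connector pattern.

    Only a three-token window is examined per match. Tokenization is a plain
    whitespace split, so punctuation is not handled in this sketch.
    """
    tokens = sentence.lower().split()
    pairs = []
    for i in range(1, len(tokens) - 1):
        if tokens[i] in CONNECTORS:
            # Sort so that ("cats", "dogs") and ("dogs", "cats") are one pair.
            pairs.append(tuple(sorted((tokens[i - 1], tokens[i + 1]))))
    return pairs


def count_pairs(corpus):
    """Count pattern matches over an iterable of sentences."""
    counts = Counter()
    for sentence in corpus:
        counts.update(extract_pairs(sentence))
    return counts


def neighbours(word, counts):
    """Words seen with `word` in a pattern, with their match counts."""
    result = Counter()
    for (a, b), n in counts.items():
        if a == word:
            result[b] += n
        elif b == word:
            result[a] += n
    return result


if __name__ == "__main__":
    corpus = [
        "cats and dogs are pets",
        "dogs or cats make noise",
        "apples and oranges are fruit",
    ]
    counts = count_pairs(corpus)
    print(neighbours("cats", counts))
```

In this sketch, words that repeatedly share a pattern are treated as semantically related. No vector is built or stored at any step; the only state is a table of pair counts.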

The long-term outcome of the project is an architecture for the computerized representation of natural language semantics, along with a set of algorithms for computing it from text corpora and a set of applications that use it.

In the first year, the goal is to come up with the overall design of the architecture, the formal specification of how existing pattern-based methods should be enhanced to support the long-term vision, and the initial design of a set of algorithms for constructing semantic representations from textual corpora. In the third year, we would like to present the implementation of a working prototype serving as a proof of concept for the architecture and its associated algorithms.

Research has started. A sentiment classification application has been implemented as a foundation for, and an example of, a semantic task employing pattern-based methods.
Prof. Ari Rappoport, HUJI CSE
Prof. Moshe Koppel, Bar Ilan U
Prof. Naftali Tishby, HUJI CSE
Roy Schwartz, HUJI CSE
Oren Tsur, HUJI CSE

Ari Rappoport ➭

  1. Roy Schwartz, Oren Tsur, Ari Rappoport and Moshe Koppel, "Authorship Attribution of Micro-Messages", EMNLP 2013
  2. O. Abend, A. Rappoport. "Universal Conceptual Cognitive Annotation (UCCA)". ACL 2013.
  3. O. Tsur, A. Littman, A. Rappoport. "Efficient Clustering of Short Messages into General Domains". ICWSM 2013.
  4. O. Abend, A. Rappoport. "UCCA: A Semantics-based Grammatical Annotation Scheme". ACL/SIGSEM International Conference on Computational Semantics (IWCS) 2013.
  5. R. Reichart, G. Elidan, A. Rappoport. "A Diverse Dirichlet Process Ensemble for Unsupervised Induction of Syntactic Categories". COLING 2012.
  6. R. Schwartz, O. Abend, A. Rappoport. "Learnability-based Syntactic Annotation Design". COLING 2012.
  7. O. Tsur, A. Rappoport. "What's in a Hashtag? Content based Prediction of the Spread of Ideas in Microblogging Communities". Web Search and Data Mining (WSDM) 2012.
  8. O. Tsur, A. Littman, A. Rappoport. "Scalable Multi Stage Clustering of Tagged Micro-Messages". WWW 2012 (short paper).

Moshe Koppel ➭

  1. Roy Schwartz, Oren Tsur, Ari Rappoport and Moshe Koppel, "Authorship Attribution of Micro-Messages", EMNLP 2013
  2. Aharoni, R., Koppel, M. and Goldberg, Y. (2014). "Automatic Detection of Machine Translated Text and Translation Quality Estimation", Proc. of ACL, Baltimore MD, June 2014

Naftali Tishby ➭