The representation of semantics is arguably the most important open problem in Natural Language Processing. There are two broad areas of semantics: lexical semantics (the meaning of words and phrases) and text semantics (the meaning of word sequences). In lexical semantics there are two leading approaches: vector space methods and pattern-based methods. Pattern-based methods rely on local patterns and are thus much faster and more amenable to a hardware implementation. For text meaning, the leading approach is that of semantic roles, and novel compositional methods are starting to emerge.
The research team under Prof. Ari Rappoport (HUJI) will develop new algorithms for understanding natural language semantics using local patterns. These algorithms will be evaluated on a set of practical applications such as question answering and natural human-computer interfaces.
The project proposes to use local pattern methods for the acquisition of lexical semantics and for text understanding. In past work, the researchers have shown that high-quality lexical semantics can be captured using local patterns; however, nobody has yet shown how to unify those different representations into a single higher-dimensional representation capable of representing text semantics. Since these new algorithms are based on local operations, it should be possible to implement them very efficiently in hardware. This is in contrast to vector-based methods, in which high-dimensional vectors are present at each computing step.
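To make the notion of a local pattern concrete, the following is a minimal sketch and not the project's actual method. It assumes a single hypothetical pattern, "X such as Y", and a toy three-sentence corpus; the names PATTERN and extract_pairs are illustrative only. Each match inspects just a few adjacent words, which is the locality property described above.

```python
import re
from collections import defaultdict

# Hypothetical local pattern "<category> such as <item>", chosen only for illustration.
PATTERN = re.compile(r"\b(\w+) such as (\w+)\b")

def extract_pairs(sentences):
    """Map each category word to the set of items seen with it in the pattern."""
    pairs = defaultdict(set)
    for sentence in sentences:
        # Each match looks at a fixed, small window of adjacent words.
        for category, item in PATTERN.findall(sentence.lower()):
            pairs[category].add(item)
    return dict(pairs)

# Toy corpus (an assumption for this example, not project data).
corpus = [
    "She plays instruments such as violin and piano.",
    "Fruits such as apples are rich in fiber.",
    "Instruments such as cello need regular tuning.",
]
pairs = extract_pairs(corpus)
print(pairs)
```

On this toy corpus the sketch groups "violin" and "cello" under "instruments" without building any vector representation; a realistic system would need many patterns and far larger corpora.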
The long term outcome of the project is an architecture for the computerized representation of natural language semantics, along with a set of algorithms for actually computing it given text corpora, and with a set of applications that use it.
In the first year, the goal is to come up with the overall design of the architecture, the formal specification of how existing pattern-based methods should be enhanced to support the long-term vision, and the initial design of a set of algorithms for constructing semantic representations from textual corpora. In the third year, we would like to present a working prototype that serves as a proof of concept for the architecture and its associated algorithms.
Ari Rappoport
Moshe Koppel
Naftali Tishby