A linguistics/NLP model from physics and category theory
Recently, Coecke et al. (2010) used high-level cross-disciplinary techniques from logic, category theory, and physics to bring the compositional and distributional approaches to meaning together. They developed a unified mathematical framework in which a sentence vector is, by definition, a function of the Kronecker product of its word vectors. A concrete instantiation of this theory was exemplified on a toy hand-crafted corpus by Grefenstette et al. (2011).
http://dl.acm.org/citation.cfm?id=2145580
Modelling compositional meaning for sentences using empirical distributional methods has been a challenge for computational linguists. We implement the abstract categorical model of Coecke et al. (2010) using data from the BNC and evaluate it. The implementation is based on unsupervised learning of matrices for relational words and applying them to the vectors of their arguments. The evaluation is based on the word disambiguation task developed by Mitchell and Lapata (2008) for intransitive sentences, and on a similar new experiment designed for transitive sentences. Our model matches the results of its competitors in the first experiment, and betters them in the second. The general improvement in results with increase in syntactic complexity showcases the compositional power of our model.
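The core mechanics can be sketched in a few lines: a relational word (here, a transitive verb) is learnt as a sum of Kronecker (outer) products of the vectors of its observed arguments, and a sentence vector is obtained by combining that matrix with the Kronecker product of the actual subject and object vectors. This is a minimal toy sketch under assumed 2-dimensional word vectors and invented data, not the paper's BNC implementation; the pointwise combination step is one of the instantiations discussed in this line of work.

```python
# Toy sketch of categorical composition (hypothetical 2-d vectors, not BNC data).

def kron(u, v):
    """Kronecker (outer) product of two vectors, flattened to a list."""
    return [ui * vj for ui in u for vj in v]

def pointwise(a, b):
    """Componentwise product of two equal-length vectors."""
    return [x * y for x, y in zip(a, b)]

def learn_verb(argument_pairs):
    """Unsupervised verb matrix: sum of subject-object Kronecker products
    over the (subject, object) pairs the verb is seen with."""
    dim = len(kron(argument_pairs[0][0], argument_pairs[0][1]))
    m = [0.0] * dim
    for subj, obj in argument_pairs:
        m = [mi + ki for mi, ki in zip(m, kron(subj, obj))]
    return m

def sentence_vector(verb_matrix, subj, obj):
    """Transitive sentence meaning: verb matrix combined pointwise with
    the Kronecker product of its actual arguments."""
    return pointwise(verb_matrix, kron(subj, obj))

def cosine(a, b):
    """Cosine similarity, the standard comparison in these evaluations."""
    num = sum(x * y for x, y in zip(a, b))
    den = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return num / den if den else 0.0
```

For disambiguation-style evaluation, one learns a matrix per verb sense from its typical argument pairs and asks which sense's sentence vector is closer (by cosine) to the ambiguous sentence's vector.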
Related:
http://arxivindex.blogspot.com/2012/04/more-linguistics-and-qft.html