Formal concepts and natural languages

Back in January, Yiannis (Vlassopoulos) and I were talking about “quadratic relations” and higher concepts in language, for example, the analogy between (king, queen) and (man, woman). Deep neural networks can learn such relations from a set of natural-language texts, called a corpus. But there are other ways of learning such relations:

| Representation of corpus | How to learn / compute |
|---|---|
| n-grams | string matching |
| word embeddings | standard algorithms, e.g. neural nets |
| presyntactic category | computational category theory |
| bar construction | Koszul dual |
| formal concept lattice | TBD |
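
To ground the first row: an n-gram representation is just the multiset of length-n windows of the corpus, and it really is computable by string matching alone. Here is a minimal sketch; the toy corpus and the function name `ngrams` are mine, purely for illustration.

```python
# A sketch of the first row of the table: representing a corpus by its
# n-grams, computed by plain string matching. Toy corpus, not from the post.
from collections import Counter

corpus = "the king and the queen saw the royal woman"
tokens = corpus.split()

def ngrams(tokens, n):
    """Return the Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

bigrams = ngrams(tokens, 2)
print(bigrams[("the", "king")])     # 1 -- raw co-location counts, nothing more
print(bigrams[("royal", "woman")])  # 1
```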

There are some very nice connections between all five representations. In a way, they’re all struggling to get away from the raw, syntactic, “1-dimensional” data of word collocation to something higher-order, something semantic. (For example, “royal + woman = queen” is semantic; “royal + woman = royal woman” is not.) I’d like to tell a bit of that story here.
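
For the word-embedding row, that semantic arithmetic can be made literal: one asks which vocabulary word has the vector closest to v(king) − v(man) + v(woman). A minimal sketch follows, with made-up 4-dimensional vectors standing in for embeddings that would actually be learned from a corpus (e.g. by word2vec or GloVe); the dictionary and the helper `nearest` are illustrative assumptions, not anyone’s real model.

```python
# A sketch of the analogy as vector arithmetic in a word embedding:
#     v(king) - v(man) + v(woman) ~ v(queen).
# The 4-dimensional vectors below are invented to make the point; real
# embeddings are learned from a corpus, typically in hundreds of dimensions.
import numpy as np

vec = {
    "king":  np.array([0.9, 0.8, 0.1, 0.2]),  # royal + male
    "queen": np.array([0.9, 0.1, 0.8, 0.2]),  # royal + female
    "man":   np.array([0.1, 0.8, 0.1, 0.3]),  # male
    "woman": np.array([0.1, 0.1, 0.8, 0.3]),  # female
    "royal": np.array([0.9, 0.4, 0.4, 0.2]),  # royal, neither gender
}

def nearest(target, exclude):
    """Word whose vector has the highest cosine similarity to `target`."""
    cos = lambda u, v: u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return max((w for w in vec if w not in exclude),
               key=lambda w: cos(vec[w], target))

target = vec["king"] - vec["man"] + vec["woman"]
print(nearest(target, exclude={"king", "man", "woman"}))  # -> "queen"
```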
