Making mordecai a more generalized solution and explaining the training data
Hello,
Is there any plan to make mordecai a more generalized solution that can use different NER libraries? For example, https://github.com/Hironsan/anago. Maybe a wrapper around the libraries could be used.
Could you also explain the Keras models used for parsing (which features and labels are used, etc.)?
In principle, there's no reason why you couldn't use another NER library. The initial plan with Mordecai was to make it very easy to swap out NER libraries, but that became harder as I integrated spaCy more. (A big part of that is that spaCy has embeddings built in, and that I'm using some of its other grammatical features for a later extension focusing on linking events and locations.)
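To make the wrapper idea concrete, here is a minimal sketch of what a swappable NER layer could look like. This is not how Mordecai is structured today: `PlaceMention`, `NERBackend`, `SpacyBackend`, `KeywordBackend`, and `find_places` are all hypothetical names, and the `GPE`/`LOC`/`FAC` label set is an assumption about which spaCy entity types count as places. The spaCy adapter only relies on `doc.ents`, `ent.text`, `ent.label_`, `ent.start_char`, and `ent.end_char`.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Protocol


@dataclass(frozen=True)
class PlaceMention:
    """A place name found in text, with character offsets (hypothetical type)."""
    text: str
    start: int
    end: int


class NERBackend(Protocol):
    """Hypothetical interface that any NER library adapter would implement."""

    def extract_places(self, text: str) -> List[PlaceMention]:
        ...


class SpacyBackend:
    """Adapter around an already loaded spaCy pipeline (passed in as `nlp`)."""

    # Assumption: these entity labels are the ones treated as places.
    PLACE_LABELS = {"GPE", "LOC", "FAC"}

    def __init__(self, nlp: Callable):
        self.nlp = nlp

    def extract_places(self, text: str) -> List[PlaceMention]:
        doc = self.nlp(text)
        return [
            PlaceMention(ent.text, ent.start_char, ent.end_char)
            for ent in doc.ents
            if ent.label_ in self.PLACE_LABELS
        ]


class KeywordBackend:
    """Toy stand-in for another NER library: exact matches against known names."""

    def __init__(self, names: Iterable[str]):
        self.names = list(names)

    def extract_places(self, text: str) -> List[PlaceMention]:
        mentions = []
        for name in self.names:
            start = text.find(name)
            while start != -1:
                mentions.append(PlaceMention(name, start, start + len(name)))
                start = text.find(name, start + 1)
        # Return mentions in document order.
        return sorted(mentions, key=lambda m: m.start)


def find_places(text: str, backend: NERBackend) -> List[PlaceMention]:
    """Downstream code depends only on the interface, not on a specific library."""
    return backend.extract_places(text)
```

With this shape, an anago adapter would be one more class exposing `extract_places`. It would not solve the harder part mentioned above: the embeddings and grammatical features that come from spaCy would still need another source.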
The features are not well documented yet, but you can see what they are for the country picking here and for the specific place picking here. The model never learns country-specific features. Instead, it uses a ranking procedure that learns the similarity between a "query" (the place name in context) and "documents" (either countries or gazetteer entries).
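As a rough illustration of the query/document framing, the sketch below scores every candidate with the same similarity function and sorts by score. It is not the actual Keras model: the learned similarity is replaced by plain cosine similarity, and the function names, the three-dimensional vectors, and the country codes are made up for the example. The point it shows is that no candidate has parameters of its own; only the shared scoring function would be learned.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; a fixed stand-in for a learned similarity function."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(
    query_vec: Sequence[float],
    candidates: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Score each candidate "document" against the query, best match first."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up vectors: the query stands for a place name in context, and each
# candidate stands for a country (or gazetteer entry) representation.
query = [0.9, 0.1, 0.0]
candidates = {
    "USA": [0.0, 1.0, 0.0],
    "SYR": [1.0, 0.0, 0.0],
    "FRA": [0.0, 0.0, 1.0],
}
ranking = rank_candidates(query, candidates)
```

Because the scoring function is shared, adding a new country or gazetteer entry only means adding another candidate vector, not retraining a per-country classifier.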
I keep hoping to have some time to document this more and I'll leave the issue open until I do.