Jörg Prante
Looks like a race condition. LangdetectService is not thread-safe. Synchronizing the call to LangdetectService in TransportLangdetectAction should help.
Yes, two threads executing on the same node is the race condition. I will push a fix today; it just wraps the execution of detectAll in a `synchronized` statement.
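A minimal sketch of that kind of fix, assuming a detector object whose `detectAll` mutates shared state. `LangdetectService` and `detectAll` are the names from the comment above; `DetectorStub`, `SynchronizedDetection`, `detect`, the `List<String>` return type and the toy detection rule are hypothetical stand-ins, not the plugin's real code. All callers go through one `synchronized` block, so only one thread touches the detector at a time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a non-thread-safe service: it reuses a shared buffer.
class DetectorStub {
    private final StringBuilder buffer = new StringBuilder(); // shared mutable state

    List<String> detectAll(String text) {
        buffer.setLength(0);
        buffer.append(text);             // without a lock, another thread could overwrite this...
        String seen = buffer.toString(); // ...before it is read back here
        List<String> result = new ArrayList<>();
        result.add(seen.startsWith("Der") ? "de" : "en"); // toy rule, not real detection
        return result;
    }
}

public class SynchronizedDetection {
    private final DetectorStub service = new DetectorStub();
    private final Object lock = new Object();

    // Hypothetical action method: every call to detectAll is serialized on one lock.
    List<String> detect(String text) {
        synchronized (lock) {
            return service.detectAll(text);
        }
    }

    public static void main(String[] args) throws Exception {
        SynchronizedDetection action = new SynchronizedDetection();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<List<String>>> futures = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            String text = (i % 2 == 0) ? "Der Hund" : "The dog";
            futures.add(pool.submit(() -> action.detect(text)));
        }
        int mismatches = 0;
        for (int i = 0; i < futures.size(); i++) {
            String expected = (i % 2 == 0) ? "de" : "en";
            if (!futures.get(i).get().get(0).equals(expected)) mismatches++;
        }
        pool.shutdown();
        System.out.println("mismatches: " + mismatches);
    }
}
```

Serializing on a single lock is the simplest correct option, at the cost of throughput under concurrent load; per-thread detector instances would be an alternative if the service's construction is cheap enough.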
The version with the fix is Bundle 2.2.0.5: http://xbib.org/repository/org/xbib/elasticsearch/plugin/elasticsearch-plugin-bundle/2.2.0.5/
Thanks. My build procedure is broken. As a quick fix, just remove lucene-core-5.4.1.jar and lucene-analyzers-common-5.4.1.jar from the plugins/bundle directory...
Good catch. Decompound uses a probabilistic approach, so it is not 100% reliable. "verzinnt" looks like it was not in the training set, so the algorithm fails. Maybe it helps to reduce...
I started to rewrite the original trainer tool so that it runs from the command line, but I ran short on time. The original tool is "ASV Toolbox Baseform" with a...
Thank you. There was a bug which is now fixed in version 5.4.0.1: https://github.com/jprante/elasticsearch-langdetect/commit/8ad05fd443185ce5d287b279ec01c9a7937e53e7
Do you want TF/IDF for the index (shard), or for the document?
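The distinction in the question comes from where each factor is computed: term frequency is counted inside one document, while inverse document frequency needs statistics over a whole collection (a shard or the full index). A minimal sketch using the textbook formula tf × ln(N / df), assuming a tiny in-memory "index" of tokenized documents; the class and method names are hypothetical, and Lucene/Elasticsearch scoring uses its own variants of these formulas.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TfIdfSketch {
    // Term frequency: counted within a single document only.
    static Map<String, Integer> termFrequencies(List<String> doc) {
        Map<String, Integer> tf = new HashMap<>();
        for (String term : doc) tf.merge(term, 1, Integer::sum);
        return tf;
    }

    // Inverse document frequency: depends on every document in the collection.
    static double idf(String term, List<List<String>> index) {
        long df = index.stream().filter(d -> d.contains(term)).count();
        return df == 0 ? 0.0 : Math.log((double) index.size() / df);
    }

    static double tfIdf(String term, List<String> doc, List<List<String>> index) {
        return termFrequencies(doc).getOrDefault(term, 0) * idf(term, index);
    }

    public static void main(String[] args) {
        // Hypothetical three-document collection of pre-tokenized terms.
        List<List<String>> index = List.of(
                List.of("elasticsearch", "plugin", "plugin"),
                List.of("elasticsearch", "langdetect"),
                List.of("decompound", "plugin"));
        List<String> doc = index.get(0);
        System.out.println(tfIdf("plugin", doc, index));        // tf = 2, idf = ln(3/2)
        System.out.println(tfIdf("elasticsearch", doc, index)); // tf = 1, idf = ln(3/2)
    }
}
```

Because IDF depends on the collection, the same document can get different weights depending on whether the statistics come from one shard or from the whole index.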
Yes, this is possible. Selecting the documents found in an index/type is not unrealistic. I can walk through a search result with scan/scroll, then retrieve the documents one by one. This may take extreme...
Yes, I think so.