elasticsearch-plugin-bundle

_langdetect endpoint missing in ES 2.2?

Open marbleman opened this issue 9 years ago • 8 comments

With ES 1.7 we used the _langdetect endpoint to verify the language of a document prior to indexing it, following the examples from https://github.com/jprante/elasticsearch-langdetect.

Trying the same with ES 2.2 and bundle 2.2.0.1, the example query now returns:

curl -XPOST 'localhost:9200/_langdetect?pretty' -d 'Das ist ein Test'
{
  "error" : {
    "root_cause" : [ {
      "type" : "invalid_index_name_exception",
      "reason" : "Invalid index name [_langdetect], must not start with '_'",
      "index" : "_langdetect"
    } ],
    "type" : "invalid_index_name_exception",
    "reason" : "Invalid index name [_langdetect], must not start with '_'",
    "index" : "_langdetect"
  },
  "status" : 400
}

Is the endpoint still available somewhere?

marbleman avatar Feb 23 '16 17:02 marbleman

No comments? Hope my question wasn't too birdbrained... ;) However, if so, I would appreciate a hint on what I am missing...

marbleman avatar Mar 02 '16 16:03 marbleman

Sorry, I overlooked the issue.

I released 2.2.0.2 with a fix.

The download link for the plugin zip file is:

https://github.com/jprante/elasticsearch-plugin-bundle/releases/download/2.2.0.2/elasticsearch-plugin-bundle-2.2.0.2-plugin.zip

jprante avatar Mar 02 '16 19:03 jprante

Thanks a lot for your response! Installed it right away. Unfortunately I get an error, no matter whether I execute it from Sense or from the command line:

curl -XPOST 'localhost:9200/_langdetect?pretty' -d 'Das ist ein Test'
{
  "error" : {
    "root_cause" : [ {
      "type" : "illegal_state_exception",
      "reason" : "failed to find action [org.xbib.elasticsearch.action.langdetect.LangdetectAction@d8b70e11] to execute"
    } ],
    "type" : "illegal_state_exception",
    "reason" : "failed to find action [org.xbib.elasticsearch.action.langdetect.LangdetectAction@d8b70e11] to execute"
  },
  "status" : 500
}

marbleman avatar Mar 02 '16 23:03 marbleman

OK, that was the reason why I removed the REST action.... I have to investigate how to solve this class loader issue.

jprante avatar Mar 02 '16 23:03 jprante

Thanks in advance! IMHO the _langdetect REST endpoint is quite an important feature, since it lets us check the language prior to indexing. Each document can then be sent to the right index, which has the appropriate analyzers for that language.

marbleman avatar Mar 03 '16 10:03 marbleman

Thx for posting the update!! Just found a typo in the install link:

./bin/plugin install 'http://search.maven.org/remotecontent?filepath=org/xbib/elasticsearch/plugin/elasticsearch-plugin-bundle/2.2.0.3/elasticsearch-plugin-bundle-2.2.0.3-plugin.zip'

marbleman avatar Mar 06 '16 23:03 marbleman

Attaching the right analyzer is not what the REST endpoint is for.

In ES 1.x this was possible by assigning an analyzer path. In ES 2.x this was removed. I will implement a multi-field name extension that automatically sets language analyzers, along the lines of https://www.elastic.co/guide/en/elasticsearch/guide/current/mixed-lang-fields.html#_analyze_multiple_times

Thanks for finding the typo.
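For readers who have not seen the "analyze multiple times" approach from the linked guide page, here is a rough sketch of what such an ES 2.x mapping could look like, built as a Python dict. This is only an illustration of the general multi-field idea, not the extension planned for the plugin. The type name `doc`, the field name `title`, the sub-field names `de`/`en` and the helper `multi_language_mapping` are all made up for the example; `german` and `english` are Elasticsearch's built-in language analyzers.

```python
import json

# Built-in Elasticsearch language analyzers, keyed by a language code.
# The codes used as sub-field names are an assumption for this sketch.
ANALYZERS = {"de": "german", "en": "english"}


def multi_language_mapping(doc_type, field, analyzers=ANALYZERS):
    """Build an ES 2.x mapping where `field` is analyzed once per language.

    Hypothetical helper: each language becomes a sub-field such as
    `title.de` or `title.en`, each with its own analyzer.
    """
    sub_fields = {
        code: {"type": "string", "analyzer": analyzer}
        for code, analyzer in analyzers.items()
    }
    return {
        "mappings": {
            doc_type: {
                "properties": {
                    field: {"type": "string", "fields": sub_fields}
                }
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON body that could be sent when creating an index.
    print(json.dumps(multi_language_mapping("doc", "title"), indent=2))
```

With a mapping like this, a query can target `title.de` or `title.en` (or several sub-fields at once) without knowing the document language at index time.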

jprante avatar Mar 07 '16 08:03 jprante

This is probably not the right place to discuss "best practices" (which I would be interested in), but following some recommendations found around the internet, we decided to use separate indices for each language, for example "myindex_de" and "myindex_en". Therefore we have to detect the language prior to indexing. This way we can search on "myindex_*" to get results in multiple languages, and we avoid all the trouble with mixed languages.
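To make the workflow concrete, here is a minimal sketch of the detect-then-route step in Python. It assumes the _langdetect response has the shape shown in the elasticsearch-langdetect README (a "languages" list of objects with "language" and "probability" keys). The function names, the `myindex` base name and the `unknown` fallback suffix are only placeholders for this example.

```python
import json
import urllib.request


def top_language(body):
    """Return the most probable language code from a _langdetect response.

    Assumes a body like {"languages": [{"language": "de", "probability": 0.99}]}.
    Returns None when no language was detected.
    """
    languages = body.get("languages", [])
    if not languages:
        return None
    best = max(languages, key=lambda entry: entry.get("probability", 0.0))
    return best["language"]


def index_for(language, base="myindex", fallback="unknown"):
    """Map a language code to a per-language index name, e.g. myindex_de."""
    return "{}_{}".format(base, language or fallback)


def detect_language(text, host="http://localhost:9200"):
    """POST raw text to the _langdetect endpoint and return the top language."""
    request = urllib.request.Request(
        host + "/_langdetect",
        data=text.encode("utf-8"),
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read().decode("utf-8"))
    return top_language(body)
```

A caller would then do something like `index_for(detect_language("Das ist ein Test"))` and send the document to that index, while searches go against "myindex_*".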

marbleman avatar Mar 07 '16 13:03 marbleman