elasticsearch-langdetect
language detection in array fields
Hi,
Is it possible to configure this plugin to detect languages for array fields?
What do you mean by array fields?
Array Type http://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-array-type.html
That does not make any sense. Language detection is only possible in a string field.
I'm not sure how it makes no sense. Arrays can hold strings; you can define a field as a LIST of strings. The documentation example could not be any clearer.
If you check multiple strings, you can get multiple results, and that can no longer be expressed by a single detect operation. With several strings in several languages, a single result would be misleading, so this does not make sense.
For multivalued string fields, the plugin gives a correct result only if all strings are in the same language, because it merely checks the first string. For example:
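# Start from a fresh index
curl -XDELETE 'localhost:9200/test'
curl -XPUT 'localhost:9200/test'

# Map "content" with the plugin's langdetect type
curl -XPOST 'localhost:9200/test/article/_mapping' -d '
{
  "article" : {
    "properties" : {
      "content" : { "type" : "langdetect" }
    }
  }
}
'

# Index a document whose "content" field is an array of two strings
# (backticks stand in for apostrophes so the single-quoted shell string stays closed)
curl -XPUT 'localhost:9200/test/article/1' -d '
{
  "title" : "Some title",
  "content" : [
    "Oh, say can you see by the dawn`s early light",
    "What so proudly we hailed at the twilight`s last gleaming?"
  ]
}
'

# Make the document visible to search
curl -XGET 'localhost:9200/test/_refresh'

# Find documents whose detected language is English
curl -XPOST 'localhost:9200/test/_search' -d '
{
  "query" : {
    "term" : {
      "content.lang" : "en"
    }
  }
}
'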
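The point about multivalued fields can be modelled outside Elasticsearch. Below is a minimal Python sketch, assuming a hypothetical stopword-counting `toy_detect` in place of the plugin's real detector; `toy_detect`, `detect_first_only` and `detect_each` are illustrative names, not part of the plugin. `detect_first_only` mirrors the behaviour described above (only the first array element is examined), while `detect_each` returns one language per element, which is why a single `content.lang` value cannot faithfully describe a mixed-language array. The German sentence is made-up test data.

```python
from typing import Callable, List

# Hypothetical toy detector: NOT the plugin's algorithm. It just counts a few
# common stopwords per language and returns the language with the most hits.
STOPWORDS = {
    "en": {"the", "and", "by", "we", "at", "so", "can", "you", "what", "of"},
    "de": {"der", "die", "das", "und", "wir", "ist", "nicht", "ein", "im"},
}


def toy_detect(text: str) -> str:
    # Lowercase each word and strip surrounding punctuation before matching.
    words = [w.strip(".,;:!?`'\"").lower() for w in text.split()]
    scores = {lang: sum(w in sw for w in words) for lang, sw in STOPWORDS.items()}
    return max(scores, key=scores.get)


def detect_first_only(values: List[str], detect: Callable[[str], str]) -> str:
    """Behaviour described in the thread: only values[0] is checked."""
    return detect(values[0])


def detect_each(values: List[str], detect: Callable[[str], str]) -> List[str]:
    """Alternative policy: one detected language per array element."""
    return [detect(v) for v in values]


content = [
    "Oh, say can you see by the dawn`s early light",
    "Das ist nicht die Hymne der Vereinigten Staaten",  # made-up German value
]
print(detect_first_only(content, toy_detect))  # en
print(detect_each(content, toy_detect))        # ['en', 'de']
```

With a single-language array both policies agree; with a mixed array the first-only result silently ignores the German value, which is the case the maintainer calls unsupported.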