elasticsearch-langdetect

language detection in array fields

Open redserpent7 opened this issue 10 years ago • 5 comments

Hi,

Is it possible to configure this plugin to detect languages for array fields?

redserpent7 • Mar 27 '15 14:03

What do you mean by array fields?

jprante • Mar 27 '15 15:03

Array Type: http://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-array-type.html

redserpent7 • Mar 27 '15 16:03

That does not make any sense. Language detection is only possible in a string field.

jprante • Mar 27 '15 16:03

I'm not sure how it makes no sense. Arrays can hold strings; you can define a field as a LIST of strings. The documentation example could not be any clearer.

redserpent7 • Mar 27 '15 16:03

If you check multiple strings, you can get multiple results, one per string, and that can no longer be done in a single detect operation. With different languages in different strings, a single result would be misleading, so this makes no sense.

For multivalued string fields, the plugin works only if all strings are in the same language, because it merely checks the first string. For example:

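```shell
# Recreate the test index from scratch
curl -XDELETE 'localhost:9200/test'
curl -XPUT 'localhost:9200/test'
# Map "content" with the plugin's langdetect field type
curl -XPOST 'localhost:9200/test/article/_mapping' -d '
{
  "article" : {
    "properties" : {
      "content" : { "type" : "langdetect" }
    }
  }
}
'
# Index a document whose "content" is an array of two English strings
# (backticks stand in for apostrophes, which would end the single-quoted shell string)
curl -XPUT 'localhost:9200/test/article/1' -d '
{
  "title" : "Some title",
  "content" : [
    "Oh, say can you see by the dawn`s early light",
    "What so proudly we hailed at the twilight`s last gleaming?"
  ]
}
'
# Refresh so the document becomes searchable
curl -XGET 'localhost:9200/test/_refresh'
# Search on the language detected for "content"
curl -XPOST 'localhost:9200/test/_search' -d '
{
  "query" : {
    "term" : {
      "content.lang" : "en"
    }
  }
}
'
```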

jprante • Mar 27 '15 22:03
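A toy model may make the two behaviors discussed above concrete: one detect operation on the first value only (how the plugin is described as handling multivalued fields) versus one detect operation per value (which yields one result per string, so there is no longer a single language to store). This is a sketch under stated assumptions, not the plugin's implementation: `toy_detect`, `detect_first`, and `detect_each` are hypothetical helpers, and the stopword counting in `toy_detect` is only a crude stand-in for a real language detector.

```shell
#!/usr/bin/env bash
# Toy model only: NOT the elasticsearch-langdetect code.

# toy_detect TEXT: hypothetical stand-in for a real detector.
# Counts a few English and German stopwords; prints "en", "de", or "unknown".
toy_detect() {
  local text en=0 de=0 w
  # Lowercase, turn non-letters into spaces, and pad so whole words can be matched
  text=" $(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -c '[:alpha:]' ' ') "
  for w in the and by we at you; do
    case "$text" in *" $w "*) en=$((en + 1)) ;; esac
  done
  for w in der die das und wir ist; do
    case "$text" in *" $w "*) de=$((de + 1)) ;; esac
  done
  if [ "$en" -gt "$de" ]; then
    echo en
  elif [ "$de" -gt "$en" ]; then
    echo de
  else
    echo unknown
  fi
}

# detect_first VALUE...: a single detect operation on the first value only;
# any further values are ignored.
detect_first() {
  toy_detect "$1"
}

# detect_each VALUE...: one detect operation per value, printing one result per line.
detect_each() {
  local v
  for v in "$@"; do
    toy_detect "$v"
  done
}
```

For a mixed array such as `"Oh, say can you see by the dawn's early light"` followed by `"Das ist der Tag, und wir sind hier"`, `detect_first` prints only `en`, while `detect_each` prints `en` and then `de`. The second form is the "multiple results" case: a list of languages rather than one value for a single `lang` subfield.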