elasticsearch-langdetect
Search language not found
Language detection through the REST API returns the language I expect, but when I search a mapped index by detected language, one of the documents is not found.
First, I create the index with this mapping:
PUT /language_detection_2
{
  "mappings": {
    "stream": {
      "properties": {
        "text": {
          "type": "text",
          "fields": {
            "language": {
              "type": "langdetect",
              "languages": [
                "ja",
                "en",
                "th",
                "ko"
              ],
              "store": true
            }
          }
        }
      }
    }
  }
}
Then I index three documents:
PUT language_detection_2/stream/2
{
  "text": "มีความขี้เกียจระดับ 10 วันจันทร์มันก็จะประมาณนี้แหละ 😅 @ True Tower https:\/\/www.instagram.com\/p\/BbtPqZllnPmtfX2MRmUhYT-"
}
PUT language_detection_2/stream/3
{
  "text": "khaohom01ทุกคนเก่งมากคะ❤#mtutd bambam_boobiiผลเท่าไหร่จ้าน้องข้าวหอม😊 khaohom01@bambam_boobii ชนะ1-0ค่ะ😃😃"
}
PUT language_detection_2/stream/4
{
  "text": "นุ้งหมี วิ่งเร็วอะถ่ายไม่ทัน 5555 #narubadin #nw13 #toyotaleaguecup #brutd #mtutd 11.10.60 @ i-mobile Stadium"
}
Then I search for documents detected as Thai:
GET language_detection_2/_search
{
  "query": {
    "match": {
      "text.language": "th"
    }
  }
}
I get back only 2 of the 3 documents:
"hits": {
  "total": 2,
  "max_score": 0.6931472,
  "hits": [
    {
      "_index": "language_detection_2",
      "_type": "stream",
      "_id": "2",
      "_score": 0.6931472,
      "_source": {
        "text": "มีความขี้เกียจระดับ 10 วันจันทร์มันก็จะประมาณนี้แหละ 😅 @ True Tower https://www.instagram.com/p/BbtPqZllnPmtfX2MRmUhYT-"
      }
    },
    {
      "_index": "language_detection_2",
      "_type": "stream",
      "_id": "3",
      "_score": 0.2876821,
      "_source": {
        "text": "khaohom01ทุกคนเก่งมากคะ❤#mtutd bambam_boobiiผลเท่าไหร่จ้าน้องข้าวหอม😊 khaohom01@bambam_boobii ชนะ1-0ค่ะ😃😃"
      }
    }
  ]
}
Is this a bug, or am I doing something wrong? How does the text.language field store the detected languages, and can I display its contents?
I'm not sure of the internal implementation, but the text of the third document (ID 4) appears to be detected as "en", even though the _langdetect endpoint ranks "th" highest for the same text:
GET _langdetect
{
  "text": "นุ้งหมี วิ่งเร็วอะถ่ายไม่ทัน 5555 #narubadin #nw13 #toyotaleaguecup #brutd #mtutd 11.10.60 @ i-mobile Stadium"
}
{
  "languages": [
    {
      "language": "th",
      "probability": 0.4285714155915268
    },
    {
      "language": "ro",
      "probability": 0.428569318044014
    },
    {
      "language": "en",
      "probability": 0.14285807944062645
    }
  ]
}
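To see what was actually indexed for each document, one option is to ask the search API to return the stored sub-field. This is only a sketch: it assumes the plugin writes the detected language codes into text.language as a stored field (the mapping sets "store": true), which I have not verified against the plugin source. stored_fields is the standard Elasticsearch search parameter, and the match_all query is there only to list every document:
GET language_detection_2/_search
{
  "stored_fields": ["text.language"],
  "query": {
    "match_all": {}
  }
}
If document 4 comes back with a code other than "th", that would explain why the match query on text.language misses it.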