analysis-pinyin
Critical bug: when the text being analyzed contains a standalone letter A, the tokenizer drops that A
GET /_analyze
{
  "analyzer": "ik_smart",
  "text": "我们A A制"
}

{
  "tokens": [
    { "token": "我们", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 0 },
    { "token": "制", "start_offset": 5, "end_offset": 6, "type": "CN_CHAR", "position": 1 }
  ]
}
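By default ik loads a stopword dictionary, stopword.dic, which contains the letter 'a' (it is treated as a stop word in English), so that token gets filtered out. Emptying /config/stopword.dic under the ik directory fixes it. If you still want the other stop words filtered, removing only the 'a' entry should be enough.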
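If you would rather not empty the whole file, something like the following could strip just the offending entry. This is only a minimal sketch: `remove_stopwords` is a hypothetical helper name, and it assumes stopword.dic is a UTF-8 text file with one word per line. Back up the file first.

```python
from pathlib import Path


def remove_stopwords(dic_path, words_to_remove):
    """Rewrite a one-word-per-line stopword file without the given words.

    Assumption: the file is UTF-8 with one entry per line.
    Returns the list of entries that were actually removed.
    """
    path = Path(dic_path)
    # Compare case-insensitively and ignore surrounding whitespace.
    drop = {w.strip().lower() for w in words_to_remove}
    kept, removed = [], []
    for line in path.read_text(encoding="utf-8").splitlines():
        word = line.strip()
        if word.lower() in drop:
            removed.append(word)
        else:
            kept.append(line)
    # Write the remaining entries back, one per line.
    path.write_text("".join(f"{line}\n" for line in kept), encoding="utf-8")
    return removed
```

You would call it as `remove_stopwords("<ik directory>/config/stopword.dic", ["a"])`, with the path adjusted to your install. I would expect a node restart to be needed before the change takes effect, but that is an assumption worth checking by re-running the `_analyze` request above.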