analysis-ik
IK analyzer: tokenizing mixed Chinese and English text (case-sensitive)
POST test001/_analyze
{
  "tokenizer": "ik_max_word",
  "filter": ["standard"],
  "text": "Apple,Banana一起来熟悉文档相关的操作Apple,Banana"
}

Result:

{
  "tokens": [
    { "token": "apple", "start_offset": 0, "end_offset": 5, "type": "ENGLISH", "position": 0 },
    { "token": "banana", "start_offset": 6, "end_offset": 12, "type": "ENGLISH", "position": 1 },
    { "token": "一起", "start_offset": 12, "end_offset": 14, "type": "CN_WORD", "position": 2 },
    { "token": "一", "start_offset": 12, "end_offset": 13, "type": "TYPE_CNUM", "position": 3 },
    { "token": "起来", "start_offset": 13, "end_offset": 15, "type": "CN_WORD", "position": 4 },
    { "token": "起", "start_offset": 13, "end_offset": 14, "type": "COUNT", "position": 5 },
    { "token": "来", "start_offset": 14, "end_offset": 15, "type": "CN_CHAR", "position": 6 },
    { "token": "熟悉", "start_offset": 15, "end_offset": 17, "type": "CN_WORD", "position": 7 },
    { "token": "文档", "start_offset": 17, "end_offset": 19, "type": "CN_WORD", "position": 8 },
    { "token": "相关", "start_offset": 19, "end_offset": 21, "type": "CN_WORD", "position": 9 },
    { "token": "的", "start_offset": 21, "end_offset": 22, "type": "CN_CHAR", "position": 10 },
    { "token": "操作", "start_offset": 22, "end_offset": 24, "type": "CN_WORD", "position": 11 },
    { "token": "apple", "start_offset": 24, "end_offset": 29, "type": "ENGLISH", "position": 12 },
    { "token": "banana", "start_offset": 30, "end_offset": 36, "type": "ENGLISH", "position": 13 }
  ]
}

Expected result: each token should keep the form it has in the input ("Apple", not "apple"), with no case conversion. How can I achieve this? Any advice would be appreciated!
You should be able to follow https://github.com/medcl/elasticsearch-analysis-ik/issues/386 and set this in the configuration: enable_lowercase=false
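For reference, here is a rough sketch of how that setting might be applied, assuming your installed IK plugin version accepts `enable_lowercase` as a tokenizer-level setting (I have not checked this against every release). The index name `test002`, the tokenizer name `ik_max_word_keep_case`, and the analyzer name `ik_keep_case` are made up for illustration. A new index is used because analysis settings cannot be changed on an open index that already exists.

```
# Hypothetical index, tokenizer, and analyzer names; adjust to your setup.
# Assumes the installed IK plugin honors "enable_lowercase" on the tokenizer.
PUT test002
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "ik_max_word_keep_case": {
          "type": "ik_max_word",
          "enable_lowercase": false
        }
      },
      "analyzer": {
        "ik_keep_case": {
          "type": "custom",
          "tokenizer": "ik_max_word_keep_case"
        }
      }
    }
  }
}

# Re-run the same text through the custom analyzer to check the token case.
POST test002/_analyze
{
  "analyzer": "ik_keep_case",
  "text": "Apple,Banana一起来熟悉文档相关的操作Apple,Banana"
}
```

If the tokens still come back lowercased, the plugin version in use may not support the setting, or another lowercase step may be present in the analysis chain.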