analysis-ik: IK analyzer, mixed Chinese/English tokenization (case-sensitive)

Open · hello-yajing opened this issue 4 years ago · 1 comment

Request:

    POST test001/_analyze
    {
      "tokenizer": "ik_max_word",
      "filter": ["standard"],
      "text": "Apple，Banana一起来熟悉文档相关的操作Apple，Banana"
    }

Result:

    {
      "tokens": [
        { "token": "apple",  "start_offset": 0,  "end_offset": 5,  "type": "ENGLISH",   "position": 0 },
        { "token": "banana", "start_offset": 6,  "end_offset": 12, "type": "ENGLISH",   "position": 1 },
        { "token": "一起",   "start_offset": 12, "end_offset": 14, "type": "CN_WORD",   "position": 2 },
        { "token": "一",     "start_offset": 12, "end_offset": 13, "type": "TYPE_CNUM", "position": 3 },
        { "token": "起来",   "start_offset": 13, "end_offset": 15, "type": "CN_WORD",   "position": 4 },
        { "token": "起",     "start_offset": 13, "end_offset": 14, "type": "COUNT",     "position": 5 },
        { "token": "来",     "start_offset": 14, "end_offset": 15, "type": "CN_CHAR",   "position": 6 },
        { "token": "熟悉",   "start_offset": 15, "end_offset": 17, "type": "CN_WORD",   "position": 7 },
        { "token": "文档",   "start_offset": 17, "end_offset": 19, "type": "CN_WORD",   "position": 8 },
        { "token": "相关",   "start_offset": 19, "end_offset": 21, "type": "CN_WORD",   "position": 9 },
        { "token": "的",     "start_offset": 21, "end_offset": 22, "type": "CN_CHAR",   "position": 10 },
        { "token": "操作",   "start_offset": 22, "end_offset": 24, "type": "CN_WORD",   "position": 11 },
        { "token": "apple",  "start_offset": 24, "end_offset": 29, "type": "ENGLISH",   "position": 12 },
        { "token": "banana", "start_offset": 30, "end_offset": 36, "type": "ENGLISH",   "position": 13 }
      ]
    }

Expected result: each token should keep exactly the case it has in the original text (Apple, Banana), with no lowercasing. How can this be achieved? Any advice is appreciated!

Jan 21 '21 06:01 hello-yajing

This should be possible by following https://github.com/medcl/elasticsearch-analysis-ik/issues/386 and setting enable_lowercase=false.
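A minimal sketch of what that configuration might look like, assuming an installed IK version that reads the enable_lowercase option from a custom tokenizer's settings. The index name test_cs, the tokenizer name ik_max_word_cs and the analyzer name ik_max_word_cs_analyzer are hypothetical. A new index is used because analysis settings can only be added at index creation or on a closed index. The "standard" filter from the question is left out; it does not change case and was removed in Elasticsearch 7.0.

```
# Hypothetical index whose custom tokenizer is ik_max_word with lowercasing turned off
# (assumes this IK version supports the enable_lowercase setting)
PUT test_cs
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "ik_max_word_cs": {
          "type": "ik_max_word",
          "enable_lowercase": false
        }
      },
      "analyzer": {
        "ik_max_word_cs_analyzer": {
          "type": "custom",
          "tokenizer": "ik_max_word_cs"
        }
      }
    }
  }
}

# Re-run the analysis from the question using the index-scoped custom tokenizer
POST test_cs/_analyze
{
  "tokenizer": "ik_max_word_cs",
  "text": "Apple，Banana一起来熟悉文档相关的操作Apple，Banana"
}
```

If the option is honored, the ENGLISH tokens should come back as Apple and Banana rather than apple and banana. To use it at index and search time, reference ik_max_word_cs_analyzer as the analyzer of the relevant text fields in the mapping.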

Mar 30 '21 11:03 tcluzhe
