analysis-pinyin
Pinyin tokenizer returns incorrect offsets
GET _analyze?pretty { "tokenizer": { "type": "pinyin", "keep_first_letter": false, "keep_full_pinyin": false, "keep_joined_full_pinyin": true, "keep_none_chinese_in_first_letter": false, "keep_none_chinese_in_joined_full_pinyin": true, "keep_none_chinese": false, "lowercase": true }, "text": "高墙" }
{ "tokens": [ { "token": "gaoqiang", "start_offset": 0, "end_offset": 8, "type": "word", "position": 0 } ] }
Problem: end_offset should not be 8.
@medcl, could you take a look?
Fixed. end_offset now uses the length of the Chinese text, which is 2.
I pulled the latest code from the master branch and the result is still wrong. { "tokens": [ { "token": "gaoqiang", "start_offset": 0, "end_offset": 0, "type": "word", "position": 0 } ] }
Set ignore_pinyin_offset=false and the offsets are returned.
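For anyone checking this locally, here is a minimal Python sketch of the two points discussed above: the `_analyze` request body with `ignore_pinyin_offset` set to false, and a check that token offsets point into the original text rather than into the pinyin string. The helper names `build_analyze_body` and `offsets_out_of_bounds` are hypothetical, not part of the plugin. The check assumes offsets count characters of the input text, which matches Python's `len` for the characters used here.

```python
import json


def build_analyze_body(text):
    """Build an _analyze request body mirroring the one in this issue,
    with ignore_pinyin_offset disabled so offsets are reported."""
    return {
        "tokenizer": {
            "type": "pinyin",
            "keep_first_letter": False,
            "keep_full_pinyin": False,
            "keep_joined_full_pinyin": True,
            "ignore_pinyin_offset": False,
        },
        "text": text,
    }


def offsets_out_of_bounds(text, tokens):
    """Return the tokens whose offsets do not fall inside the original text.

    Assumption: offsets index into the input text, so a valid token
    satisfies 0 <= start_offset <= end_offset <= len(text).
    """
    bad = []
    for token in tokens:
        start, end = token["start_offset"], token["end_offset"]
        if not (0 <= start <= end <= len(text)):
            bad.append(token)
    return bad


# Token copied from the first response in this issue.
reported = [
    {"token": "gaoqiang", "start_offset": 0, "end_offset": 8,
     "type": "word", "position": 0}
]

print(json.dumps(build_analyze_body("高墙"), ensure_ascii=False))
print(offsets_out_of_bounds("高墙", reported))
```

The token from the original report is flagged because its end_offset of 8 is the length of "gaoqiang", while the input "高墙" is only 2 characters long.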
@blueshen I get an error when building the index:
`
PUT my_index { "settings": { "analysis": { "analyzer": { "pinyin_analyzer": { "tokenizer": "my_pinyin" } }, "tokenizer": { "my_pinyin": { "type": "pinyin", "ignore_pinyin_offset" : false } } } }, "mappings": { "my_doc": { "properties": { "name": { "type": "text", "fields": { "pinyin": { "type": "text", "analyzer": "pinyin_analyzer" } } } } } } }
PUT my_index/my_doc/1 { "name": ["张三", "李四", "王五"] } `