analysis-pinyin

Pinyin tokenizer returns incorrect offsets

Open boling-wang opened this issue 6 years ago • 5 comments

GET _analyze?pretty
{
  "tokenizer": {
    "type": "pinyin",
    "keep_first_letter": false,
    "keep_full_pinyin": false,
    "keep_joined_full_pinyin": true,
    "keep_none_chinese_in_first_letter": false,
    "keep_none_chinese_in_joined_full_pinyin": true,
    "keep_none_chinese": false,
    "lowercase": true
  },
  "text": "高墙"
}

{
  "tokens": [
    {
      "token": "gaoqiang",
      "start_offset": 0,
      "end_offset": 8,
      "type": "word",
      "position": 0
    }
  ]
}

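Problem: end_offset should not be 8. The input text "高墙" is only two characters long; 8 is the length of the pinyin token "gaoqiang".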
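To illustrate why 8 looks wrong: token offsets are normally character positions in the original text, so for a two-character input no token should end past 2. Below is a minimal Python sketch of that sanity check. `offsets_in_bounds` is a hypothetical helper written for this illustration, not part of the plugin or of Elasticsearch.

```python
def offsets_in_bounds(text, tokens):
    """Return True if every token's offsets fall inside the original text.

    Hypothetical helper for illustration only; not a plugin or Elasticsearch API.
    """
    return all(
        0 <= t["start_offset"] <= t["end_offset"] <= len(text)
        for t in tokens
    )


text = "高墙"
# Token as reported in the response above (only the offset fields matter here).
reported = [{"token": "gaoqiang", "start_offset": 0, "end_offset": 8}]

print(len(text), len("gaoqiang"))         # 2 8
print(offsets_in_bounds(text, reported))  # False
```

The reported end_offset matches the length of the pinyin output, not that of the source text, which is why the check fails.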

boling-wang avatar Jul 17 '18 09:07 boling-wang

@medcl, calling on the expert here.

boling-wang avatar Jul 21 '18 06:07 boling-wang

Fixed. end_offset is now taken as the length of the Chinese text, which is 2.

blueshen avatar Oct 30 '18 03:10 blueshen

I pulled the latest code from the master branch and it is still wrong:
{
  "tokens": [
    {
      "token": "gaoqiang",
      "start_offset": 0,
      "end_offset": 0,
      "type": "word",
      "position": 0
    }
  ]
}

boling-wang avatar Dec 20 '18 05:12 boling-wang

Set ignore_pinyin_offset=false and the offsets will be there.
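Applied to the request from the original report, that suggestion might look like the sketch below. It only builds and prints an `_analyze` request body and does not contact a cluster. Passing `ignore_pinyin_offset` inline in the `_analyze` tokenizer definition is an assumption here, since the thread only names the setting; `build_analyze_body` is a hypothetical helper.

```python
import json


def build_analyze_body(text, ignore_pinyin_offset=False):
    """Build an _analyze request body for the pinyin tokenizer.

    Hypothetical helper; the options mirror the original report, with
    ignore_pinyin_offset added as suggested in the thread (assumed to be
    accepted inline like the other tokenizer settings).
    """
    tokenizer = {
        "type": "pinyin",
        "keep_first_letter": False,
        "keep_full_pinyin": False,
        "keep_joined_full_pinyin": True,
        "keep_none_chinese_in_first_letter": False,
        "keep_none_chinese_in_joined_full_pinyin": True,
        "keep_none_chinese": False,
        "lowercase": True,
        "ignore_pinyin_offset": ignore_pinyin_offset,
    }
    return {"tokenizer": tokenizer, "text": text}


body = build_analyze_body("高墙")
# ensure_ascii=False keeps the Chinese input readable in the printed JSON.
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The printed JSON can be pasted after `GET _analyze?pretty` to compare the returned offsets with and without the setting.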

blueshen avatar Dec 21 '18 02:12 blueshen

@blueshen I get an error when building the index:

`

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "pinyin_analyzer": {
          "tokenizer": "my_pinyin"
        }
      },
      "tokenizer": {
        "my_pinyin": {
          "type": "pinyin",
          "ignore_pinyin_offset": false
        }
      }
    }
  },
  "mappings": {
    "my_doc": {
      "properties": {
        "name": {
          "type": "text",
          "fields": {
            "pinyin": {
              "type": "text",
              "analyzer": "pinyin_analyzer"
            }
          }
        }
      }
    }
  }
}

PUT my_index/my_doc/1
{
  "name": ["张三", "李四", "王五"]
}
`

boling-wang avatar Jan 16 '19 03:01 boling-wang