Ryan19929
Support IK tokenizer for inverted index: migrate analysis-ik from Java to C++ and implement basic tokenization functionality. The major differences from the original Java code are as follows: 1. **Encoding Format...
### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary: This PR fixes the issue of IK Analyzer's abnormal handling of full-width characters and...
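
The summary above is truncated and does not show how the fix works. As a rough illustration of one common way to handle full-width input, the sketch below folds the full-width ASCII variants (U+FF01 to U+FF5E) and the ideographic space (U+3000) to their half-width forms before tokenization. The helper names `to_half_width` and `normalize_width` are hypothetical and are not taken from the PR.

```cpp
#include <string>

// Hypothetical helper (not from the PR): fold one code point to half-width.
// Full-width ASCII variants occupy U+FF01..U+FF5E and sit exactly 0xFEE0
// above their ASCII counterparts U+0021..U+007E.
char32_t to_half_width(char32_t cp) {
    if (cp == 0x3000) {
        return U' ';  // ideographic space -> ASCII space
    }
    if (cp >= 0xFF01 && cp <= 0xFF5E) {
        return static_cast<char32_t>(cp - 0xFEE0);
    }
    return cp;  // everything else is left untouched
}

// Hypothetical helper: apply the folding to a whole UTF-32 string.
std::u32string normalize_width(const std::u32string& text) {
    std::u32string out;
    out.reserve(text.size());
    for (char32_t cp : text) {
        out.push_back(to_half_width(cp));
    }
    return out;
}
```

Working on decoded code points rather than raw UTF-8 bytes keeps the range check simple. A real tokenizer would need to decode its input first, which this sketch leaves out.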
## Versions

- [X] dev
- [ ] 3.0
- [ ] 2.1
- [ ] 2.0

## Languages

- [X] Chinese
- [X] English

## Docs Checklist

- [...
## Fix PinyinTokenFilter sort flag not being reset

### Problem

When the `keyword + pinyin filter` combination is used (any analyzer that includes the pinyin filter is enough to trigger it), the token order of the first tokenization differs from that of later runs.

- Tokenizing [GT40] ``` "ignore_pinyin_offset": "true", "keep_first_letter": "false", "keep_none_chinese_in_joined_full_pinyin": "false", "keep_none_chinese_together": "true", "keep_original": "true", "limit_first_letter_length": 16,...
### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

## Background

In current Doris, storage medium (SSD/HDD) selection lacks fine-grained control:

1. Implicit...