[Feature Request]: Improve Chinese analyzer
Is there an existing issue for the same feature request?
- [X] I have checked the existing issues.
Describe the feature you'd like
The current Jieba analyzer for Chinese has several problems:
- Stopwords are handled through external dictionaries, so the emitted tokens do not have continuous offsets, which affects phrase queries.
- No stemmer is applied to English tokens.
- Query segmentation uses a finer granularity with no smart policy, which affects ranking for Chinese text.
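
To illustrate the first point, here is a toy sketch of how position gaps left by stopword removal can break a phrase match. It is not Infinity's implementation: the function names (`analyze`, `naive_phrase_match`, `gap_aware_phrase_match`), the one-word stopword set, and the hand-segmented token list are all assumptions made for the example, and no real Jieba call is involved.

```python
# Toy illustration only; names and data are hypothetical, not Infinity's API.
STOPWORDS = {"的"}  # assumed one-entry stopword dictionary


def analyze(tokens, stopwords=STOPWORDS):
    """Drop stopwords but keep each surviving token's original position."""
    return [(tok, pos) for pos, tok in enumerate(tokens) if tok not in stopwords]


def _positions(doc_terms):
    """Build a term -> set of positions map from (token, position) pairs."""
    index = {}
    for tok, pos in doc_terms:
        index.setdefault(tok, set()).add(pos)
    return index


def naive_phrase_match(doc_terms, query_terms):
    """Match only if the query terms occur at strictly consecutive positions."""
    if not query_terms:
        return False
    index = _positions(doc_terms)
    return any(
        all(start + i in index.get(term, ()) for i, term in enumerate(query_terms))
        for start in index.get(query_terms[0], ())
    )


def gap_aware_phrase_match(doc_terms, query_terms):
    """Match using the position deltas recorded on the analyzed query terms."""
    if not query_terms:
        return False
    index = _positions(doc_terms)
    base = query_terms[0][1]
    return any(
        all(start + (pos - base) in index.get(term, ()) for term, pos in query_terms)
        for start in index.get(query_terms[0][0], ())
    )


# Hand-segmented stand-in for analyzer output of "我的书".
doc_terms = analyze(["我", "的", "书"])
query_terms = analyze(["我", "的", "书"])

print(doc_terms)
print(naive_phrase_match(doc_terms, [tok for tok, _ in query_terms]))
print(gap_aware_phrase_match(doc_terms, query_terms))
```

In this sketch the matcher that assumes consecutive positions misses the phrase once the stopword is dropped, while the one that honors the recorded gaps still finds it. Which of the two behaviours the analyzer and phrase matcher should adopt is a design decision for the fix.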