[Feature Request]: Improve Chinese analyzer

Open yingfeng opened this issue 8 months ago • 0 comments

Is there an existing issue for the same feature request?

  • [X] I have checked the existing issues.

Describe the feature you'd like

The current Jieba analyzer for Chinese has several problems:

  1. Stopwords are handled through external dictionaries, so the emitted tokens do not have continuous offsets, which affects phrase queries.
  2. For English tokens, no stemmer is applied.
  3. Query segmentation uses a finer granularity without a smart policy, which affects ranking for Chinese text.
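
To make the first point concrete, here is a minimal sketch of how gaps left by removed stopwords can break an exact phrase match. It is only an illustration under stated assumptions: the token list is segmented by hand rather than produced by Jieba, and `STOPWORDS`, `positions_after_filter`, and `phrase_match` are hypothetical names, not part of the analyzer.

```python
# Hypothetical illustration only; this is not the project's analyzer code.
STOPWORDS = {"的"}  # assumed stopword list for the example


def positions_after_filter(tokens, stopwords, keep_gaps):
    """Return (token, position) pairs after dropping stopwords.

    keep_gaps=True  -> a surviving token keeps its original position,
                       so positions have holes where stopwords were.
    keep_gaps=False -> surviving tokens are renumbered consecutively.
    """
    out = []
    next_pos = 0
    for i, tok in enumerate(tokens):
        if tok in stopwords:
            continue
        out.append((tok, i if keep_gaps else next_pos))
        next_pos += 1
    return out


def phrase_match(doc, query):
    """Exact phrase match: every query token must occur in the document
    at the same relative position it has in the query."""
    if not query:
        return False
    doc_pos = {}
    for tok, pos in doc:
        doc_pos.setdefault(tok, set()).add(pos)
    first_tok, first_qpos = query[0]
    for start in doc_pos.get(first_tok, ()):
        if all(start + (qpos - first_qpos) in doc_pos.get(tok, ())
               for tok, qpos in query):
            return True
    return False


# Hand-segmented tokens for "北京的天气" (assumed segmentation).
tokens = ["北京", "的", "天气"]

doc_gapped = positions_after_filter(tokens, STOPWORDS, keep_gaps=True)
query_dense = positions_after_filter(tokens, STOPWORDS, keep_gaps=False)

print(doc_gapped)                             # positions with a hole
print(query_dense)                            # consecutive positions
print(phrase_match(doc_gapped, query_dense))  # mismatched numbering
print(phrase_match(doc_gapped, doc_gapped))   # consistent numbering
```

In this sketch the same text fails to match itself as a phrase when the indexed side keeps position gaps and the query side does not. Whichever numbering is chosen, it has to be consistent between indexing and querying.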

yingfeng, Jun 08 '24 14:06