
Update EdgeNGramTokenizer.DEFAULT_MAX_NGRAM_SIZE to be practical

Open YeonghyeonKO opened this issue 5 months ago • 1 comments

Issue: https://github.com/apache/lucene/issues/13802

  • Many Lucene-based libraries (for example Elasticsearch and OpenSearch, per their source code) use NGramTokenizer.DEFAULT_MAX_NGRAM_SIZE (= 2) rather than EdgeNGramTokenizer's own default (= 1) when configuring an EdgeNGramTokenizer.
  • For that reason, keeping EdgeNGramTokenizer's DEFAULT_MAX_NGRAM_SIZE at 1 is not practical, so this PR changes it to 2.
  • If this change needs further explanation, I'll add or update the documentation.
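
To illustrate why a maximum gram size of 1 is of limited use, here is a minimal plain-Java sketch of edge n-gram generation. It does not call Lucene; `EdgeNGramSketch` and `edgeNGrams` are hypothetical names used only for this illustration, and the sketch assumes grams are simply the prefixes of the input with lengths between the minimum and maximum sizes.

```java
import java.util.ArrayList;
import java.util.List;

public class EdgeNGramSketch {

    // Hypothetical helper (not Lucene API): returns the prefixes of `text`
    // whose length lies in [minGram, maxGram].
    // Simplification: lengths are counted in Java chars, not code points.
    static List<String> edgeNGrams(String text, int minGram, int maxGram) {
        if (minGram < 1 || maxGram < minGram) {
            throw new IllegalArgumentException("require 1 <= minGram <= maxGram");
        }
        List<String> grams = new ArrayList<>();
        int upper = Math.min(maxGram, text.length());
        for (int len = minGram; len <= upper; len++) {
            grams.add(text.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Maximum size 1: only the first character is emitted.
        System.out.println(edgeNGrams("lucene", 1, 1)); // prints [l]
        // Maximum size 2: the two-character prefix is emitted as well.
        System.out.println(edgeNGrams("lucene", 1, 2)); // prints [l, lu]
    }
}
```

With a maximum of 1, every input collapses to its first character, so a prefix search can distinguish terms only by their initial letter. In practice callers usually pass explicit minimum and maximum sizes, which is why the default matters mostly for consistency with NGramTokenizer.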

YeonghyeonKO, Sep 20 '24 12:09