tantivy
Implement "minimum number should match" on BooleanQuery
In a BooleanQuery, it can be useful to require that any 2 out of 3 terms match. In Lucene, this is possible by calling setMinimumNumberShouldMatch: https://lucene.apache.org/core/6_1_0/core/org/apache/lucene/search/BooleanQuery.Builder.html#setMinimumNumberShouldMatch-int-
Regardless of the implementation, it must have no impact on the performance of the existing union queries.
I'm willing to implement this feature. However, my familiarity with this project is limited. Here are my initial considerations:
- The `minimum_should_match` functionality operates as an additional constraint on BooleanQuery. It appears that the existing `UnionScorer` cannot accommodate it. A new scorer, `MinimumRequirementScorer`, based on `UnionScorer`, needs to be developed; it rejects any `Doc` that does not meet the condition. `minimum_should_match` will be passed in as one of its members.
- To minimize the performance impact, `MinimumRequirementScorer`, `UnionScorer`, and `RequiredOptionalScorer` will each be used for different `BooleanQuery` combinations.
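To make the first point concrete, here is a minimal sketch of the "at least N clauses match" check, written over plain sorted doc-id lists instead of tantivy's scorer traits. The function name `docs_matching_at_least` and the `Vec<DocId>` posting lists are assumptions made for illustration; an actual `MinimumRequirementScorer` would work on scorers and would probably look quite different.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Illustrative doc id type (tantivy's own `DocId` is also a `u32`).
type DocId = u32;

/// Hypothetical helper: returns the docs that appear in at least
/// `minimum_should_match` of the given posting lists.
/// Each list must be sorted in increasing order and free of duplicates.
fn docs_matching_at_least(postings: &[Vec<DocId>], minimum_should_match: usize) -> Vec<DocId> {
    // Min-heap of cursors: (current doc, list index, position in that list).
    let mut heap: BinaryHeap<Reverse<(DocId, usize, usize)>> = postings
        .iter()
        .enumerate()
        .filter(|(_, list)| !list.is_empty())
        .map(|(idx, list)| Reverse((list[0], idx, 0)))
        .collect();

    let mut result = Vec::new();
    // The smallest doc among all cursors is the next candidate.
    while let Some(&Reverse((candidate, _, _))) = heap.peek() {
        let mut num_matches = 0;
        // Pop every cursor positioned on `candidate`, count it, and advance it.
        while let Some(&Reverse((doc, idx, pos))) = heap.peek() {
            if doc != candidate {
                break;
            }
            heap.pop();
            num_matches += 1;
            if let Some(&next_doc) = postings[idx].get(pos + 1) {
                heap.push(Reverse((next_doc, idx, pos + 1)));
            }
        }
        // This is the extra constraint on top of a plain union:
        // docs matched by too few clauses are rejected.
        if num_matches >= minimum_should_match {
            result.push(candidate);
        }
    }
    result
}
```

With `minimum_should_match` set to 0 or 1 this degenerates into a plain union, which is one reason to keep the existing `UnionScorer` for that case and only pay for the counting when it is needed.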
Advice wanted. 😄
@LebranceBW I agree with both points.
The BooleanQuery -> BooleanWeight -> BooleanScorer layers exist precisely so that, while users define their query by instantiating a Query object, this query can be converted at runtime into different Scorers.
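As a rough illustration of that runtime conversion, the sketch below picks a scorer strategy for the optional clauses from the query parameters alone. The `ShouldScorerPlan` enum and the `plan_should_scorer` function are hypothetical names invented for this example; they are not part of tantivy, and the real selection logic in BooleanWeight may take more inputs into account.

```rust
/// Hypothetical plan describing which scorer to build for the
/// optional ("should") clauses of a boolean query.
#[derive(Debug, PartialEq)]
enum ShouldScorerPlan {
    /// More matches are requested than there are clauses: nothing can match.
    Empty,
    /// A plain union, i.e. the existing fast path stays untouched.
    Union,
    /// The general case, which needs to count matching clauses per doc.
    MinimumRequirement { minimum_should_match: usize },
}

/// Chooses a plan from the number of optional clauses and the requested minimum.
fn plan_should_scorer(num_should_clauses: usize, minimum_should_match: usize) -> ShouldScorerPlan {
    if minimum_should_match > num_should_clauses {
        ShouldScorerPlan::Empty
    } else if minimum_should_match <= 1 {
        // Requiring zero or one matching clause is what a union already provides.
        ShouldScorerPlan::Union
    } else {
        ShouldScorerPlan::MinimumRequirement { minimum_should_match }
    }
}
```

Because the decision is made once per query when the scorer is built, queries that never set a minimum keep going through the union path and pay nothing for the new feature.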
Thanks @LebranceBW https://github.com/quickwit-oss/tantivy/pull/2405