Implement "minimum number should match" on BooleanQuery

Open fulmicoton opened this issue 1 year ago • 2 comments

In a BooleanQuery, it can be useful to indicate that we want any 2 out of 3 terms to match. In Lucene, this is possible by calling setMinimumNumberShouldMatch: https://lucene.apache.org/core/6_1_0/core/org/apache/lucene/search/BooleanQuery.Builder.html#setMinimumNumberShouldMatch-int-

Regardless of the implementation, it must have no impact on the performance of the existing union queries.

fulmicoton, May 13 '24 09:05

I'm willing to implement this feature. However, my familiarity with this project is limited. Here are my initial considerations:

  • The minimum_should_match functionality acts as an additional constraint on BooleanQuery. It appears that the existing UnionScorer cannot accommodate it. A new scorer, MinimumRequirementScorer, based on UnionScorer, needs to be developed; it rejects any Doc that matches fewer clauses than required. minimum_should_match will be passed in as one of its members.
  • To minimize performance impacts, the MinimumRequirementScorer, UnionScorer, and RequiredOptionalScorer will be employed under different BooleanQuery combinations.
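As a rough illustration of the constraint described in the first point, here is a minimal standalone sketch that does not use tantivy's types. The function name `min_should_match` and the use of plain sorted `Vec<u32>` posting lists are assumptions made for this example; a real scorer would work on tantivy's DocSet/Scorer abstractions and would also compute scores.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical helper, not part of tantivy: returns the doc ids that occur in
/// at least `min_match` of the given posting lists.
/// Assumption: each posting list is sorted ascending and has no duplicates.
pub fn min_should_match(postings: &[Vec<u32>], min_match: usize) -> Vec<u32> {
    // Min-heap of cursors: (current doc id, list index, position in that list).
    let mut heap = BinaryHeap::new();
    for (list_idx, list) in postings.iter().enumerate() {
        if let Some(&doc) = list.first() {
            heap.push(Reverse((doc, list_idx, 0usize)));
        }
    }

    let mut matches = Vec::new();
    while let Some(&Reverse((doc, _, _))) = heap.peek() {
        // Pop every cursor currently positioned on `doc`, counting how many
        // clauses contain it, and advance each popped cursor by one.
        let mut count = 0;
        while let Some(&Reverse((head, list_idx, pos))) = heap.peek() {
            if head != doc {
                break;
            }
            heap.pop();
            count += 1;
            if let Some(&next) = postings[list_idx].get(pos + 1) {
                heap.push(Reverse((next, list_idx, pos + 1)));
            }
        }
        // Reject docs that match fewer clauses than required.
        if count >= min_match {
            matches.push(doc);
        }
    }
    matches
}
```

With `min_match` set to 1 this behaves like a plain union, which is one reason a separate, specialized scorer for higher thresholds can leave the existing union path untouched.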

Advice wanted. 😄

LebranceBW, May 16 '24 03:05

@LebranceBW I agree with both points.

The BooleanQuery -> BooleanWeight -> BooleanScorer layers are there precisely so that, while users define their query by instantiating a Query object, this query can be converted at runtime into different Scorers.
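To make the runtime conversion concrete, a selection step of this kind could be sketched as follows. The `ScorerKind` enum, the `select_scorer` function, and the exact rule it applies are all assumptions for illustration, not tantivy's actual logic; only the three scorer names come from this thread.

```rust
/// Hypothetical labels for the three scorers named in this thread.
#[derive(Debug, PartialEq, Eq)]
pub enum ScorerKind {
    Union,
    RequiredOptional,
    MinimumRequirement,
}

/// Illustrative selection rule (an assumption, not tantivy's implementation).
/// `has_must_clause`: whether the query has at least one required clause.
/// `minimum_should_match`: how many optional clauses must match (0 = unset).
pub fn select_scorer(has_must_clause: bool, minimum_should_match: usize) -> ScorerKind {
    match (has_must_clause, minimum_should_match) {
        // A plain union already requires one matching optional clause,
        // so the existing fast path stays unchanged.
        (false, 0) | (false, 1) => ScorerKind::Union,
        // Required clauses with purely optional extras: existing behavior.
        (true, 0) => ScorerKind::RequiredOptional,
        // Any stricter threshold needs the new counting scorer.
        _ => ScorerKind::MinimumRequirement,
    }
}
```

Because the choice is made once when the scorer is built, queries that never set a minimum would keep using the same scorers as before.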

fulmicoton, May 16 '24 04:05

Thanks @LebranceBW https://github.com/quickwit-oss/tantivy/pull/2405

PSeitz, Jul 01 '24 10:07