
Fix MaxScoreBulkScorer leaf-bound overshoot and prevent merging zero-score fragments

Open • kdt523 opened this issue 2 months ago • 1 comment

This PR delivers two minimal, targeted fixes with regression tests:

- Core: Prevent MaxScoreBulkScorer from advancing past a leaf’s maxDoc under filtered disjunctions (avoids potential EOF when norms are accessed after NO_MORE_DOCS).
- Highlighter: Don’t merge zero-scored fragments (GH-15333) to avoid producing merged passages that include content with no matches.

Motivation

- MaxScoreBulkScorer: With a restrictive filter plus a disjunction, the candidate windowing logic could overshoot a segment’s maxDoc. If norms were accessed after NO_MORE_DOCS, this could trigger unexpected EOF.
- Highlighter: Zero-score fragments should not be merged with adjacent fragments, otherwise the final passage can include unrelated content with no matches.

Changes

Core (lucene/core)
- Clamp candidate advancement at the leaf boundary in MaxScoreBulkScorer (e.g., within nextCandidate) so NO_MORE_DOCS is returned when rangeEnd exceeds maxDoc.
- Added regression test: org.apache.lucene.search.TestMaxScoreBulkScorerFilterBounds.

Highlighter (lucene/highlighter)
- In Highlighter, filter out zero-scored TextFragments before mergeContiguousFragments to prevent unintended merges.
- Added regression test: org.apache.lucene.search.highlight.TestZeroScoreMerging.

Docs
- Updated [CHANGES.txt] with both fixes and referenced test names.
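For reviewers, the core clamp can be illustrated with a minimal, self-contained sketch. This is not the actual MaxScoreBulkScorer code: the class `LeafBoundClamp`, the array-backed filter, and the helper `clampWindowEnd` are hypothetical names introduced only to show the idea of stopping at the leaf boundary. The only Lucene fact assumed is that `DocIdSetIterator.NO_MORE_DOCS` equals `Integer.MAX_VALUE`.

```java
public final class LeafBoundClamp {
    /** Same value as Lucene's DocIdSetIterator.NO_MORE_DOCS. */
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /**
     * Hypothetical stand-in for candidate advancement: returns the first
     * filter match at or after target, or NO_MORE_DOCS if that match would
     * fall at or beyond the leaf's maxDoc.
     */
    static int nextCandidate(int[] sortedFilterDocs, int target, int maxDoc) {
        if (target >= maxDoc) {
            return NO_MORE_DOCS; // never look past the leaf boundary
        }
        for (int doc : sortedFilterDocs) {
            if (doc >= target) {
                return doc < maxDoc ? doc : NO_MORE_DOCS;
            }
        }
        return NO_MORE_DOCS;
    }

    /**
     * Hypothetical helper: clamps the exclusive end of a scoring window to
     * maxDoc. The sum is computed as a long so it cannot overflow.
     */
    static int clampWindowEnd(int windowStart, int windowSize, int maxDoc) {
        long end = (long) windowStart + windowSize;
        return (int) Math.min(end, (long) maxDoc);
    }

    public static void main(String[] args) {
        int[] filterDocs = {3, 7, 12};
        int maxDoc = 10;
        System.out.println(nextCandidate(filterDocs, 4, maxDoc));
        System.out.println(nextCandidate(filterDocs, 8, maxDoc) == NO_MORE_DOCS);
        System.out.println(clampWindowEnd(8, 4096, maxDoc));
    }
}
```

The point of the sketch is that both the candidate and the window end are bounded by maxDoc before anything per-document (such as norms) would be read.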

Testing

New tests:
- lucene/core: TestMaxScoreBulkScorerFilterBounds validates filtered-disjunction execution does not score past maxDoc and does not throw.
- lucene/highlighter: TestZeroScoreMerging ensures zero-score fragments aren’t merged.

Both tests pass locally in isolation for their respective modules.

Backwards compatibility

Behavior is strictly safer/more correct:
- Core: Prevents out-of-bounds progression; no API changes.
- Highlighter: Merge semantics exclude fragments with score == 0; expected/intuitive behavior, no API changes.
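The changed merge semantics can be illustrated with a small standalone sketch. It does not use the real Highlighter or TextFragment classes: `Fragment`, `dropZeroScored`, and `mergeContiguous` are hypothetical stand-ins, and the merge rule here (join fragments whose offsets touch, keep the higher score) is a simplification assumed for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public final class ZeroScoreFragmentFilter {
    /** Hypothetical stand-in for a highlighter text fragment. */
    static final class Fragment {
        final int start; // inclusive start offset
        final int end;   // exclusive end offset
        final float score;

        Fragment(int start, int end, float score) {
            this.start = start;
            this.end = end;
            this.score = score;
        }
    }

    /** Pre-filter: keep only fragments with a positive score. */
    static List<Fragment> dropZeroScored(List<Fragment> frags) {
        List<Fragment> kept = new ArrayList<>();
        for (Fragment f : frags) {
            if (f.score > 0f) {
                kept.add(f);
            }
        }
        return kept;
    }

    /**
     * Simplified merge: joins fragments whose offsets touch, keeping the
     * higher of the two scores. Input is assumed to be sorted by start.
     */
    static List<Fragment> mergeContiguous(List<Fragment> frags) {
        List<Fragment> out = new ArrayList<>();
        for (Fragment f : frags) {
            if (!out.isEmpty() && out.get(out.size() - 1).end == f.start) {
                Fragment prev = out.remove(out.size() - 1);
                out.add(new Fragment(prev.start, f.end, Math.max(prev.score, f.score)));
            } else {
                out.add(f);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Fragment> frags = new ArrayList<>();
        frags.add(new Fragment(0, 10, 1.0f));
        frags.add(new Fragment(10, 20, 0.0f)); // no matches in this span
        frags.add(new Fragment(20, 30, 2.0f));
        System.out.println(mergeContiguous(frags).size());
        System.out.println(mergeContiguous(dropZeroScored(frags)).size());
    }
}
```

Without the pre-filter, the zero-score fragment acts as a bridge and all three spans collapse into one passage. With it, the two matching fragments stay separate because they are no longer contiguous.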

Performance

Neutral. The core change is a simple bound check in the candidate advancement logic. The Highlighter change is a small pre-filter on fragments.

Risk

Low. Changes are localized and covered by focused regression tests.

Related

Fix: #15333

kdt523, Oct 21 '25 19:10

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

github-actions[bot], Nov 14 '25 00:11