Fix segment-specific TermInSetQuery rewrites thrashing caching policy

Open • GovindBalaji-S-Glean opened this issue 1 month ago • 1 comment

Description

Fixes https://github.com/apache/lucene/issues/14986.

A TermInSetQuery with rewriteMethod = MultiTermQuery.CONSTANT_SCORE_BLENDED_REWRITE creates a RewritingWeight. Obtaining a scorer from this RewritingWeight for a given segment may rewrite the query into a BooleanQuery of TermQuery clauses that covers only the terms present in that particular segment.
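To make the per-segment behaviour concrete, here is a small, self-contained Java model that uses only the JDK, with no Lucene dependency. It is not Lucene's code: the class and method names are hypothetical, queries are represented as strings, the threshold of 16 is an assumed illustrative value, and the path taken when many terms match is collapsed into a single placeholder. It only shows that two segments can produce two different rewritten queries for the same TermInSetQuery.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class BlendedRewriteModel {
    // Assumed, illustrative cut-off: above this many matching terms the model
    // returns a placeholder instead of a BooleanQuery of TermQuery clauses.
    static final int TERM_COUNT_THRESHOLD = 16;

    // Returns a textual stand-in for the query one segment would actually run.
    static String rewriteForSegment(Set<String> queryTerms, Set<String> segmentTerms) {
        List<String> present = new ArrayList<>();
        for (String term : new TreeSet<>(queryTerms)) { // sorted for stable output
            if (segmentTerms.contains(term)) {
                present.add(term); // only terms that exist in this segment
            }
        }
        if (present.size() > TERM_COUNT_THRESHOLD) {
            return "BitSetScorer(" + present.size() + " terms)"; // simplified fallback
        }
        List<String> clauses = new ArrayList<>();
        for (String term : present) {
            clauses.add("TermQuery(" + term + ")");
        }
        return "BooleanQuery" + clauses;
    }

    public static void main(String[] args) {
        Set<String> query = Set.of("a", "b", "c");
        System.out.println(rewriteForSegment(query, Set.of("a", "b", "x"))); // BooleanQuery[TermQuery(a), TermQuery(b)]
        System.out.println(rewriteForSegment(query, Set.of("c", "y")));      // BooleanQuery[TermQuery(c)]
    }
}
```

With query terms a, b and c, a segment containing a and b rewrites to a two-clause query while a segment containing only c rewrites to a one-clause query, so each segment ends up running its own distinct query.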

Each of these segment-specific BooleanQuery rewrites is recorded in the UsageTrackingQueryCachingPolicy ring buffer, which is shared across all segments of the index, so together they thrash it. The expectation is that a query is marked only once per shard in this ring buffer (ref).
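The thrashing effect can be sketched with a simplified usage history: a fixed-size window of recently used query keys plus a count per key. This is a hypothetical model, not Lucene's actual ring buffer implementation; the capacity of 8 is far smaller than anything realistic and is chosen only so the effect is visible. Each search records the parent query once and, optionally, one extra entry per segment-specific rewrite.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class RingBufferThrashModel {
    // Hypothetical stand-in for a frequency-tracking ring buffer: remembers the
    // last `capacity` recorded keys and how often each key appears among them.
    static final class UsageHistory {
        private final int capacity;
        private final Deque<String> window = new ArrayDeque<>();
        private final Map<String, Integer> counts = new HashMap<>();

        UsageHistory(int capacity) {
            this.capacity = capacity;
        }

        void onUse(String key) {
            if (window.size() == capacity) {
                String evicted = window.removeFirst(); // oldest entry falls out
                counts.computeIfPresent(evicted, (k, v) -> v == 1 ? null : v - 1);
            }
            window.addLast(key);
            counts.merge(key, 1, Integer::sum);
        }

        int frequency(String key) {
            return counts.getOrDefault(key, 0);
        }
    }

    // Frequency of the parent query after `searches` searches over `segments` segments.
    static int parentFrequency(int capacity, int searches, int segments, boolean recordSegmentRewrites) {
        UsageHistory history = new UsageHistory(capacity);
        for (int s = 0; s < searches; s++) {
            history.onUse("TermInSetQuery"); // parent query: one use per search
            if (recordSegmentRewrites) {
                for (int seg = 0; seg < segments; seg++) {
                    history.onUse("BooleanQuery@seg" + seg); // one extra use per segment
                }
            }
        }
        return history.frequency("TermInSetQuery");
    }

    public static void main(String[] args) {
        System.out.println(parentFrequency(8, 5, 4, false)); // 5
        System.out.println(parentFrequency(8, 5, 4, true));  // 1
    }
}
```

With 4 segments and 5 searches, the parent query appears 5 times in the window when only it is recorded, but only once when every segment also records its own rewrite, because the per-segment entries push the parent's earlier uses out of the window. Since the real policy bases its caching decisions on how often a query appears in its history, the parent query then looks far less popular than it really is.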

In this change, when initializing AbstractMultiTermQueryConstantScoreWrapper.RewritingWeight, we make a copy of the supplied IndexSearcher with setQueryCache(null) and pass that copy along for the segment-specific rewrites. The resulting BooleanQuery-of-TermQuery rewrites therefore bypass the query cache; the idea is that caching should happen at the level of the parent TermInSetQuery itself.
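A minimal model of the fix follows, again using only JDK types. Searcher, RecordingCache and copyWithoutCache are hypothetical illustrations, not Lucene classes and not the PR's code. In this model, a weight created through a searcher whose cache is non-null gets recorded. The per-segment rewrites go through either the original searcher or a copy whose cache has been set to null, mirroring the setQueryCache(null) step described above.

```java
import java.util.ArrayList;
import java.util.List;

public class NoCacheSearcherCopyModel {
    // Hypothetical stand-in for a query cache: it only logs which queries were recorded.
    static final class RecordingCache {
        final List<String> recorded = new ArrayList<>();
    }

    // Hypothetical stand-in for a searcher: weights created while a cache is set get recorded.
    static final class Searcher {
        private RecordingCache cache;

        Searcher(RecordingCache cache) {
            this.cache = cache;
        }

        void setQueryCache(RecordingCache cache) {
            this.cache = cache;
        }

        // A real copy would presumably also keep the reader and other settings; omitted here.
        Searcher copyWithoutCache() {
            Searcher copy = new Searcher(this.cache);
            copy.setQueryCache(null); // the copy no longer touches the cache
            return copy;
        }

        void createWeight(String query) {
            if (cache != null) {
                cache.recorded.add(query); // models a caching wrapper recording the use
            }
        }
    }

    // Number of entries recorded for one search over `segments` segments.
    static int recordsPerSearch(boolean applyFix, int segments) {
        RecordingCache cache = new RecordingCache();
        Searcher searcher = new Searcher(cache);
        searcher.createWeight("TermInSetQuery"); // parent query goes through the cache
        Searcher rewriteSearcher = applyFix ? searcher.copyWithoutCache() : searcher;
        for (int seg = 0; seg < segments; seg++) {
            rewriteSearcher.createWeight("BooleanQuery@seg" + seg);
        }
        return cache.recorded.size();
    }

    public static void main(String[] args) {
        System.out.println(recordsPerSearch(false, 4)); // 5
        System.out.println(recordsPerSearch(true, 4));  // 1
    }
}
```

Without the copy, one search over 4 segments records 5 entries: the parent plus one per segment. With the cache-less copy, only the parent TermInSetQuery is recorded. One consequence of this design is that the segment-level BooleanQuery rewrites are never cached on their own, which matches the intent that caching happen at the level of the parent query.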

GovindBalaji-S-Glean, Nov 20 '25 10:11

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

github-actions[bot], Dec 05 '25 00:12