lucene Stop duplicating per-segment work across segment partitions

With the introduction of intra-segment concurrency support (see #13542), the IndexSearcher is able to parallelize search across partitions of segments. Some of our queries, like PointRangeQuery, perform per-segment computation ahead of time, that is currently not shared across segment partitions, but rather duplicated across them.

This makes intra-segment concurrency not appealing for those queries, which we should look at addressing. If we can share such computation across segment partitions, we can likely enable intra-segment slicing by default and ensure that intra-segment concurrency is beneficial for all queries.

Sep 09 '24 18:09 javanna

This is a heads up that I started working on this. My focus is currently on PointRangeQuery. The overall goal is to share the bitset computation across scorer suppliers for the same leaf context. We would still create one scorer supplier per segment partition, and a new doc id set iterator, but all the rest would be shared across different partitions of the same segment.

Focusing on a single query is quite a simplification, which allows me to better understand the problem I am trying to solve. I do wonder how many other queries need this kind of treatment. I know there have been talks in the past about cloning scorers and such, but I wonder if that can be avoided by adapting the queries / weight impls that require attention instead.

Could anyone help me come up with a list of queries that need to be modified to de-duplicate per-segment work in order to optimize their support for intra-segment concurrency? Do we actually need a very generic approach for this problem, or is it reasonable to go case by case perhaps?

Feb 25 '25 15:02 javanna

HNSW vector search heavy lifting is done in rewrite, so out of scope for this, right? Maybe multi-term queries would need to do some work. What about join queries? TermInSet query?

Mar 04 '25 18:03 msokolov

HNSW vector search heavy lifting is done in rewrite, so out of scope for this, right?

I believe so, mostly because query rewrite does not parallelize on slices, but across segments directly. See #13952 .

Maybe multi-term queries would need to do some work. What about join queries? TermInSet query?

Yep, TermInSet was on my radar. The others you mentioned, I will take a look at. Thanks!

Mar 04 '25 23:03 javanna

@javanna not sure about the underlying implementation of these queries. Does using intra-segment partitioning negatively impact these queries (TermInSet/MultiTerm) or is there still some benefit from using the feature?

Apr 22 '25 21:04 Shibi-bala

Does using intra-segment partitioning negatively impact these queries (TermInSet/MultiTerm) or is there still some benefit from using the feature?

I guess it depends on data and query, but it may be that the per-segment computation is heavier than running the query, hence duplicating it would make the query overall slower even if executed across multiple threads.

Apr 29 '25 10:04 javanna

Will go over all types of queries to check ( other than PointRangeQuery ) that needs special handling by sharing the docId space unless someone has already covered it.

Jul 11 '25 02:07 expani

For reference, this is the ahead of time work that currently gets duplicated in PointRangeQuery across partitions of the same segments: https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java#L313 . We create a bitset of the size of the entire segment. We should look into doing this once per segment and reusing it across partitions of the same segment instead.

Something similar happens in PointInSetQuery too, when pulling the scorer from the scorer supplier.

Jul 11 '25 12:07 javanna

Hi @javanna, is there a known list of the queries that have duplicated per-segment computation?

I reviewed the benchmarks you ran in PR #13542 with intra-segment parallelism enabled. Based off those results, it looks like these queries showed regressions:

CommonTermsQuery (i.e. HighTerm)
TermQuery (i.e. CountTerm)
BooleanQuery (i.e. OrHighLow)
PrefixQuery (i.e. Prefix3)
WildcardQuery (i.e. Wildcard)
FuzzyQuery (i.e. Fuzzy1)
MatchAllDocsQuery (i.e. BrowseDateSSDVFacets)

Granted, it's possible that the regressions stem from underlying issues rather than the top-level query itself. For example:

The regression in OrHighLow might be due to problems in CommonTermsQuery
PrefixQuery, WildCardQuery, and FuzzyQuery all rely on MultiTermQuery, so the root cause could be from MultiTermQuery#CONSTANT_SCORE_BLENDED_REWRITE (see https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/MultiTermQueryConstantScoreBlendedWrapper.java#L60)

Does this generally align with your findings so far?

Sep 09 '25 23:09 smuching202

@smuching202 the two I remember clearly about are PointRangeQuery and PointInSetQuery. These build a bitset ahead of time of the size of the entire segments. There may be others but I'd need to look closely at the code. It feels like once we solved one we have a solution for all potentially.

Sep 15 '25 12:09 javanna

Jumping in as this interests me, sorry if I get some points wrong!

It seems that PointInSetIncludingScoreQuery creates also a bitset from the scan of PointValues in createWeight.

https://github.com/apache/lucene/blob/085a0ae054b1df9623afce47e28b65c52184ce48/lucene/join/src/java/org/apache/lucene/search/join/PointInSetIncludingScoreQuery.java#L203-L205

Several queries like LatLonPointDistanceQuery also access PointValues but do not create upfront a bitset (as far as I can tell).

https://github.com/apache/lucene/blob/085a0ae054b1df9623afce47e28b65c52184ce48/lucene/core/src/java/org/apache/lucene/document/LatLonPointDistanceQuery.java#L138-L139

However, it looks like a bitset for the whole segment is computed nonetheless.

https://github.com/apache/lucene/blob/085a0ae054b1df9623afce47e28b65c52184ce48/lucene/core/src/java/org/apache/lucene/document/LatLonPointDistanceQuery.java#L153-L156

Sep 17 '25 14:09 scampi

Also correct me if I'm wrong, but I believe we may be duplicating allocations in TopFieldCollectorManager as we allocate a queue of size numHits for every collector, but we may be calling a collector multiple times on the same segment right? https://github.com/apache/lucene/blob/branch_10_2/lucene/core/src/java/org/apache/lucene/search/TopFieldCollectorManager.java#L137-L173. Specifically through FieldValueHitQueue#create which mentions "NOTE: The instances returned by this method pre-allocate a full array of length numHits."

Oct 02 '25 23:10 smuching202

lucene lucene copied to clipboard

Stop duplicating per-segment work across segment partitions

lucene
lucene copied to clipboard