Optimize `PointRangeQuery` for intra-segment concurrency with segment-level `DocIdSet` caching
Description
This PR optimizes `PointRangeQuery` to support intra-segment concurrent search efficiently by adding segment-level `DocIdSet` caching. When a large segment is split into multiple partitions for parallel processing, all partitions now share the result of a single BKD tree traversal instead of each partition performing its own redundant traversal. The solution came out of the discussion on https://github.com/apache/lucene/pull/15383. The related issue for `PointRangeQuery` with intra-segment search is https://github.com/apache/lucene/issues/13745.
Problem
With intra-segment concurrency enabled, a single segment can be split into multiple partitions, each processed by a different thread. In the current implementation, each partition independently traverses the BKD tree and builds its own `DocIdSet`, which results in redundant BKD traversals and higher query latency (see https://github.com/apache/lucene/pull/13542#issuecomment-2332114836).
Solution
Implement a segment-level cache that ensures the BKD tree is traversed only once per segment, with the resulting `DocIdSet` shared across all partitions:
- `SegmentDocIdSetSupplier`: A new helper class that lazily builds and caches the `DocIdSet` for an entire segment.
- Segment-level cache: A `ConcurrentHashMap<LeafReaderContext, SegmentDocIdSetSupplier>` in the `Weight` that ensures all partitions of the same segment share the same supplier.
- `PartitionScorerSupplier`: A new `ScorerSupplier` implementation that references the shared cache and filters results to the partition's doc ID range.
- `PartitionFilteredDocIdSetIterator`: A lightweight iterator wrapper that filters the shared full-segment `DocIdSet` to only return docs within the partition's range.
- Pending: the `cost()` methods still need to be updated, tests added, and some code cleaned up. Local testing details are at https://github.com/apache/lucene/pull/15446#issuecomment-3568992048.
- The behavior is unchanged when intra-segment search is disabled; that case is handled in the existing `scorerSupplier(LeafReaderContext context)` method.
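
To make the build-once idea concrete, here is a minimal, JDK-only sketch rather than the code in this PR. The names `SegmentCacheSketch`, `LazySegmentDocIdSet`, `docIdSetFor`, and `builds` are hypothetical, a `java.util.BitSet` stands in for Lucene's `DocIdSet`, and a plain `Object` key stands in for `LeafReaderContext`. It only illustrates how `computeIfAbsent` plus a double-checked lazy holder lets several partition threads share one build.

```java
import java.util.BitSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class SegmentCacheSketch {
  // Hypothetical stand-in for SegmentDocIdSetSupplier: builds once, then caches.
  static final class LazySegmentDocIdSet {
    private final Supplier<BitSet> builder; // stands in for the BKD tree traversal
    private volatile BitSet cached;

    LazySegmentDocIdSet(Supplier<BitSet> builder) {
      this.builder = builder;
    }

    BitSet getOrBuild() {
      BitSet result = cached;
      if (result == null) {
        synchronized (this) { // later callers block here until the build finishes
          result = cached;
          if (result == null) { // only the first thread to get here builds
            result = builder.get();
            cached = result;
          }
        }
      }
      return result;
    }
  }

  // One entry per segment; the PR description keys this map on LeafReaderContext.
  private final ConcurrentHashMap<Object, LazySegmentDocIdSet> cache = new ConcurrentHashMap<>();
  final AtomicInteger builds = new AtomicInteger(); // counts how often the "traversal" runs

  BitSet docIdSetFor(Object segmentKey, int maxDoc) {
    return cache.computeIfAbsent(segmentKey, k -> new LazySegmentDocIdSet(() -> {
      builds.incrementAndGet();
      BitSet bits = new BitSet(maxDoc);
      bits.set(0, maxDoc * 3 / 4); // assumption: pretend the first 75% of docs match
      return bits;
    })).getOrBuild();
  }

  public static void main(String[] args) throws InterruptedException {
    SegmentCacheSketch sketch = new SegmentCacheSketch();
    Object segment = new Object();
    Thread[] threads = new Thread[5]; // one thread per partition
    for (int i = 0; i < threads.length; i++) {
      threads[i] = new Thread(() -> sketch.docIdSetFor(segment, 2_000_000));
      threads[i].start();
    }
    for (Thread t : threads) {
      t.join();
    }
    System.out.println("builds = " + sketch.builds.get());
  }
}
```

Note that the threads that lose the race wait inside the `synchronized` block until the build completes, so in this sketch the sharing comes at the cost of blocking them.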
Performance Impact: good improvement seen with IntNRQ
Tested with intra-segment search enabled on both candidate and baseline.
Task    QPS baseline    StdDev    QPS my_modified_version    StdDev    Pct diff    p-value
Respell 21.82 (11.8%) 19.72 (7.4%) -9.6% ( -25% - 10%) 0.123
BrowseDayOfYearSSDVFacets 3.14 (18.6%) 3.01 (12.2%) -4.0% ( -29% - 32%) 0.688
HighTermTitleBDVSort 10.51 (3.7%) 10.11 (4.5%) -3.7% ( -11% - 4%) 0.152
MedIntervalsOrdered 36.53 (7.0%) 35.68 (6.3%) -2.3% ( -14% - 11%) 0.576
OrNotHighLow 435.41 (3.5%) 425.81 (1.3%) -2.2% ( -6% - 2%) 0.192
AndHighMedDayTaxoFacets 63.83 (1.8%) 62.45 (4.0%) -2.2% ( -7% - 3%) 0.271
HighTermMonthSort 207.85 (4.6%) 203.35 (4.0%) -2.2% ( -10% - 6%) 0.431
OrNotHighHigh 36.19 (16.2%) 35.57 (11.9%) -1.7% ( -25% - 31%) 0.849
BrowseMonthSSDVFacets 3.12 (9.4%) 3.07 (13.6%) -1.5% ( -22% - 23%) 0.839
HighPhrase 22.87 (2.6%) 22.58 (2.6%) -1.3% ( -6% - 3%) 0.437
HighTermDayOfYearSort 25.58 (2.4%) 25.26 (3.1%) -1.3% ( -6% - 4%) 0.477
LowSpanNear 17.70 (3.0%) 17.53 (3.1%) -1.0% ( -6% - 5%) 0.617
HighSloppyPhrase 9.39 (2.8%) 9.32 (2.8%) -0.7% ( -6% - 4%) 0.674
HighSpanNear 14.31 (3.7%) 14.21 (0.7%) -0.7% ( -4% - 3%) 0.692
OrHighNotLow 148.21 (12.2%) 147.26 (11.5%) -0.6% ( -21% - 26%) 0.932
TermDTSort 26.59 (4.0%) 26.43 (2.2%) -0.6% ( -6% - 5%) 0.760
OrHighNotHigh 50.27 (11.8%) 50.05 (12.1%) -0.4% ( -21% - 26%) 0.955
OrHighMed 111.27 (8.8%) 110.89 (8.6%) -0.3% ( -16% - 18%) 0.950
AndHighMed 210.73 (2.8%) 210.06 (1.9%) -0.3% ( -4% - 4%) 0.834
Wildcard 25.46 (1.5%) 25.40 (2.3%) -0.3% ( -3% - 3%) 0.835
OrNotHighMed 60.00 (14.7%) 59.88 (12.5%) -0.2% ( -23% - 31%) 0.981
BrowseDateSSDVFacets 0.53 (16.0%) 0.53 (18.6%) -0.1% ( -29% - 41%) 0.994
HighTerm 242.89 (7.4%) 243.56 (10.1%) 0.3% ( -16% - 19%) 0.961
range 2796.48 (7.4%) 2805.50 (3.1%) 0.3% ( -9% - 11%) 0.928
BrowseDayOfYearTaxoFacets 2.09 (7.3%) 2.09 (11.5%) 0.4% ( -17% - 20%) 0.952
OrHighHigh 35.14 (9.7%) 35.32 (12.6%) 0.5% ( -19% - 25%) 0.943
MedSloppyPhrase 11.84 (1.4%) 11.91 (3.9%) 0.6% ( -4% - 6%) 0.746
Prefix3 33.61 (3.1%) 33.84 (2.8%) 0.7% ( -5% - 6%) 0.717
LowIntervalsOrdered 99.26 (2.7%) 99.96 (3.8%) 0.7% ( -5% - 7%) 0.737
MedTermDayTaxoFacets 16.22 (6.0%) 16.35 (8.0%) 0.8% ( -12% - 15%) 0.859
HighIntervalsOrdered 2.98 (12.3%) 3.01 (8.0%) 0.8% ( -17% - 24%) 0.897
IntSet 140.75 (4.4%) 142.36 (5.8%) 1.1% ( -8% - 11%) 0.726
AndHighHighDayTaxoFacets 12.74 (5.4%) 12.90 (3.5%) 1.3% ( -7% - 10%) 0.647
HighTermTitleSort 14.03 (1.3%) 14.22 (2.6%) 1.3% ( -2% - 5%) 0.295
BrowseRandomLabelTaxoFacets 1.72 (5.8%) 1.75 (4.9%) 1.4% ( -8% - 12%) 0.688
OrHighMedDayTaxoFacets 1.15 (3.8%) 1.17 (5.0%) 1.5% ( -6% - 10%) 0.580
MedPhrase 58.94 (4.5%) 59.93 (4.7%) 1.7% ( -7% - 11%) 0.561
Fuzzy1 34.72 (7.2%) 35.33 (5.7%) 1.8% ( -10% - 15%) 0.670
AndHighLow 531.18 (1.7%) 541.24 (6.5%) 1.9% ( -6% - 10%) 0.527
BrowseMonthTaxoFacets 2.17 (8.9%) 2.21 (12.3%) 2.0% ( -17% - 25%) 0.772
LowPhrase 12.05 (3.8%) 12.34 (3.4%) 2.4% ( -4% - 9%) 0.294
LowSloppyPhrase 16.99 (2.2%) 17.40 (3.2%) 2.4% ( -2% - 7%) 0.162
LowTerm 459.98 (16.0%) 472.15 (17.4%) 2.6% ( -26% - 42%) 0.803
BrowseRandomLabelSSDVFacets 2.12 (8.3%) 2.18 (13.3%) 2.9% ( -17% - 26%) 0.678
MedSpanNear 4.41 (5.3%) 4.54 (7.2%) 3.1% ( -8% - 16%) 0.434
AndHighHigh 48.99 (9.9%) 50.53 (11.9%) 3.1% ( -17% - 27%) 0.650
OrHighLow 342.27 (6.5%) 353.27 (3.7%) 3.2% ( -6% - 14%) 0.334
OrHighNotMed 117.56 (12.8%) 122.42 (11.0%) 4.1% ( -17% - 31%) 0.582
MedTerm 273.79 (15.1%) 285.16 (13.9%) 4.2% ( -21% - 39%) 0.652
Fuzzy2 38.07 (9.0%) 39.69 (11.0%) 4.2% ( -14% - 26%) 0.502
PKLookup 139.01 (11.7%) 146.39 (7.1%) 5.3% ( -12% - 27%) 0.386
BrowseDateTaxoFacets 2.02 (5.9%) 2.16 (11.7%) 7.1% ( -9% - 26%) 0.228
IntNRQ 12.30 (3.8%) 30.18 (8.2%) 145.3% ( 128% - 163%) 0.000
Related Issues
- https://github.com/apache/lucene/issues/13745
- ~https://github.com/apache/lucene/issues/14485
Before adding tests, I checked this behavior using https://github.com/msfroh/lucene-university (I will check in that code here as well). Notice the following in the logs:
- The segment is divided into 5 partitions, each part of a different slice.
- The scorer supplier is called by all partitions for the same segment: `ctx identity: 857068247`.
- All 5 threads get the same supplier: `called on supplier #1557216666291` (`SegmentDocIdSetSupplier`), for example by thread 41 from partition [400000, 800000).
- All partitions share the same cache entry: `supplier identity: 1536099041` (same for all 5).
- BKD traversal happens only ONCE: `[BUILD_START]` on thread 39, `[BUILD_SKIP]` on 4 other threads, so only 1 thread builds the `DocIdSet` and the other 4 threads reuse the cached result.
> Task :example.points.IntraSegmentPointRangeTest.main()
=== Intra-Segment Point Range Query Test ===
Step 1: Indexing documents...
Indexing 2000000 documents...
Indexed 500000 documents...
Indexed 1000000 documents...
Indexed 1500000 documents...
Force merging to single segment...
Indexing complete!
Step 2: Opening reader and creating searcher...
Index info:
Total docs: 2000000
Number of segments: 1
Segment 0: 2000000 docs
Creating IndexSearcher with 4 threads
=== Slice Information ===
Number of slices: 5
Slice 0:
Number of partitions: 1
Total docs in slice: 400000
Partition 0:
Segment: 0
Doc range: [0, 400000)
Doc count: 400000
Slice 1:
Number of partitions: 1
Total docs in slice: 400000
Partition 0:
Segment: 0
Doc range: [400000, 800000)
Doc count: 400000
Slice 2:
Number of partitions: 1
Total docs in slice: 400000
Partition 0:
Segment: 0
Doc range: [800000, 1200000)
Doc count: 400000
Slice 3:
Number of partitions: 1
Total docs in slice: 400000
Partition 0:
Segment: 0
Doc range: [1200000, 1600000)
Doc count: 400000
Slice 4:
Number of partitions: 1
Total docs in slice: 400000
Partition 0:
Segment: 0
Doc range: [1600000, 2000000)
Doc count: 400000
Step 3: Executing range query...
Query: value:[0 TO 1499999]
Expected matches: 1500000
Searching (multi-threaded)...
=== Multi-threaded Search Results ===
Total hits: 1500000
Time: 29ms
=== Verification ===
Expected: 1500000
Actual: 1500000
Result: ✓ CORRECT
=== Sample Results (Top 10) ===
Nov 23, 2025 5:12:50 PM org.apache.lucene.internal.vectorization.VectorizationProvider lookup
WARNING: Java vector incubator module is not readable. For optimal vector performance, pass '--add-modules jdk.incubator.vector' to enable Vector API.
[SCORER_SUPPLIER] Called for segment 0 partition [0, 400000) on thread 3 ctx identity: 857068247
[SCORER_SUPPLIER] Called for segment 0 partition [400000, 800000) on thread 41 ctx identity: 857068247
[SCORER_SUPPLIER] Called for segment 0 partition [1200000, 1600000) on thread 39 ctx identity: 857068247
[CACHE_LOOKUP] Before computeIfAbsent, cache size: 0
[CACHE_LOOKUP] Before computeIfAbsent, cache size: 0
[SCORER_SUPPLIER] Called for segment 0 partition [800000, 1200000) on thread 40 ctx identity: 857068247
[CACHE_LOOKUP] Before computeIfAbsent, cache size: 0
[SCORER_SUPPLIER] Called for segment 0 partition [1600000, 2000000) on thread 38 ctx identity: 857068247
[CACHE_LOOKUP] Before computeIfAbsent, cache size: 0
[CACHE_LOOKUP] Before computeIfAbsent, cache size: 0
[CACHE_MISS] CREATING new SegmentDocIdSetSupplier for segment 0 on thread 41
[SUPPLIER_CREATED] SegmentDocIdSetSupplier #1557216666291 for segment 0
[CACHE_RESULT] After computeIfAbsent, cache size: 1, supplier identity: 1536099041
[CACHE_RESULT] After computeIfAbsent, cache size: 1, supplier identity: 1536099041
[CACHE_RESULT] After computeIfAbsent, cache size: 1, supplier identity: 1536099041
[CACHE_RESULT] After computeIfAbsent, cache size: 1, supplier identity: 1536099041
[CACHE_RESULT] After computeIfAbsent, cache size: 1, supplier identity: 1536099041
[GET_OR_BUILD] Called on supplier #1557216666291 for segment 0 on thread 39
[BUILD_CHECK] cachedDocIdSet is null, entering synchronized block
[BUILD_START] Building DocIdSet for segment 0 on thread 39
[GET_OR_BUILD] Called on supplier #1557216666291 for segment 0 on thread 38
[BUILD_CHECK] cachedDocIdSet is null, entering synchronized block
[GET_OR_BUILD] Called on supplier #1557216666291 for segment 0 on thread 3
[BUILD_CHECK] cachedDocIdSet is null, entering synchronized block
[GET_OR_BUILD] Called on supplier #1557216666291 for segment 0 on thread 40
[BUILD_CHECK] cachedDocIdSet is null, entering synchronized block
[GET_OR_BUILD] Called on supplier #1557216666291 for segment 0 on thread 41
[BUILD_CHECK] cachedDocIdSet is null, entering synchronized block
Disconnected from the target VM, address: 'localhost:55600', transport: 'socket'
[BUILD_COMPLETE] Built DocIdSet for segment 0 in 901ms
[BUILD_SKIP] Another thread already built DocIdSet
[BUILD_SKIP] Another thread already built DocIdSet
[BUILD_SKIP] Another thread already built DocIdSet
[BUILD_SKIP] Another thread already built DocIdSet
Doc 0: value=0, score=1.0
Doc 1: value=1, score=1.0
Doc 2: value=2, score=1.0
Doc 3: value=3, score=1.0
Doc 4: value=4, score=1.0
Doc 5: value=5, score=1.0
Doc 6: value=6, score=1.0
Doc 7: value=7, score=1.0
Doc 8: value=8, score=1.0
Doc 9: value=9, score=1.0
=== Cleanup ===
Shutting down executor...
Done!
Visual Flow
LeafReaderContext object (ctx identity: 857068247)
↑ ↑ ↑ ↑ ↑
│ │ │ │ │
Partition Partition Partition Partition Partition
[0,400K) [400K,800K) [800K,1.2M) [1.2M,1.6M) [1.6M,2M)
Thread 3 Thread 41 Thread 40 Thread 39 Thread 38
Thread 41: [CACHE_MISS] Creates supplier ─────────────┐
Thread 39: [CACHE_RESULT] Gets supplier ──┐ │
Thread 38: [CACHE_RESULT] Gets supplier ──┤ │
Thread 3: [CACHE_RESULT] Gets supplier ──┤ │
Thread 40: [CACHE_RESULT] Gets supplier ──┘ │
│ │
↓ │
All 5 threads have │
same supplier │
│ │
↓ ↓
Thread 39: [BUILD_START] ← BUILDS the DocIdSet
Thread 38: [BUILD_SKIP] ← Waits, then reuses the DocIdSet
Thread 3: [BUILD_SKIP] ← Waits, then reuses the DocIdSet
Thread 40: [BUILD_SKIP] ← Waits, then reuses the DocIdSet
Thread 41: [BUILD_SKIP] ← Waits, then reuses the DocIdSet (even though it created supplier!)
Looks like a flaky test?
./gradlew :lucene:join:test --tests "org.apache.lucene.search.join.TestBlockJoin.testScoreMode" -Ptests.asserts=true -Ptests.file.encoding=UTF-8 -Ptests.gui=true -Ptests.jvmargs= -Ptests.jvms=4 -Ptests.seed=3014C2CB4BB8490 -Ptests.vectorsize=512
I am having a hard time understanding why this PR improves the query throughput of IntNRQ. My expectation is that the query spends most of its time traversing the BKD tree and very little time building the result. As this PR still traverses the BKD tree with one thread, I would expect very little change in query latency. I ran a local test with one of my favourite datasets and, as expected, I did not see any change in latency.
Moreover, I would expect query throughput to be hurt by this change, because all those blocked search threads are doing no work, so concurrent queries will run with fewer resources. Do you happen to know why QPS is improving? I might be missing something.
This idea was inspired by the comments https://github.com/apache/lucene/issues/13745#issuecomment-3062037144 and https://github.com/apache/lucene/pull/15383#issuecomment-3533814898.

> Do you happen to know why QPS is improving?

The idea is that, instead of repeating the same BKD traversal for every partition of a segment, we do one BKD traversal per segment and share the resulting `DocIdSet`, with each partition iterating only over the documents in its own range.
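
As an illustration of the partition-side filtering, here is a minimal JDK-only sketch, not the PR's `PartitionFilteredDocIdSetIterator`. The class `PartitionFilterSketch` and its `count` helper are hypothetical, and a `java.util.BitSet` stands in for the shared full-segment `DocIdSet`. `NO_MORE_DOCS` mirrors the sentinel value used by Lucene's `DocIdSetIterator`.

```java
import java.util.BitSet;

final class PartitionFilterSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  private final BitSet segmentDocs;  // shared, read-only result for the whole segment
  private final int maxDocExclusive; // end of this partition's doc ID range
  private int doc;

  PartitionFilterSketch(BitSet segmentDocs, int minDocInclusive, int maxDocExclusive) {
    this.segmentDocs = segmentDocs;
    this.maxDocExclusive = maxDocExclusive;
    this.doc = minDocInclusive - 1; // positioned just before the partition start
  }

  // Returns the next matching doc inside the partition, or NO_MORE_DOCS when exhausted.
  int nextDoc() {
    if (doc == NO_MORE_DOCS) {
      return NO_MORE_DOCS;
    }
    int next = segmentDocs.nextSetBit(doc + 1);
    doc = (next < 0 || next >= maxDocExclusive) ? NO_MORE_DOCS : next;
    return doc;
  }

  // Counts the matching docs that fall inside [minDocInclusive, maxDocExclusive).
  static int count(BitSet segmentDocs, int minDocInclusive, int maxDocExclusive) {
    PartitionFilterSketch it =
        new PartitionFilterSketch(segmentDocs, minDocInclusive, maxDocExclusive);
    int n = 0;
    while (it.nextDoc() != NO_MORE_DOCS) {
      n++;
    }
    return n;
  }

  public static void main(String[] args) {
    BitSet shared = new BitSet(2_000_000);
    shared.set(0, 1_500_000); // docs 0..1,499,999 match, like the test query in the logs
    int total = 0;
    for (int min = 0; min < 2_000_000; min += 400_000) {
      int hits = count(shared, min, min + 400_000);
      System.out.println("[" + min + ", " + (min + 400_000) + ") -> " + hits);
      total += hits;
    }
    System.out.println("total = " + total);
  }
}
```

Each partition reads the shared set without modifying it, so no further synchronization is needed once the set has been built.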

> I ran a local test with one of my favourite datasets and, as expected, I did not see any change in latency.

I assume you enabled intra-segment search in both cases?

> I would expect very little change in query latency. I ran a local test with one of my favourite datasets and, as expected, I did not see any change in latency.

Could I test that locally as well? So far I have used https://github.com/mikemccand/luceneutil with wikimediumall.

> I assume you enabled intra-segment search in both cases?

No, in the baseline I used current main with segment-only concurrency. The candidate is this patch.

> Could I test that locally as well?

I tested with the datasets used for the Lucene geospatial benchmarks (https://benchmarks.mikemccandless.com/geobench.html). I merged the index into one segment and used the bounding box query, which uses `PointRangeQuery`.

> No, in the baseline I used current main with segment-only concurrency. The candidate is this patch.

Can you test with intra-segment search enabled on both (in the patch from this PR, intra-segment search is already enabled)? FYI, here are the past intra-segment search benchmarks from Lucene: https://github.com/apache/lucene/pull/13542#issuecomment-2332114836

> I tested with the datasets used for the Lucene geospatial benchmarks (https://benchmarks.mikemccandless.com/geobench.html). I merged the index into one segment and used the bounding box query, which uses `PointRangeQuery`.

I see the same issue (https://github.com/mikemccand/luceneutil/issues/372#issue-3005741642) when I try to run the geo benchmark. Let me see if I can still test the geospatial benchmarks with one segment and the bounding box query.

No need, now I understand the results you are providing. I think you should also provide the comparison with main for completeness (i.e., is this solution competitive with the current status quo?).

> No need, now I understand the results you are providing. I think you should also provide the comparison with main for completeness (i.e., is this solution competitive with the current status quo?).

Thanks! Below are the results without enabling intra-segment search (on both `lucene_candidate` and `lucene_baseline`), which reflects the current behavior on `main`.
Task    QPS baseline    StdDev    QPS my_modified_version    StdDev    Pct diff    p-value
BrowseDateSSDVFacets 0.56 (16.1%) 0.50 (1.1%) -11.0% ( -24% - 7%) 0.337
OrHighNotLow 221.92 (9.6%) 198.50 (10.8%) -10.6% ( -28% - 10%) 0.301
OrHighNotHigh 99.49 (8.9%) 90.42 (3.9%) -9.1% ( -20% - 4%) 0.186
AndHighMedDayTaxoFacets 27.38 (3.2%) 25.24 (2.4%) -7.8% ( -13% - -2%) 0.006
MedTerm 352.99 (3.0%) 331.70 (2.7%) -6.0% ( -11% - 0%) 0.036
HighTermTitleSort 20.00 (2.1%) 19.13 (2.7%) -4.3% ( -8% - 0%) 0.076
OrHighLow 354.88 (14.1%) 341.84 (7.3%) -3.7% ( -22% - 20%) 0.744
range 1531.41 (3.8%) 1478.39 (5.7%) -3.5% ( -12% - 6%) 0.476
HighSloppyPhrase 9.15 (1.0%) 8.88 (1.3%) -3.0% ( -5% - 0%) 0.009
MedTermDayTaxoFacets 7.75 (13.9%) 7.58 (9.0%) -2.2% ( -22% - 24%) 0.852
BrowseDayOfYearSSDVFacets 2.88 (34.4%) 2.83 (15.4%) -1.8% ( -38% - 73%) 0.945
AndHighLow 532.13 (1.7%) 526.10 (3.4%) -1.1% ( -6% - 4%) 0.674
AndHighHigh 39.57 (6.2%) 39.20 (1.3%) -0.9% ( -7% - 7%) 0.834
OrHighHigh 63.20 (2.8%) 62.78 (2.9%) -0.7% ( -6% - 5%) 0.814
PKLookup 152.77 (2.4%) 152.01 (0.3%) -0.5% ( -3% - 2%) 0.770
HighTermDayOfYearSort 70.31 (0.4%) 70.05 (6.7%) -0.4% ( -7% - 6%) 0.939
LowSpanNear 12.96 (2.6%) 12.94 (0.7%) -0.1% ( -3% - 3%) 0.944
LowPhrase 23.84 (1.2%) 23.82 (1.1%) -0.1% ( -2% - 2%) 0.945
LowSloppyPhrase 7.93 (5.2%) 7.94 (0.2%) 0.2% ( -4% - 5%) 0.966
HighTermTitleBDVSort 9.53 (10.7%) 9.55 (9.3%) 0.2% ( -17% - 22%) 0.985
OrHighNotMed 162.09 (4.9%) 162.41 (1.2%) 0.2% ( -5% - 6%) 0.957
IntNRQ 48.11 (5.4%) 48.34 (5.0%) 0.5% ( -9% - 11%) 0.928
Wildcard 18.14 (0.5%) 18.25 (1.7%) 0.6% ( -1% - 2%) 0.654
BrowseMonthTaxoFacets 2.28 (0.3%) 2.30 (4.4%) 0.7% ( -3% - 5%) 0.829
HighSpanNear 5.88 (3.4%) 5.94 (5.2%) 1.1% ( -7% - 10%) 0.799
HighPhrase 11.88 (2.3%) 12.05 (3.8%) 1.5% ( -4% - 7%) 0.638
Fuzzy2 42.79 (0.7%) 43.45 (15.6%) 1.6% ( -14% - 17%) 0.888
LowTerm 462.46 (2.5%) 471.91 (5.4%) 2.0% ( -5% - 10%) 0.628
Prefix3 201.07 (4.2%) 205.62 (0.7%) 2.3% ( -2% - 7%) 0.454
OrNotHighHigh 192.75 (1.4%) 197.39 (3.8%) 2.4% ( -2% - 7%) 0.396
LowIntervalsOrdered 20.77 (3.2%) 21.28 (4.0%) 2.5% ( -4% - 9%) 0.499
AndHighMed 125.26 (6.8%) 128.38 (7.2%) 2.5% ( -10% - 17%) 0.722
OrHighMedDayTaxoFacets 5.56 (3.2%) 5.70 (1.1%) 2.5% ( -1% - 7%) 0.293
MedSpanNear 41.21 (2.4%) 42.62 (3.3%) 3.4% ( -2% - 9%) 0.239
AndHighHighDayTaxoFacets 3.89 (3.3%) 4.03 (0.8%) 3.4% ( 0% - 7%) 0.150
HighIntervalsOrdered 6.43 (0.4%) 6.69 (0.3%) 3.9% ( 3% - 4%) 0.000
BrowseRandomLabelTaxoFacets 1.72 (4.0%) 1.79 (1.9%) 4.1% ( -1% - 10%) 0.192
TermDTSort 76.67 (2.1%) 79.95 (3.8%) 4.3% ( -1% - 10%) 0.165
MedPhrase 37.46 (4.2%) 39.26 (0.9%) 4.8% ( 0% - 10%) 0.112
OrHighMed 170.61 (3.8%) 180.07 (8.8%) 5.5% ( -6% - 18%) 0.413
OrNotHighLow 406.32 (4.2%) 429.70 (0.8%) 5.8% ( 0% - 11%) 0.058
HighTerm 277.15 (1.1%) 293.83 (5.3%) 6.0% ( 0% - 12%) 0.118
HighTermMonthSort 440.76 (6.9%) 474.15 (1.8%) 7.6% ( -1% - 17%) 0.131
Fuzzy1 30.28 (8.7%) 32.72 (20.1%) 8.0% ( -19% - 40%) 0.604
MedSloppyPhrase 48.44 (5.4%) 52.45 (0.4%) 8.3% ( 2% - 14%) 0.030
BrowseDateTaxoFacets 2.14 (14.9%) 2.32 (9.4%) 8.5% ( -13% - 38%) 0.495
BrowseDayOfYearTaxoFacets 1.98 (2.2%) 2.15 (13.6%) 8.7% ( -6% - 24%) 0.375
MedIntervalsOrdered 1.63 (2.2%) 1.79 (2.1%) 9.8% ( 5% - 14%) 0.000
Respell 27.76 (8.9%) 30.85 (6.1%) 11.1% ( -3% - 28%) 0.144
IntSet 262.93 (6.0%) 292.36 (0.3%) 11.2% ( 4% - 18%) 0.009
OrNotHighMed 134.43 (20.1%) 149.87 (9.4%) 11.5% ( -15% - 51%) 0.465
BrowseRandomLabelSSDVFacets 1.84 (12.0%) 2.07 (0.2%) 12.4% ( 0% - 28%) 0.144
BrowseMonthSSDVFacets 3.06 (21.2%) 3.98 (71.7%) 30.0% ( -51% - 155%) 0.570
My goal is to improve `PointRangeQuery` performance when intra-segment search is enabled, as part of stabilizing the intra-segment work, by eliminating per-segment work that is repeated across the partitions of a segment.

> Thanks! Below are the results without enabling intra-segment search (on both `lucene_candidate` and `lucene_baseline`), which reflects the current behavior on `main`.

That's not what I meant. I wanted this PR with intra-segment search enabled compared against current main, in order to answer the question: what are the benefits of using this over current main?

Intra-segment search enabled on the candidate; the baseline is main with intra-segment search disabled (the following run is without this PR's optimization):
Task    QPS baseline    StdDev    QPS my_modified_version    StdDev    Pct diff    p-value
IntNRQ 58.91 (1.9%) 13.90 (0.3%) -76.4% ( -77% - -75%) 0.000
Wildcard 43.88 (4.0%) 10.57 (1.0%) -75.9% ( -77% - -73%) 0.000
Prefix3 281.97 (7.8%) 76.48 (1.1%) -72.9% ( -75% - -69%) 0.000
OrHighNotMed 311.22 (6.4%) 110.80 (2.9%) -64.4% ( -69% - -58%) 0.000
OrHighNotHigh 148.47 (10.9%) 53.37 (2.5%) -64.1% ( -69% - -56%) 0.000
HighTermMonthSort 519.10 (5.2%) 196.53 (1.8%) -62.1% ( -65% - -58%) 0.000
HighPhrase 19.04 (2.7%) 8.24 (1.6%) -56.7% ( -59% - -53%) 0.000
HighTermDayOfYearSort 76.20 (4.7%) 33.62 (1.3%) -55.9% ( -59% - -52%) 0.000
HighTerm 342.27 (11.9%) 159.15 (9.2%) -53.5% ( -66% - -36%) 0.000
IntSet 242.49 (6.3%) 138.24 (1.2%) -43.0% ( -47% - -37%) 0.000
LowTerm 438.69 (15.4%) 265.03 (12.7%) -39.6% ( -58% - -13%) 0.000
HighTermTitleSort 20.93 (3.3%) 12.79 (0.6%) -38.9% ( -41% - -36%) 0.000
OrNotHighHigh 237.69 (11.0%) 151.04 (2.9%) -36.5% ( -45% - -25%) 0.000
MedTerm 456.48 (11.3%) 313.47 (11.4%) -31.3% ( -48% - -9%) 0.000
TermDTSort 43.22 (6.8%) 31.36 (1.6%) -27.5% ( -33% - -20%) 0.000
OrHighMed 193.23 (9.7%) 147.57 (7.6%) -23.6% ( -37% - -7%) 0.000
OrNotHighMed 146.14 (6.9%) 113.02 (4.3%) -22.7% ( -31% - -12%) 0.000
Fuzzy1 48.10 (10.0%) 38.63 (5.6%) -19.7% ( -32% - -4%) 0.000
MedPhrase 113.33 (2.2%) 92.06 (2.2%) -18.8% ( -22% - -14%) 0.000
OrNotHighLow 547.98 (3.6%) 446.60 (3.9%) -18.5% ( -25% - -11%) 0.000
AndHighLow 535.11 (5.4%) 440.58 (3.0%) -17.7% ( -24% - -9%) 0.000
OrHighLow 414.70 (5.7%) 342.19 (4.4%) -17.5% ( -26% - -7%) 0.000
AndHighMedDayTaxoFacets 15.84 (4.7%) 13.24 (4.5%) -16.4% ( -24% - -7%) 0.000
AndHighHighDayTaxoFacets 16.50 (5.5%) 13.92 (1.0%) -15.6% ( -20% - -9%) 0.000
OrHighHigh 73.41 (13.6%) 62.44 (14.1%) -14.9% ( -37% - 14%) 0.087
BrowseDateSSDVFacets 0.62 (10.2%) 0.54 (10.0%) -13.5% ( -30% - 7%) 0.035
MedTermDayTaxoFacets 13.08 (6.8%) 11.35 (4.9%) -13.2% ( -23% - -1%) 0.000
BrowseDateTaxoFacets 2.31 (21.7%) 2.02 (4.8%) -12.8% ( -32% - 17%) 0.199
range 3310.16 (2.1%) 2956.59 (6.0%) -10.7% ( -18% - -2%) 0.000
MedSloppyPhrase 33.09 (1.7%) 30.04 (3.1%) -9.2% ( -13% - -4%) 0.000
LowPhrase 19.78 (2.4%) 18.15 (1.3%) -8.2% ( -11% - -4%) 0.000
HighSloppyPhrase 6.08 (3.3%) 5.58 (2.3%) -8.1% ( -13% - -2%) 0.000
OrHighMedDayTaxoFacets 5.88 (7.5%) 5.41 (4.9%) -8.1% ( -19% - 4%) 0.044
BrowseDayOfYearTaxoFacets 2.29 (17.9%) 2.13 (12.1%) -6.8% ( -31% - 28%) 0.479
Respell 24.52 (11.1%) 22.91 (14.1%) -6.6% ( -28% - 20%) 0.410
AndHighHigh 72.56 (9.5%) 68.81 (13.6%) -5.2% ( -25% - 19%) 0.486
BrowseRandomLabelTaxoFacets 1.80 (4.2%) 1.71 (5.4%) -5.1% ( -14% - 4%) 0.100
LowIntervalsOrdered 42.00 (2.0%) 40.00 (4.6%) -4.8% ( -11% - 1%) 0.034
HighTermTitleBDVSort 11.16 (5.0%) 10.64 (4.5%) -4.6% ( -13% - 5%) 0.123
HighIntervalsOrdered 10.10 (6.3%) 9.76 (6.6%) -3.4% ( -15% - 10%) 0.411
Fuzzy2 35.54 (4.9%) 34.46 (7.0%) -3.0% ( -14% - 9%) 0.429
OrHighNotLow 371.12 (9.4%) 363.35 (8.9%) -2.1% ( -18% - 17%) 0.717
MedSpanNear 109.73 (1.9%) 107.88 (2.9%) -1.7% ( -6% - 3%) 0.281
HighSpanNear 5.56 (3.9%) 5.68 (3.8%) 2.3% ( -5% - 10%) 0.353
BrowseMonthTaxoFacets 2.21 (11.4%) 2.27 (5.3%) 2.5% ( -12% - 21%) 0.651
BrowseRandomLabelSSDVFacets 1.97 (9.5%) 2.04 (13.3%) 3.4% ( -17% - 28%) 0.639
LowSpanNear 6.17 (4.4%) 6.42 (4.9%) 4.2% ( -4% - 14%) 0.153
MedIntervalsOrdered 13.25 (4.4%) 14.03 (5.5%) 5.9% ( -3% - 16%) 0.060
PKLookup 133.54 (12.5%) 144.12 (5.0%) 7.9% ( -8% - 29%) 0.188
BrowseDayOfYearSSDVFacets 2.66 (14.4%) 2.87 (19.8%) 8.1% ( -22% - 49%) 0.460
BrowseMonthSSDVFacets 2.97 (11.2%) 3.26 (17.0%) 9.6% ( -16% - 42%) 0.292
LowSloppyPhrase 14.93 (5.2%) 16.68 (3.9%) 11.8% ( 2% - 21%) 0.000
AndHighMed 157.26 (8.8%) 183.92 (4.6%) 17.0% ( 3% - 33%) 0.000
Intra-segment search enabled on the candidate; the baseline is main with intra-segment search disabled (the following run is with this PR's optimization):
Task    QPS baseline    StdDev    QPS my_modified_version    StdDev    Pct diff    p-value
Prefix3 55.06 (0.0%) 12.45 (0.0%) -77.4% ( -77% - -77%) 1.000
Wildcard 80.75 (0.0%) 20.50 (0.0%) -74.6% ( -74% - -74%) 1.000
OrHighNotHigh 177.12 (0.0%) 51.88 (0.0%) -70.7% ( -70% - -70%) 1.000
TermDTSort 83.40 (0.0%) 28.37 (0.0%) -66.0% ( -65% - -65%) 1.000
HighTermDayOfYearSort 79.30 (0.0%) 27.37 (0.0%) -65.5% ( -65% - -65%) 1.000
HighTermMonthSort 518.20 (0.0%) 204.98 (0.0%) -60.4% ( -60% - -60%) 1.000
HighTermTitleSort 21.02 (0.0%) 8.51 (0.0%) -59.5% ( -59% - -59%) 1.000
OrNotHighHigh 85.37 (0.0%) 35.75 (0.0%) -58.1% ( -58% - -58%) 1.000
HighTerm 369.78 (0.0%) 182.48 (0.0%) -50.7% ( -50% - -50%) 1.000
OrHighNotMed 159.12 (0.0%) 86.82 (0.0%) -45.4% ( -45% - -45%) 1.000
MedTerm 417.55 (0.0%) 235.22 (0.0%) -43.7% ( -43% - -43%) 1.000
IntNRQ 50.11 (0.0%) 29.95 (0.0%) -40.2% ( -40% - -40%) 1.000
LowTerm 876.98 (0.0%) 531.99 (0.0%) -39.3% ( -39% - -39%) 1.000
IntSet 217.87 (0.0%) 134.06 (0.0%) -38.5% ( -38% - -38%) 1.000
OrNotHighMed 109.80 (0.0%) 75.63 (0.0%) -31.1% ( -31% - -31%) 1.000
MedPhrase 74.25 (0.0%) 53.28 (0.0%) -28.2% ( -28% - -28%) 1.000
OrHighNotLow 423.90 (0.0%) 314.32 (0.0%) -25.8% ( -25% - -25%) 1.000
Fuzzy2 46.24 (0.0%) 36.17 (0.0%) -21.8% ( -21% - -21%) 1.000
AndHighMed 161.59 (0.0%) 132.41 (0.0%) -18.1% ( -18% - -18%) 1.000
MedTermDayTaxoFacets 14.40 (0.0%) 12.07 (0.0%) -16.2% ( -16% - -16%) 1.000
OrHighLow 392.75 (0.0%) 330.70 (0.0%) -15.8% ( -15% - -15%) 1.000
OrNotHighLow 560.04 (0.0%) 471.98 (0.0%) -15.7% ( -15% - -15%) 1.000
LowPhrase 55.39 (0.0%) 47.97 (0.0%) -13.4% ( -13% - -13%) 1.000
BrowseRandomLabelSSDVFacets 2.36 (0.0%) 2.07 (0.0%) -12.1% ( -12% - -12%) 1.000
BrowseDayOfYearTaxoFacets 2.33 (0.0%) 2.06 (0.0%) -11.7% ( -11% - -11%) 1.000
OrHighMed 192.51 (0.0%) 170.16 (0.0%) -11.6% ( -11% - -11%) 1.000
AndHighHighDayTaxoFacets 8.70 (0.0%) 7.71 (0.0%) -11.4% ( -11% - -11%) 1.000
MedSpanNear 137.19 (0.0%) 122.43 (0.0%) -10.8% ( -10% - -10%) 1.000
Fuzzy1 35.18 (0.0%) 31.72 (0.0%) -9.9% ( -9% - -9%) 1.000
AndHighMedDayTaxoFacets 19.99 (0.0%) 18.07 (0.0%) -9.6% ( -9% - -9%) 1.000
AndHighLow 475.14 (0.0%) 444.08 (0.0%) -6.5% ( -6% - -6%) 1.000
OrHighMedDayTaxoFacets 2.36 (0.0%) 2.25 (0.0%) -4.6% ( -4% - -4%) 1.000
HighSloppyPhrase 5.86 (0.0%) 5.65 (0.0%) -3.5% ( -3% - -3%) 1.000
MedSloppyPhrase 51.54 (0.0%) 49.88 (0.0%) -3.2% ( -3% - -3%) 1.000
BrowseDateSSDVFacets 0.49 (0.0%) 0.48 (0.0%) -2.7% ( -2% - -2%) 1.000
AndHighHigh 69.88 (0.0%) 68.22 (0.0%) -2.4% ( -2% - -2%) 1.000
OrHighHigh 46.76 (0.0%) 45.70 (0.0%) -2.3% ( -2% - -2%) 1.000
PKLookup 155.07 (0.0%) 151.70 (0.0%) -2.2% ( -2% - -2%) 1.000
MedIntervalsOrdered 9.78 (0.0%) 9.71 (0.0%) -0.7% ( 0% - 0%) 1.000
LowSloppyPhrase 12.42 (0.0%) 12.58 (0.0%) 1.3% ( 1% - 1%) 1.000
LowIntervalsOrdered 9.69 (0.0%) 9.82 (0.0%) 1.3% ( 1% - 1%) 1.000
BrowseMonthTaxoFacets 2.26 (0.0%) 2.30 (0.0%) 1.6% ( 1% - 1%) 1.000
HighIntervalsOrdered 7.66 (0.0%) 7.91 (0.0%) 3.2% ( 3% - 3%) 1.000
HighSpanNear 3.84 (0.0%) 3.98 (0.0%) 3.6% ( 3% - 3%) 1.000
BrowseRandomLabelTaxoFacets 1.72 (0.0%) 1.79 (0.0%) 3.9% ( 3% - 3%) 1.000
range 2895.37 (0.0%) 3030.59 (0.0%) 4.7% ( 4% - 4%) 1.000
BrowseDateTaxoFacets 2.03 (0.0%) 2.18 (0.0%) 7.4% ( 7% - 7%) 1.000
HighTermTitleBDVSort 5.69 (0.0%) 6.19 (0.0%) 8.7% ( 8% - 8%) 1.000
HighPhrase 114.97 (0.0%) 127.61 (0.0%) 11.0% ( 11% - 11%) 1.000
LowSpanNear 8.80 (0.0%) 9.84 (0.0%) 11.9% ( 11% - 11%) 1.000
BrowseDayOfYearSSDVFacets 3.07 (0.0%) 3.56 (0.0%) 16.2% ( 16% - 16%) 1.000
BrowseMonthSSDVFacets 2.85 (0.0%) 3.37 (0.0%) 18.4% ( 18% - 18%) 1.000
Respell 17.39 (0.0%) 24.42 (0.0%) 40.4% ( 40% - 40%) 1.000
Intra-segment search enabled on both baseline and candidate, with this PR's optimization on the candidate:
Task    QPS baseline    StdDev    QPS my_modified_version    StdDev    Pct diff    p-value
PKLookup 153.07 (0.2%) 111.11 (4.4%) -27.4% ( -31% - -22%) 0.000
HighTermTitleBDVSort 8.90 (3.4%) 8.30 (3.7%) -6.7% ( -13% - 0%) 0.056
AndHighMedDayTaxoFacets 65.36 (4.8%) 61.21 (0.8%) -6.3% ( -11% - 0%) 0.066
HighIntervalsOrdered 4.95 (4.7%) 4.64 (0.3%) -6.2% ( -10% - -1%) 0.065
Wildcard 149.60 (2.6%) 140.62 (1.4%) -6.0% ( -9% - -2%) 0.004
MedIntervalsOrdered 27.11 (10.0%) 25.56 (0.2%) -5.7% ( -14% - 4%) 0.417
BrowseRandomLabelTaxoFacets 1.76 (7.1%) 1.67 (1.7%) -5.2% ( -13% - 3%) 0.310
OrHighLow 177.49 (3.9%) 168.38 (1.7%) -5.1% ( -10% - 0%) 0.089
OrHighNotMed 125.49 (4.9%) 119.42 (4.2%) -4.8% ( -13% - 4%) 0.292
LowSpanNear 37.85 (2.4%) 36.31 (1.7%) -4.1% ( -7% - 0%) 0.047
MedTermDayTaxoFacets 12.65 (3.1%) 12.23 (8.0%) -3.3% ( -13% - 8%) 0.584
BrowseRandomLabelSSDVFacets 2.18 (26.1%) 2.12 (23.9%) -3.0% ( -42% - 63%) 0.905
OrNotHighLow 575.10 (1.6%) 559.18 (1.6%) -2.8% ( -5% - 0%) 0.079
LowIntervalsOrdered 17.79 (5.3%) 17.33 (0.0%) -2.6% ( -7% - 2%) 0.493
HighSpanNear 10.59 (1.2%) 10.34 (0.6%) -2.3% ( -4% - 0%) 0.016
LowSloppyPhrase 7.31 (0.4%) 7.16 (1.5%) -2.0% ( -3% - 0%) 0.062
MedPhrase 26.16 (2.7%) 25.65 (1.8%) -2.0% ( -6% - 2%) 0.384
HighTermDayOfYearSort 24.80 (0.7%) 24.38 (1.1%) -1.7% ( -3% - 0%) 0.059
MedTerm 385.18 (0.8%) 379.60 (2.2%) -1.4% ( -4% - 1%) 0.392
HighTerm 246.26 (0.4%) 242.70 (0.6%) -1.4% ( -2% - 0%) 0.003
OrNotHighMed 57.72 (5.0%) 56.91 (4.4%) -1.4% ( -10% - 8%) 0.767
OrHighNotLow 179.56 (8.6%) 177.42 (1.5%) -1.2% ( -10% - 9%) 0.847
MedSpanNear 18.34 (3.3%) 18.17 (1.2%) -0.9% ( -5% - 3%) 0.712
OrHighMed 159.74 (1.0%) 158.67 (0.0%) -0.7% ( -1% - 0%) 0.319
OrNotHighHigh 103.55 (3.7%) 102.95 (5.3%) -0.6% ( -9% - 8%) 0.899
AndHighHighDayTaxoFacets 3.47 (1.7%) 3.46 (0.2%) -0.3% ( -2% - 1%) 0.794
MedSloppyPhrase 9.25 (0.4%) 9.23 (3.6%) -0.2% ( -4% - 3%) 0.926
HighTermTitleSort 10.68 (6.1%) 10.68 (1.2%) -0.1% ( -6% - 7%) 0.988
IntSet 143.67 (1.7%) 143.65 (2.4%) -0.0% ( -4% - 4%) 0.996
AndHighMed 191.95 (2.3%) 192.16 (1.5%) 0.1% ( -3% - 4%) 0.955
AndHighLow 534.50 (3.1%) 535.67 (2.7%) 0.2% ( -5% - 6%) 0.940
BrowseDayOfYearSSDVFacets 2.88 (12.0%) 2.89 (25.2%) 0.5% ( -32% - 42%) 0.981
HighTermMonthSort 202.36 (2.9%) 203.85 (14.0%) 0.7% ( -15% - 18%) 0.942
OrHighHigh 58.36 (0.7%) 58.86 (0.6%) 0.8% ( 0% - 2%) 0.196
HighSloppyPhrase 20.06 (0.5%) 20.23 (0.6%) 0.8% ( 0% - 1%) 0.131
OrHighNotHigh 67.11 (68.0%) 67.78 (57.7%) 1.0% ( -74% - 395%) 0.987
AndHighHigh 83.60 (0.8%) 84.53 (1.5%) 1.1% ( -1% - 3%) 0.354
TermDTSort 26.40 (2.0%) 26.75 (1.5%) 1.3% ( -2% - 4%) 0.449
LowPhrase 134.22 (7.9%) 136.19 (0.4%) 1.5% ( -6% - 10%) 0.794
Fuzzy2 29.68 (4.5%) 30.19 (2.5%) 1.7% ( -5% - 9%) 0.634
Prefix3 54.66 (2.4%) 56.57 (7.9%) 3.5% ( -6% - 14%) 0.548
LowTerm 563.86 (1.0%) 584.85 (3.7%) 3.7% ( 0% - 8%) 0.171
BrowseMonthTaxoFacets 2.06 (0.1%) 2.15 (19.9%) 4.4% ( -15% - 24%) 0.757
BrowseMonthSSDVFacets 2.86 (17.6%) 3.02 (5.9%) 5.5% ( -15% - 35%) 0.673
OrHighMedDayTaxoFacets 2.94 (0.7%) 3.11 (10.8%) 5.7% ( -5% - 17%) 0.457
BrowseDayOfYearTaxoFacets 2.12 (1.4%) 2.26 (36.9%) 6.4% ( -31% - 45%) 0.807
Respell 26.71 (3.1%) 28.55 (1.4%) 6.9% ( 2% - 11%) 0.004
HighPhrase 3.58 (0.3%) 3.83 (12.7%) 7.1% ( -5% - 20%) 0.434
BrowseDateTaxoFacets 2.23 (9.4%) 2.43 (21.2%) 9.3% ( -19% - 43%) 0.572
range 2598.11 (7.1%) 2860.33 (5.9%) 10.1% ( -2% - 24%) 0.121
Fuzzy1 37.34 (10.1%) 43.48 (3.4%) 16.5% ( 2% - 33%) 0.029
BrowseDateSSDVFacets 0.50 (7.1%) 0.65 (3.1%) 30.5% ( 18% - 43%) 0.000
IntNRQ 11.37 (1.0%) 27.37 (9.7%) 140.7% ( 128% - 152%) 0.000

> what are the benefits of using this over current main?

The key results are in https://github.com/apache/lucene/pull/15446#issuecomment-3577215055. This PR clearly reduces the `PointRangeQuery` regression when intra-segment search is enabled, but disabling intra-segment search is still faster.

I think you are focusing on the wrong things. Yes, this change makes intra-segment concurrency suck less, but it still sucks and it is still unusable. It is still 40% slower than concurrent segment search! We should never ever block search threads.
IMO we should focus on how to search the data in a segment concurrently instead.

> IMO we should focus on how to search the data in a segment concurrently instead.

With the current BKD setup for `PointRangeQuery`, do you have any thoughts or suggestions on this? FYI, I ran the same experiment for `PointInSetQuery` and saw the same result: it made intra-segment concurrency less painful.
Since the benchmark results are spread across multiple comments (https://github.com/apache/lucene/pull/15446#issuecomment-3577215055, https://github.com/apache/lucene/pull/15446#issuecomment-3576074285, https://github.com/apache/lucene/pull/15446#issue-3657117120), the following is an overall summary for IntNRQ.
IntNRQ Benchmark Results
| Scenario | Baseline QPS | Baseline StdDev | Modified QPS | Modified StdDev | % Diff | p-value |
|---|---|---|---|---|---|---|
| 1. Intra-segment disabled on both (current main) | 69.12 | 5.0% | 69.07 | 3.3% | -0.1% (-7% - 8%) | 0.985 |
| 2. Intra-segment: Candidate enabled, baseline disabled (without PR optimization) | 127.75 | 0.2% | 30.92 | 1.1% | -75.8% (-76% - -74%) | 0.000 |
| 3. Intra-segment: Candidate enabled, baseline disabled (with PR optimization) | 50.11 | 0.0% | 29.95 | 0.0% | -40.2% (-40% - -40%) | 1.000 |
| 4. Intra-segment enabled on both (with PR optimization) | 12.30 | 3.8% | 30.18 | 8.2% | +145.3% (128% - 163%) | 0.000 |
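
For readers checking the "% Diff" column, the sketch below assumes it is the relative change of the candidate QPS over the baseline QPS. That formula is an assumption about how the benchmark tool derives the column, and `PctDiffSketch` and `pctDiff` are hypothetical names. Recomputing from the rounded QPS values in the table gives about -40.2% for scenario 3 and about 145.4% for scenario 4. The reported 145.3% presumably comes from unrounded inputs.

```java
final class PctDiffSketch {
  // Relative change of candidate over baseline, in percent (assumed formula).
  static double pctDiff(double baselineQps, double candidateQps) {
    return 100.0 * (candidateQps - baselineQps) / baselineQps;
  }

  public static void main(String[] args) {
    // Scenario 3: candidate with intra-segment search and this PR vs. main without it.
    System.out.println("scenario 3: " + pctDiff(50.11, 29.95));
    // Scenario 4: intra-segment search enabled on both sides.
    System.out.println("scenario 4: " + pctDiff(12.30, 30.18));
  }
}
```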

> With the current BKD setup for `PointRangeQuery`, do you have any thoughts or suggestions on this?

In my opinion we cannot achieve it with the current segment layout. We need to evolve Lucene segments / data structures so they can be searched concurrently.

> In my opinion we cannot achieve it with the current segment layout. We need to evolve Lucene segments / data structures so they can be searched concurrently.

I will take another look at the bottleneck in my current PR to further improve `PointRangeQuery` with intra-segment search. In that case, should we continue to iterate on and merge this PR, since it is definitely better than the current state?

In my opinion we should not merge this PR, sorry. It makes no sense to have all this complexity that provides no benefit. If someone else thinks otherwise I am not going to block it, but I don't like this approach.

> It makes no sense to have all this complexity that provides no benefit.

Thanks for your overall feedback. I'm open to discussion and other thoughts.
I don't fully agree that it provides no benefit, as the existing `PointRangeQuery` implementation on main regresses with intra-segment search (https://github.com/apache/lucene/pull/13542#issuecomment-2332114836). Overall, the intra-segment concept is genuinely useful, and I'd like to build on it and bring similar gains to `PointRangeQuery` as well.

The effort isn't lost - at least we know the benchmarks. I also am not particularly fond of this concurrent cache and blocking approach.
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!