Reduce duplication in taxonomy facets; always do counts
Note
This is a large change, refactoring most of the taxonomy facets code and changing internal behavior, without changing the API. There are specific API changes this sets us up to do later, e.g. retrieving counts from aggregation facets.
What does this PR do well?
- Moves most of the responsibility from `TaxonomyFacets` implementations to `TaxonomyFacets` itself. This reduces code duplication and enables future development. Addresses the genericity issue mentioned in #12553.
- As a consequence, it introduces sparse values to `FloatTaxonomyFacets`, which previously always used dense values. This issue is part of #12576.
- It always computes counts for all taxonomy facets, which enables us to add an API to retrieve counts for association facets in the future. Addresses #11282.
- As a consequence of having counts, we can check whether we encountered a label while faceting (`count > 0`), whereas previously we relied on the aggregation value being positive. Closes #12585.
- It introduces the idea of doing multiple aggregations in one go, with association facets doing the aggregation they were already doing, plus a count. We can extend this to an arbitrary number of aggregations, as suggested in #12546.
- It doesn't change the API. The only change in behavior users should notice is the fix for non-positive aggregation values, which were previously discarded.
- It adds tests which were missing for sparse/dense values and non-positive aggregations.
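The sparse-vs-dense idea above can be sketched roughly as follows. This is a hedged illustration with hypothetical names, not the actual Lucene internals: a dense layout keeps one counter per taxonomy ordinal, while a sparse layout only allocates for ordinals that were actually seen (the real code uses a primitive map such as hppc's `IntIntHashMap` to avoid boxing; `java.util.HashMap` is used here only to keep the sketch dependency-free).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of dense vs. sparse per-ordinal counts.
class SparseOrDenseCounts {
    private final int[] dense;                  // one slot per ordinal, or null
    private final Map<Integer, Integer> sparse; // only touched ordinals, or null

    SparseOrDenseCounts(int taxonomySize, boolean useDense) {
        if (useDense) {
            dense = new int[taxonomySize];
            sparse = null;
        } else {
            dense = null;
            sparse = new HashMap<>();
        }
    }

    void increment(int ordinal) {
        if (dense != null) {
            dense[ordinal]++;
        } else {
            sparse.merge(ordinal, 1, Integer::sum);
        }
    }

    int count(int ordinal) {
        return dense != null ? dense[ordinal] : sparse.getOrDefault(ordinal, 0);
    }

    public static void main(String[] args) {
        SparseOrDenseCounts counts = new SparseOrDenseCounts(1024, false);
        counts.increment(3);
        counts.increment(3);
        System.out.println(counts.count(3));
    }
}
```

The trade-off: dense pays O(taxonomySize) memory up front but increments without hashing or boxing; sparse costs much less memory when only a few labels match a query.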
What's not ideal about this approach?
- ~~We could see some performance decreases. The more critical part of the work, aggregating, should be unaffected. There are a few extra method calls / dispatches / branches. Ranking and collecting results might be impacted because we are boxing / unboxing results to / from `Number` to avoid the primitive types.~~
- ~~The way the `TopOrdAndNumberQueue`s work is a bit awkward and inefficient. It required small changes to classes outside the scope of this change. Maybe we can come up with something better.~~
What is next?
- I'd like to know if the approach makes sense to others.
- We can try running some benchmarks to see if there are any performance changes.
- ~~Is it important to preserve a default aggregation value of the right type in the results (i.e. `-1` for int aggregations, `-1f` for float aggregations)? If not, we can make a small simplification to always return `-1`.~~
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!
3. Is it important to preserve a default aggregation value of the right type in the results (i.e. `-1` for int aggregations, `-1f` for float aggregations)? If not, we can make a small simplification to always return `-1`.
Maybe defer this to a separate issue? I can see callers expecting a consistent type. Note, though, that casting a `Number` that actually holds an `Integer` with `(float)` would throw a `ClassCastException` at runtime (the cast compiles to a cast to `Float` followed by unboxing), so callers would need `Number.floatValue()` instead.
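To make the cast pitfall concrete (a standalone sketch, not code from this PR): in Java, `(float) n` on a `Number` reference compiles to a reference cast to `Float` followed by unboxing, so it fails at runtime when the box is actually an `Integer`; `Number.floatValue()` is the safe conversion.

```java
public class NumberCastDemo {
    public static void main(String[] args) {
        Number n = Integer.valueOf(42);

        // Safe: every Number exposes floatValue() for widening conversion.
        System.out.println(n.floatValue()); // 42.0

        try {
            // Compiles as ((Float) n).floatValue() -- but n holds an Integer.
            float f = (float) n;
            System.out.println(f);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: Integer is not a Float");
        }
    }
}
```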
I found a fun HeisenBug in one of the tests. When we iterate cursors from IntFloatHashMap, the order is not deterministic. Float summation is not associative, so the result we get by aggregating the floats in the map can differ depending on the order in which we perform the iteration. For a particular seed, running the test produced an unfavorable ordering, while running under the debugger produced a favorable one. The test is fixed in the latest commit, and I've opened an issue to do Kahan summation over the floats instead, to reduce the error we're seeing.
For those who want to follow along, here are the exact numbers we are adding in the test in two orderings which produce different results:
class FloatSumIsNotAssociative {
  public static void main(String[] args) {
    float x = 177182.61f;
    float y = 238089.27f;
    float z = 255214.66f;

    float acc = 0;
    acc += x;
    acc += y;
    acc += z;
    System.out.println(acc); // one ordering...

    acc = 0;
    acc += z;
    acc += y;
    acc += x;
    System.out.println(acc); // ...prints a different result than the other
  }
}
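The Kahan summation mentioned above can be sketched like this (a hedged illustration of the technique, not the code from the linked issue): a second float carries the low-order bits lost by each addition and feeds them back into the next one, which makes the result far less sensitive to iteration order.

```java
public class KahanSketch {
    // Compensated (Kahan) summation: 'c' accumulates the rounding error
    // lost by each addition and applies it as a correction on the next one.
    static float kahanSum(float... values) {
        float sum = 0f;
        float c = 0f;
        for (float v : values) {
            float y = v - c;   // apply the correction from the previous step
            float t = sum + y; // low-order bits of y may be lost here
            c = (t - sum) - y; // algebraically zero; recovers what was lost
            sum = t;
        }
        return sum;
    }

    public static void main(String[] args) {
        // The same three values from the reproduction above, in both orders.
        System.out.println(kahanSum(177182.61f, 238089.27f, 255214.66f));
        System.out.println(kahanSum(255214.66f, 238089.27f, 177182.61f));
    }
}
```

For these three inputs, both orderings of the compensated sum produce the same float, unlike the naive accumulation in the reproduction.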
I've also run the benchmarks (python3 src/python/localrun.py -source wikimediumall). There is a measurable regression in the BrowseRandomLabelTaxoFacets task, but not in the other taxonomy tasks. The benchmarker also reports improvements in PKLookup, Wildcard, Respell, Fuzzy2 and Fuzzy1.
The regression in the taxo task is explained in the profiler. Boxing is not cheap:
11.24% 10402M java.lang.Integer#valueOf()
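For context on why `Integer.valueOf` shows up so hot (a standalone illustration, not code from the PR): autoboxing an `int` goes through `Integer.valueOf`, which only guarantees cached instances for values in [-128, 127]; outside that range each boxing typically allocates a fresh short-lived object, so a hot loop boxing millions of ordinals or counts generates heavy allocation traffic.

```java
public class BoxingDemo {
    public static void main(String[] args) {
        // Small values come from Integer's cache: boxing is allocation-free
        // and even identity comparison holds.
        Integer a = 127, b = 127;
        System.out.println(a == b); // true: same cached object

        // Values outside [-128, 127] typically allocate on every boxing.
        Integer c = Integer.valueOf(10_000);
        Integer d = Integer.valueOf(10_000);
        System.out.println(c.equals(d)); // true: equal values...
        // ...but c and d are usually distinct objects, which is the cost a
        // profiler reports as time spent in Integer.valueOf.
    }
}
```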
@mikemccand (thank you for the review!) - how should I interpret the other tasks which show a significant change? Are they just noisy?
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
BrowseRandomLabelTaxoFacets 3.75 (1.8%) 3.53 (1.6%) -6.0% ( -9% - -2%) 0.000
OrHighMedDayTaxoFacets 1.35 (7.4%) 1.31 (9.2%) -2.7% ( -17% - 15%) 0.308
IntNRQ 21.64 (7.0%) 21.35 (7.4%) -1.3% ( -14% - 14%) 0.561
AndHighLow 366.49 (11.2%) 362.21 (10.3%) -1.2% ( -20% - 22%) 0.731
OrHighNotLow 271.40 (5.3%) 269.03 (4.5%) -0.9% ( -10% - 9%) 0.573
LowTerm 604.77 (5.9%) 599.96 (4.8%) -0.8% ( -10% - 10%) 0.640
TermDTSort 140.65 (2.3%) 139.58 (1.4%) -0.8% ( -4% - 3%) 0.210
LowSpanNear 5.00 (2.8%) 4.96 (4.1%) -0.7% ( -7% - 6%) 0.522
HighSpanNear 4.77 (3.0%) 4.74 (3.6%) -0.7% ( -7% - 6%) 0.522
MedSpanNear 11.24 (2.1%) 11.18 (2.5%) -0.6% ( -5% - 4%) 0.432
MedPhrase 242.61 (2.2%) 241.23 (2.0%) -0.6% ( -4% - 3%) 0.386
HighPhrase 83.17 (2.1%) 82.75 (2.9%) -0.5% ( -5% - 4%) 0.538
OrHighNotHigh 160.48 (4.5%) 159.81 (3.5%) -0.4% ( -8% - 7%) 0.744
HighTermDayOfYearSort 215.60 (2.2%) 214.81 (2.0%) -0.4% ( -4% - 3%) 0.576
MedSloppyPhrase 14.07 (2.0%) 14.03 (2.4%) -0.3% ( -4% - 4%) 0.655
LowPhrase 21.15 (1.3%) 21.09 (1.5%) -0.3% ( -3% - 2%) 0.508
AndHighHighDayTaxoFacets 10.49 (1.2%) 10.46 (1.6%) -0.3% ( -3% - 2%) 0.547
HighSloppyPhrase 13.80 (3.0%) 13.77 (3.1%) -0.3% ( -6% - 5%) 0.791
MedTerm 479.88 (5.1%) 478.82 (4.8%) -0.2% ( -9% - 10%) 0.887
OrHighNotMed 329.08 (4.5%) 328.39 (3.5%) -0.2% ( -7% - 8%) 0.870
HighTerm 264.78 (5.3%) 264.27 (5.2%) -0.2% ( -10% - 10%) 0.908
HighTermMonthSort 1930.74 (4.4%) 1928.03 (5.2%) -0.1% ( -9% - 9%) 0.926
OrNotHighMed 217.72 (2.9%) 217.51 (2.2%) -0.1% ( -5% - 5%) 0.905
MedTermDayTaxoFacets 16.72 (2.1%) 16.71 (1.7%) -0.1% ( -3% - 3%) 0.892
BrowseDayOfYearSSDVFacets 4.12 (2.7%) 4.11 (2.9%) -0.1% ( -5% - 5%) 0.931
BrowseDateTaxoFacets 4.68 (5.1%) 4.67 (4.6%) -0.1% ( -9% - 10%) 0.970
OrNotHighHigh 231.09 (4.5%) 230.99 (3.5%) -0.0% ( -7% - 8%) 0.975
AndHighMedDayTaxoFacets 16.88 (1.1%) 16.88 (1.5%) -0.0% ( -2% - 2%) 0.963
BrowseDayOfYearTaxoFacets 4.76 (5.2%) 4.76 (4.6%) 0.0% ( -9% - 10%) 1.000
OrNotHighLow 464.54 (2.6%) 464.56 (2.3%) 0.0% ( -4% - 5%) 0.995
HighIntervalsOrdered 1.81 (4.6%) 1.81 (5.0%) 0.0% ( -9% - 10%) 0.990
HighTermTitleBDVSort 5.39 (4.8%) 5.40 (4.4%) 0.1% ( -8% - 9%) 0.968
BrowseMonthSSDVFacets 4.40 (2.6%) 4.40 (2.6%) 0.1% ( -4% - 5%) 0.873
MedIntervalsOrdered 1.84 (5.5%) 1.84 (5.8%) 0.2% ( -10% - 12%) 0.918
LowIntervalsOrdered 32.12 (5.4%) 32.18 (5.6%) 0.2% ( -10% - 11%) 0.913
OrHighMed 67.77 (3.1%) 67.97 (3.4%) 0.3% ( -5% - 6%) 0.779
BrowseRandomLabelSSDVFacets 2.89 (2.0%) 2.90 (1.4%) 0.3% ( -3% - 3%) 0.569
BrowseMonthTaxoFacets 9.36 (10.9%) 9.40 (10.4%) 0.4% ( -18% - 24%) 0.896
HighTermTitleSort 132.89 (1.9%) 133.56 (3.9%) 0.5% ( -5% - 6%) 0.600
OrHighHigh 20.24 (3.5%) 20.37 (3.9%) 0.6% ( -6% - 8%) 0.608
AndHighMed 81.65 (8.6%) 82.65 (9.8%) 1.2% ( -15% - 21%) 0.676
LowSloppyPhrase 4.92 (5.9%) 5.01 (6.4%) 1.6% ( -10% - 14%) 0.397
BrowseDateSSDVFacets 1.20 (11.5%) 1.22 (9.1%) 2.1% ( -16% - 25%) 0.529
Prefix3 138.46 (4.9%) 141.54 (4.5%) 2.2% ( -6% - 12%) 0.138
OrHighLow 167.60 (7.5%) 171.65 (4.2%) 2.4% ( -8% - 15%) 0.211
PKLookup 169.39 (4.5%) 174.22 (4.5%) 2.9% ( -5% - 12%) 0.043
AndHighHigh 31.23 (9.5%) 32.15 (12.4%) 2.9% ( -17% - 27%) 0.399
Wildcard 66.79 (3.4%) 69.28 (3.6%) 3.7% ( -3% - 11%) 0.001
Respell 48.03 (2.0%) 50.35 (2.3%) 4.8% ( 0% - 9%) 0.000
Fuzzy2 68.13 (1.3%) 71.67 (1.4%) 5.2% ( 2% - 7%) 0.000
Fuzzy1 74.70 (1.5%) 79.47 (1.8%) 6.4% ( 3% - 9%) 0.000
I found a fun HeisenBug in one of the tests.
Oh the joys of floating point math.
For those who want to follow along, here are the exact numbers we are adding in the test in two orderings which produce different results:
Thank you for diving deep here and making such a simple reproduction.
how should I interpret the other tasks which show a significant change? Are they just noisy?
Good question -- it makes no sense that e.g. Respell/Fuzzy1/2 got faster with this change, though the benchy seems to think it is significant (p=0.000). I'm not sure what to make of it!
The regression in the taxo task is explained in the profiler. Boxing is not cheap:
11.24% 10402M java.lang.Integer#valueOf()
Hmm this is sort of spooky -- should we aim to keep the specialization somehow (avoid the boxing)? Is there a middle ground where we can avoid the boxing but still remove much of / some of this duplicated code? Java is annoying sometimes :)
What I've done is take advantage of the boxing for genericity only when collecting results (getTop...), and not use it while performing the aggregations themselves. Most of the taxonomy tasks don't show a significant performance change. I wonder if the one that has slowed down spends more time collecting the aggregation values than calculating them.
Thank you all for reviewing! I confirmed that the performance impact was from result collection, not from the aggregations themselves, and I've managed to claw back the performance hit. Most of the improvement comes from the changes to getTopChildrenForPath, which no longer uses intermediary `Number`s. I've also integrated the performance-related suggestions from @epotyom (thank you for those!). I'll address the rest of the comments too; I just wanted to get this out while it's fresh, to see if you all have more feedback on the performance front.
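The shape of that kind of fix can be sketched generically (hypothetical code, not the actual getTopChildrenForPath): keep ordinals and aggregation values in primitive form all the way through ranking, instead of round-tripping every candidate through a boxed `Number`.

```java
// Hedged sketch: collect the top-k (ordinal, value) pairs without boxing,
// using parallel primitive arrays kept sorted by descending value.
class TopKPrimitive {
    final int[] topOrds;
    final float[] topValues;
    int size;

    TopKPrimitive(int k) {
        topOrds = new int[k];
        topValues = new float[k];
    }

    // Offer a candidate; competitive entries are inserted in sorted order.
    void offer(int ord, float value) {
        if (size == topValues.length && value <= topValues[size - 1]) {
            return; // not competitive: smaller than the current k-th value
        }
        int pos = (size == topValues.length) ? size - 1 : size++;
        while (pos > 0 && topValues[pos - 1] < value) {
            topValues[pos] = topValues[pos - 1]; // shift smaller entries down
            topOrds[pos] = topOrds[pos - 1];
            pos--;
        }
        topValues[pos] = value;
        topOrds[pos] = ord;
    }

    public static void main(String[] args) {
        TopKPrimitive top = new TopKPrimitive(2);
        top.offer(1, 5.0f);
        top.offer(2, 7.0f);
        top.offer(3, 6.0f);
        top.offer(4, 1.0f);
        System.out.println(top.topOrds[0] + " " + top.topOrds[1]);
    }
}
```

Insertion into a small sorted array is O(k) per competitive candidate; a real implementation would likely prefer a primitive min-heap, but either way no `Integer` or `Float` objects are created on the hot path.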
python3 src/python/localrun.py -source wikimediumall
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
BrowseDateSSDVFacets 1.24 (6.6%) 1.21 (9.6%) -2.5% ( -17% - 14%) 0.334
BrowseRandomLabelTaxoFacets 3.76 (3.7%) 3.69 (3.5%) -1.8% ( -8% - 5%) 0.120
MedPhrase 11.46 (2.8%) 11.30 (2.6%) -1.3% ( -6% - 4%) 0.112
HighTermMonthSort 2290.51 (4.4%) 2262.12 (4.2%) -1.2% ( -9% - 7%) 0.360
OrHighNotMed 327.20 (3.3%) 323.36 (3.2%) -1.2% ( -7% - 5%) 0.252
OrHighNotLow 318.99 (3.7%) 315.45 (4.2%) -1.1% ( -8% - 7%) 0.377
LowPhrase 4.74 (3.1%) 4.69 (3.0%) -1.0% ( -6% - 5%) 0.310
OrNotHighHigh 244.33 (3.1%) 242.52 (3.0%) -0.7% ( -6% - 5%) 0.443
OrHighNotHigh 227.54 (2.9%) 225.86 (3.2%) -0.7% ( -6% - 5%) 0.438
OrNotHighMed 333.78 (2.6%) 331.35 (2.8%) -0.7% ( -5% - 4%) 0.391
HighPhrase 70.04 (3.2%) 69.53 (3.3%) -0.7% ( -6% - 5%) 0.478
AndHighHigh 23.27 (7.9%) 23.11 (7.1%) -0.7% ( -14% - 15%) 0.777
Wildcard 51.02 (4.3%) 50.71 (4.2%) -0.6% ( -8% - 8%) 0.652
MedSpanNear 29.20 (3.0%) 29.05 (2.5%) -0.5% ( -5% - 5%) 0.561
HighTerm 475.59 (4.1%) 473.22 (4.7%) -0.5% ( -8% - 8%) 0.721
PKLookup 176.36 (3.0%) 175.50 (2.7%) -0.5% ( -6% - 5%) 0.589
HighSpanNear 10.52 (2.7%) 10.47 (2.2%) -0.4% ( -5% - 4%) 0.612
MedTerm 470.14 (4.4%) 468.33 (5.4%) -0.4% ( -9% - 9%) 0.804
BrowseDayOfYearSSDVFacets 4.08 (3.9%) 4.06 (4.2%) -0.4% ( -8% - 8%) 0.775
OrNotHighLow 322.80 (2.9%) 321.71 (2.4%) -0.3% ( -5% - 5%) 0.692
HighIntervalsOrdered 3.60 (4.8%) 3.59 (4.8%) -0.3% ( -9% - 9%) 0.868
AndHighMed 83.14 (3.5%) 82.93 (3.9%) -0.2% ( -7% - 7%) 0.833
BrowseDayOfYearTaxoFacets 4.69 (4.5%) 4.68 (4.4%) -0.2% ( -8% - 9%) 0.902
BrowseDateTaxoFacets 4.61 (4.5%) 4.60 (4.3%) -0.1% ( -8% - 9%) 0.937
Respell 53.50 (2.2%) 53.46 (1.8%) -0.1% ( -3% - 4%) 0.902
AndHighMedDayTaxoFacets 43.57 (1.5%) 43.54 (1.6%) -0.1% ( -3% - 3%) 0.891
Fuzzy1 66.17 (2.4%) 66.20 (2.0%) 0.0% ( -4% - 4%) 0.951
AndHighLow 525.57 (2.6%) 525.90 (4.2%) 0.1% ( -6% - 7%) 0.955
OrHighMed 76.00 (3.2%) 76.05 (3.9%) 0.1% ( -6% - 7%) 0.953
HighTermTitleBDVSort 6.93 (7.3%) 6.94 (6.8%) 0.2% ( -13% - 15%) 0.943
MedIntervalsOrdered 2.77 (3.6%) 2.78 (3.2%) 0.2% ( -6% - 7%) 0.883
Fuzzy2 43.83 (1.9%) 43.90 (1.7%) 0.2% ( -3% - 3%) 0.770
LowSpanNear 6.13 (2.1%) 6.14 (1.9%) 0.2% ( -3% - 4%) 0.785
HighSloppyPhrase 5.52 (3.4%) 5.53 (3.7%) 0.2% ( -6% - 7%) 0.851
BrowseMonthSSDVFacets 4.34 (5.1%) 4.35 (4.7%) 0.2% ( -9% - 10%) 0.891
Prefix3 68.56 (4.6%) 68.70 (6.0%) 0.2% ( -9% - 11%) 0.899
LowIntervalsOrdered 18.33 (2.8%) 18.38 (2.5%) 0.3% ( -4% - 5%) 0.737
LowSloppyPhrase 20.67 (2.2%) 20.73 (1.9%) 0.3% ( -3% - 4%) 0.627
AndHighHighDayTaxoFacets 7.57 (2.3%) 7.59 (2.5%) 0.3% ( -4% - 5%) 0.669
HighTermDayOfYearSort 206.91 (2.9%) 207.68 (2.6%) 0.4% ( -5% - 6%) 0.670
HighTermTitleSort 140.79 (1.6%) 141.32 (2.0%) 0.4% ( -3% - 3%) 0.508
LowTerm 438.67 (7.1%) 441.44 (7.9%) 0.6% ( -13% - 16%) 0.790
MedSloppyPhrase 21.78 (3.1%) 21.95 (3.4%) 0.8% ( -5% - 7%) 0.454
MedTermDayTaxoFacets 21.51 (2.2%) 21.71 (1.6%) 0.9% ( -2% - 4%) 0.122
TermDTSort 118.13 (3.0%) 119.30 (3.4%) 1.0% ( -5% - 7%) 0.329
BrowseMonthTaxoFacets 9.58 (8.6%) 9.68 (8.8%) 1.1% ( -14% - 20%) 0.691
BrowseRandomLabelSSDVFacets 2.88 (2.3%) 2.91 (1.8%) 1.1% ( -2% - 5%) 0.093
OrHighHigh 33.81 (7.6%) 34.24 (8.4%) 1.3% ( -13% - 18%) 0.618
OrHighLow 319.44 (6.2%) 323.88 (3.9%) 1.4% ( -8% - 12%) 0.393
IntNRQ 27.52 (5.2%) 27.96 (5.9%) 1.6% ( -8% - 13%) 0.360
OrHighMedDayTaxoFacets 2.83 (3.3%) 2.88 (5.2%) 1.6% ( -6% - 10%) 0.243
@gsmiller - I know you may not have time to review, but I want to at least notify you, since this is a big change and you've been very involved in this area of the code.
Hi reviewers! This PR has become stale. Could anyone have a look at it? It has several nice improvements for taxonomy facets, with no API changes, and it sets us up to launch new features in a future release: multiple aggregations in one go and retrieving counts with aggregation facets.
Thank you for reviewing @mikemccand! I had to rebase after #12966. I'll push tomorrow maybe if there are no objections.
I did another benchmark run after the rebase just to make sure I haven't broken anything when integrating the split taxo arrays change. I see no significant changes.
python3 src/python/localrun.py -source wikimediumall
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
BrowseMonthTaxoFacets 8.68 (8.6%) 8.41 (8.6%) -3.1% ( -18% - 15%) 0.257
OrHighHigh 24.38 (4.8%) 24.09 (4.9%) -1.2% ( -10% - 8%) 0.424
AndHighHigh 26.10 (4.6%) 25.80 (2.2%) -1.1% ( -7% - 5%) 0.315
HighTerm 254.91 (7.0%) 252.20 (5.9%) -1.1% ( -13% - 12%) 0.604
HighTermDayOfYearSort 307.54 (2.0%) 305.21 (2.1%) -0.8% ( -4% - 3%) 0.249
OrNotHighLow 506.28 (2.2%) 502.52 (2.6%) -0.7% ( -5% - 4%) 0.327
LowTerm 497.25 (6.3%) 493.71 (5.7%) -0.7% ( -11% - 12%) 0.709
OrHighMed 102.21 (3.8%) 101.52 (4.2%) -0.7% ( -8% - 7%) 0.589
MedTerm 505.87 (6.8%) 502.44 (5.9%) -0.7% ( -12% - 12%) 0.737
TermDTSort 130.10 (2.4%) 129.27 (2.0%) -0.6% ( -4% - 3%) 0.359
OrHighNotLow 420.65 (3.9%) 418.28 (3.8%) -0.6% ( -7% - 7%) 0.644
AndHighMed 89.03 (2.4%) 88.53 (1.4%) -0.6% ( -4% - 3%) 0.365
BrowseRandomLabelTaxoFacets 3.72 (1.8%) 3.70 (1.4%) -0.5% ( -3% - 2%) 0.303
HighTermTitleBDVSort 10.39 (4.7%) 10.34 (4.4%) -0.4% ( -9% - 9%) 0.775
Prefix3 131.17 (2.0%) 130.64 (3.3%) -0.4% ( -5% - 5%) 0.645
HighTermTitleSort 155.59 (2.2%) 155.00 (2.2%) -0.4% ( -4% - 4%) 0.590
OrHighMedDayTaxoFacets 4.50 (5.4%) 4.49 (5.5%) -0.4% ( -10% - 11%) 0.825
AndHighMedDayTaxoFacets 17.89 (1.9%) 17.85 (1.5%) -0.3% ( -3% - 3%) 0.636
BrowseDateTaxoFacets 4.57 (1.8%) 4.56 (1.5%) -0.3% ( -3% - 3%) 0.639
AndHighLow 677.34 (2.6%) 675.67 (1.8%) -0.2% ( -4% - 4%) 0.729
OrHighNotMed 349.74 (3.7%) 348.93 (2.8%) -0.2% ( -6% - 6%) 0.823
OrHighNotHigh 321.44 (3.1%) 320.71 (3.0%) -0.2% ( -6% - 6%) 0.815
OrNotHighHigh 229.84 (2.9%) 229.33 (2.7%) -0.2% ( -5% - 5%) 0.805
BrowseDayOfYearTaxoFacets 4.63 (1.7%) 4.62 (1.5%) -0.2% ( -3% - 3%) 0.675
OrHighLow 377.28 (1.3%) 376.48 (1.3%) -0.2% ( -2% - 2%) 0.601
MedPhrase 447.55 (2.2%) 446.61 (2.6%) -0.2% ( -4% - 4%) 0.781
AndHighHighDayTaxoFacets 2.48 (3.9%) 2.47 (2.7%) -0.2% ( -6% - 6%) 0.882
HighSpanNear 2.84 (2.2%) 2.84 (2.0%) -0.1% ( -4% - 4%) 0.835
Wildcard 294.36 (2.4%) 293.99 (2.8%) -0.1% ( -5% - 5%) 0.879
Fuzzy2 61.91 (1.2%) 61.85 (1.3%) -0.1% ( -2% - 2%) 0.814
LowSpanNear 36.58 (1.9%) 36.56 (1.8%) -0.1% ( -3% - 3%) 0.923
LowPhrase 41.87 (1.2%) 41.85 (1.6%) -0.0% ( -2% - 2%) 0.925
MedTermDayTaxoFacets 23.10 (2.5%) 23.10 (2.5%) 0.0% ( -4% - 5%) 0.991
Fuzzy1 88.20 (0.9%) 88.23 (1.3%) 0.0% ( -2% - 2%) 0.935
Respell 46.76 (1.8%) 46.77 (1.8%) 0.0% ( -3% - 3%) 0.950
OrNotHighMed 325.18 (2.3%) 325.71 (2.0%) 0.2% ( -4% - 4%) 0.811
MedSpanNear 6.23 (4.0%) 6.24 (3.8%) 0.2% ( -7% - 8%) 0.846
HighPhrase 20.42 (1.9%) 20.47 (2.8%) 0.3% ( -4% - 5%) 0.737
HighIntervalsOrdered 9.90 (4.4%) 9.94 (2.9%) 0.4% ( -6% - 8%) 0.763
LowIntervalsOrdered 14.11 (4.2%) 14.17 (2.4%) 0.4% ( -5% - 7%) 0.698
BrowseMonthSSDVFacets 4.15 (1.5%) 4.17 (2.1%) 0.4% ( -3% - 4%) 0.438
PKLookup 190.68 (1.8%) 191.62 (1.7%) 0.5% ( -2% - 4%) 0.381
MedIntervalsOrdered 4.54 (4.3%) 4.57 (2.9%) 0.5% ( -6% - 8%) 0.649
HighSloppyPhrase 14.51 (2.0%) 14.62 (2.1%) 0.7% ( -3% - 4%) 0.243
BrowseRandomLabelSSDVFacets 2.83 (6.1%) 2.85 (5.7%) 0.8% ( -10% - 13%) 0.674
LowSloppyPhrase 13.09 (2.1%) 13.20 (2.4%) 0.8% ( -3% - 5%) 0.231
HighTermMonthSort 2155.96 (3.5%) 2177.02 (3.6%) 1.0% ( -5% - 8%) 0.382
BrowseDayOfYearSSDVFacets 4.00 (2.2%) 4.05 (2.1%) 1.2% ( -3% - 5%) 0.073
MedSloppyPhrase 12.84 (4.2%) 13.04 (4.7%) 1.6% ( -7% - 10%) 0.260
BrowseDateSSDVFacets 1.17 (9.3%) 1.19 (7.0%) 1.9% ( -13% - 20%) 0.458
IntNRQ 21.04 (26.3%) 22.13 (25.7%) 5.2% ( -37% - 77%) 0.531
I'm finding this difficult to port to 9x because of the way the classes have diverged and I'm not sure it's worthwhile, since a lot of the benefits here are for future development and to support API changes that would go in Lucene 10. I'll move the CHANGES entries and milestones to Lucene 10 unless anyone thinks it's worth backporting.
Now that #12408 was backported in https://github.com/apache/lucene/pull/13300 can we now backport this to 9.x? Or was it already done in an un-linked PR or so?
Remembering to backport is proving challenging and error-prone (it always has been): not just in all of us consistently agreeing on the criteria for backporting (we should always aim to backport unless it breaks non-experimental/internal public APIs?), but also in actually remembering to do it after a PR is merged to main. I wish GH provided some stronger mechanisms for us here...
I was just working on it today actually and finally got it in shape: #13358. Sorry it took so long!
I was skeptical this would work out at first, but I think we have a successful backport in the end, so the changes will go out with 9.11.