GH#11922: Allow DisjunctionDISIApproximation to short-circuit
Description
This PR explores allowing DisjunctionDISIApproximation to short-circuit after "finding" a doc, allowing sub-iterators to only be exhaustively advanced when necessary. This could be useful for non-scoring disjunction clauses.
Note: I've only changed the approximation implementation to start. DisjunctionScorer exhaustively advances all sub-iterators in this implementation if it needs to do two-phase match confirmation (which is what we do today). Ideally, we'd only advance as necessary in this step as well.
Have you checked if this actually made things faster? We're saving calls to advance in some cases but also adding conditions to some very tight loops. E.g. we recently got speedups by moving some queries from WAND to MAXSCORE even though WAND is better at saving calls to advance compared to MAXSCORE.
Have you checked if this actually made things faster? We're saving calls to advance in some cases but also adding conditions to some very tight loops.
I'd run some of our internal benchmarks with this change, and it did show a positive impact (reduced cost and latency). I did a quick run with luceneutil benchmark tasks, and it looked pretty flat. These were both kind of hastily run though, and I was guessing that the reason we saw an improvement internally is because we don't rely on scoring, so could take advantage (while I was guessing that the luceneutil tasks do scoring, but I didn't dig in yet).
It's a really good callout about the condition check in the tight loop. That hadn't occurred to me, so thank you for mentioning it. For the common case of requiring scores, this change seems like it would just cause a net increase to cost, so I don't love that.
As a next step, I'll dig into benchmarking a bit more. If it looks like there is a clear positive impact still to the non-scoring case, I'd like to see if there's a way to avoid the added condition-checking cost for the common scoring case (since all the sub-iterators will need to get advanced anyway).
Thanks for the early feedback @jpountz !
@jpountz thanks for the implementation feedback! I've updated the PR, but still plan to do more benchmarking to really understand the benefit, etc. before looking to actually merge this. I'll follow up here once I've been able to do that.
@jpountz I re-ran some internal benchmarking with this change to highlight the speedup in cases where scoring isn't needed (at least some specific use-cases I'm looking at). These use-cases all involve a "disjunction filter," meaning a disjunction of terms that is used as a required clause. So something like (+ (foo:bar foo:baz foo:zed) (...)), where the foo field must take on one of the specified values to be considered a candidate match. To provide a sense of scale, on average, these filters have 40 different terms in them. Since these "filters" don't participate in scoring at all, it's a good candidate for this short-circuiting.
In these benchmarks, I'm observing a 2.3% QPS improvement, and a 3.5% avg. latency reduction (5.9% p50 reduction / 3.5% p99 reduction). So the change appears to be helping this type of situation.
As for whether-or-not this change would actually hurt other common use-cases that require scoring or second-phase checks, I re-ran luceneutil benchmarks (wikimedium10m) task and don't observe any regressions there (results below). It's possible there's a gap in our benchmarks though, and maybe there are some common use-cases not covered?
TaskQPS baseline StdDevQPS candidate StdDev Pct diff p-value
MedSloppyPhrase 115.00 (4.9%) 112.91 (5.1%) -1.8% ( -11% - 8%) 0.249
HighTermTitleSort 242.48 (3.2%) 238.66 (4.3%) -1.6% ( -8% - 6%) 0.189
HighSloppyPhrase 36.47 (3.8%) 36.12 (3.9%) -0.9% ( -8% - 7%) 0.439
LowTerm 1766.82 (3.4%) 1752.12 (3.3%) -0.8% ( -7% - 6%) 0.436
HighPhrase 263.74 (3.4%) 261.71 (2.3%) -0.8% ( -6% - 5%) 0.404
OrHighLow 796.71 (2.7%) 790.71 (2.6%) -0.8% ( -5% - 4%) 0.367
BrowseDateSSDVFacets 3.46 (6.4%) 3.44 (7.1%) -0.7% ( -13% - 13%) 0.755
HighTermMonthSort 3070.86 (4.5%) 3051.39 (3.9%) -0.6% ( -8% - 8%) 0.635
Prefix3 111.76 (4.6%) 111.08 (4.2%) -0.6% ( -8% - 8%) 0.658
OrNotHighHigh 1249.27 (3.4%) 1242.30 (3.8%) -0.6% ( -7% - 6%) 0.627
BrowseMonthTaxoFacets 35.43 (1.6%) 35.23 (2.2%) -0.6% ( -4% - 3%) 0.367
LowSloppyPhrase 61.49 (2.4%) 61.18 (2.5%) -0.5% ( -5% - 4%) 0.512
OrHighNotMed 1139.05 (3.9%) 1133.29 (3.6%) -0.5% ( -7% - 7%) 0.671
BrowseRandomLabelTaxoFacets 20.33 (4.5%) 20.23 (5.6%) -0.5% ( -10% - 10%) 0.760
HighTerm 1635.45 (4.2%) 1628.28 (3.9%) -0.4% ( -8% - 8%) 0.735
MedPhrase 46.41 (2.3%) 46.22 (1.6%) -0.4% ( -4% - 3%) 0.529
OrHighMed 193.55 (2.8%) 192.79 (2.9%) -0.4% ( -5% - 5%) 0.663
OrHighNotHigh 865.20 (3.0%) 862.38 (3.9%) -0.3% ( -6% - 6%) 0.766
AndHighLow 1566.83 (2.7%) 1562.47 (2.7%) -0.3% ( -5% - 5%) 0.745
MedTermDayTaxoFacets 48.00 (3.5%) 47.89 (3.6%) -0.2% ( -7% - 7%) 0.836
HighTermDayOfYearSort 812.55 (2.8%) 811.49 (2.5%) -0.1% ( -5% - 5%) 0.878
MedTerm 2390.59 (3.5%) 2387.70 (3.5%) -0.1% ( -6% - 7%) 0.912
BrowseDateTaxoFacets 25.05 (9.2%) 25.03 (9.3%) -0.1% ( -16% - 20%) 0.984
BrowseMonthSSDVFacets 16.00 (18.9%) 15.99 (19.0%) -0.1% ( -31% - 46%) 0.992
LowIntervalsOrdered 276.77 (3.5%) 276.66 (3.6%) -0.0% ( -6% - 7%) 0.972
OrHighHigh 26.41 (4.4%) 26.40 (4.4%) -0.0% ( -8% - 9%) 0.978
MedSpanNear 35.38 (1.0%) 35.38 (1.1%) -0.0% ( -2% - 2%) 0.969
TermDTSort 648.08 (1.0%) 648.35 (1.3%) 0.0% ( -2% - 2%) 0.909
BrowseDayOfYearTaxoFacets 25.12 (9.5%) 25.13 (9.5%) 0.0% ( -17% - 21%) 0.988
OrNotHighLow 1192.74 (3.1%) 1193.69 (2.6%) 0.1% ( -5% - 5%) 0.929
HighIntervalsOrdered 1.79 (3.3%) 1.79 (3.3%) 0.1% ( -6% - 6%) 0.934
Respell 46.93 (1.3%) 46.97 (1.3%) 0.1% ( -2% - 2%) 0.825
LowSpanNear 11.96 (0.8%) 11.97 (0.9%) 0.1% ( -1% - 1%) 0.702
BrowseRandomLabelSSDVFacets 10.14 (9.0%) 10.16 (9.4%) 0.1% ( -16% - 20%) 0.961
LowPhrase 438.44 (1.6%) 439.23 (1.4%) 0.2% ( -2% - 3%) 0.701
AndHighHighDayTaxoFacets 28.52 (1.9%) 28.57 (1.8%) 0.2% ( -3% - 3%) 0.746
MedIntervalsOrdered 64.28 (2.3%) 64.42 (2.3%) 0.2% ( -4% - 4%) 0.777
OrHighNotLow 1854.91 (2.7%) 1858.96 (3.9%) 0.2% ( -6% - 6%) 0.836
HighSpanNear 12.65 (1.4%) 12.69 (1.7%) 0.3% ( -2% - 3%) 0.579
OrHighMedDayTaxoFacets 14.34 (3.0%) 14.38 (4.3%) 0.3% ( -6% - 7%) 0.809
Fuzzy1 81.66 (0.9%) 81.90 (1.1%) 0.3% ( -1% - 2%) 0.357
BrowseDayOfYearSSDVFacets 13.40 (12.7%) 13.44 (12.5%) 0.3% ( -22% - 29%) 0.942
AndHighMedDayTaxoFacets 34.33 (2.1%) 34.45 (1.7%) 0.3% ( -3% - 4%) 0.569
Fuzzy2 76.36 (0.8%) 76.64 (0.9%) 0.4% ( -1% - 2%) 0.187
HighTermTitleBDVSort 12.49 (6.6%) 12.55 (6.6%) 0.4% ( -11% - 14%) 0.846
PKLookup 179.34 (2.2%) 180.48 (2.3%) 0.6% ( -3% - 5%) 0.364
AndHighMed 268.67 (3.8%) 270.48 (3.9%) 0.7% ( -6% - 8%) 0.579
OrNotHighMed 1006.62 (4.1%) 1013.59 (4.3%) 0.7% ( -7% - 9%) 0.604
Wildcard 133.41 (2.7%) 134.34 (2.7%) 0.7% ( -4% - 6%) 0.417
AndHighHigh 74.34 (4.7%) 74.95 (5.2%) 0.8% ( -8% - 11%) 0.608
IntNRQ 82.56 (3.5%) 83.70 (4.9%) 1.4% ( -6% - 10%) 0.306
Given this, I'd prefer to move forward with the change (since it doesn't seem to be doing any harm to common cases covered in luceneutil and is helping in the filtering use-cases I'm benchmarking), but I'd like your opinion. Given how central of a component this is to scoring, I want to make sure I'm not being overly biased towards the use-case I'm focused on at the cost of regressing elsewhere. Thanks again for your inputs!
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!