lucene icon indicating copy to clipboard operation
lucene copied to clipboard

Remove HitsThresholdChecker.

Open jpountz opened this issue 1 year ago • 2 comments

Our top-docs collectors have a dependency on HitsThresholdChecker, which is essentially a shared counter that is incremented until it reaches the total hits threshold, when the scorer can start dynamically pruning hits.

A consequence of this removal is that dynamic pruning may start later, as soon as:

  • either the current slice collected totalHitsThreshold hits,
  • or another slice collected totalHitsThreshold hits and the current slice collected enough hits (up to 1,024) to check the shared MaxScoreAccumulator.

So in short, it exchanges a bit more work globally in favor of a bit less contention. A longer-term goal of mine is to stop specializing our CollectorManagers based on whether they are going to be used concurrently or not.

jpountz avatar Oct 22 '24 16:10 jpountz

EDIT: at the time of this comment, only TopScoreDocCollector no longer used HitsThresholdChecker.

wikibigall with a searchConcurrency of 8 suggests that the slowdown is tiny:

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                      AndHighLow      849.71      (3.3%)      826.18      (2.2%)   -2.8% (  -7% -    2%) 0.002
           HighTermDayOfYearSort      253.62      (3.2%)      246.72      (3.0%)   -2.7% (  -8% -    3%) 0.005
                      TermDTSort      202.82      (3.4%)      198.21      (4.4%)   -2.3% (  -9% -    5%) 0.069
            HighTermTitleBDVSort       49.87      (5.1%)       49.01      (5.5%)   -1.7% ( -11% -    9%) 0.306
                      OrHighRare      245.78      (8.9%)      242.01      (9.2%)   -1.5% ( -17% -   18%) 0.591
             And2Terms2StopWords      184.35      (5.3%)      181.84      (4.5%)   -1.4% ( -10% -    8%) 0.379
                     AndHighHigh      102.63      (6.9%)      101.25      (5.9%)   -1.3% ( -13% -   12%) 0.507
                       CountTerm     8520.99      (3.9%)     8417.05      (5.1%)   -1.2% (  -9% -    8%) 0.396
                        Wildcard      112.90      (5.6%)      111.56      (4.8%)   -1.2% ( -10% -    9%) 0.471
                          Fuzzy1       79.74      (1.7%)       79.07      (1.7%)   -0.8% (  -4% -    2%) 0.114
                       OrHighLow      724.26      (2.3%)      719.70      (2.2%)   -0.6% (  -5% -    3%) 0.377
              Or2Terms2StopWords      192.42      (5.4%)      191.24      (4.0%)   -0.6% (  -9% -    9%) 0.680
                       And3Terms      176.74      (5.3%)      175.76      (4.1%)   -0.6% (  -9% -    9%) 0.712
                 CountOrHighHigh       88.76      (4.9%)       88.34      (4.4%)   -0.5% (  -9% -    9%) 0.744
               HighTermMonthSort     1066.76      (1.6%)     1062.79      (1.9%)   -0.4% (  -3% -    3%) 0.506
                          Fuzzy2       75.06      (1.4%)       74.85      (1.8%)   -0.3% (  -3% -    3%) 0.597
                  CountOrHighMed      137.67      (5.3%)      137.45      (4.4%)   -0.2% (  -9% -   10%) 0.920
                   OrHighNotHigh      196.77      (3.3%)      196.66      (3.5%)   -0.1% (  -6% -    6%) 0.959
               HighTermTitleSort       70.08      (6.6%)       70.09      (5.7%)    0.0% ( -11% -   13%) 0.994
                      OrHighHigh       94.07      (4.8%)       94.12      (5.1%)    0.0% (  -9% -   10%) 0.975
                      AndHighMed      182.18      (4.1%)      182.43      (3.4%)    0.1% (  -7% -    8%) 0.909
                    OrNotHighMed      255.11      (2.8%)      255.50      (3.3%)    0.2% (  -5% -    6%) 0.874
                       OrHighMed      242.11      (2.4%)      242.65      (2.5%)    0.2% (  -4% -    5%) 0.772
                   OrNotHighHigh      235.61      (2.3%)      236.26      (3.4%)    0.3% (  -5% -    6%) 0.766
                        HighTerm      361.55      (2.6%)      362.84      (2.7%)    0.4% (  -4% -    5%) 0.669
                         MedTerm      453.24      (2.8%)      455.07      (2.5%)    0.4% (  -4% -    5%) 0.628
                    OrHighNotMed      317.00      (3.0%)      318.40      (3.8%)    0.4% (  -6% -    7%) 0.680
                        PKLookup      277.48      (2.2%)      278.76      (2.7%)    0.5% (  -4% -    5%) 0.558
                          OrMany       46.17      (2.3%)       46.41      (2.9%)    0.5% (  -4% -    5%) 0.520
                         Prefix3       68.55      (4.0%)       69.01      (4.7%)    0.7% (  -7% -    9%) 0.627
                    OrHighNotLow      336.53      (3.2%)      339.73      (3.9%)    1.0% (  -5% -    8%) 0.395
                    AndStopWords       64.81      (5.3%)       65.46      (5.3%)    1.0% (  -9% -   12%) 0.543
                         LowTerm      640.08      (3.1%)      647.88      (2.6%)    1.2% (  -4% -    7%) 0.176
                CountAndHighHigh       74.37      (5.2%)       75.37      (5.6%)    1.3% (  -8% -   12%) 0.426
                 CountAndHighMed      161.25      (5.1%)      163.59      (5.7%)    1.4% (  -8% -   12%) 0.394
                    OrNotHighLow      865.87      (3.3%)      880.11      (2.7%)    1.6% (  -4% -    7%) 0.081
                        Or3Terms      175.34      (4.3%)      178.28      (4.9%)    1.7% (  -7% -   11%) 0.252
                     OrStopWords       69.26      (6.5%)       70.71      (6.4%)    2.1% ( -10% -   15%) 0.303
                          IntNRQ      166.81      (5.5%)      170.67     (10.5%)    2.3% ( -12% -   19%) 0.381

jpountz avatar Oct 22 '24 16:10 jpountz

Actually, while I was at it, I also removed TopFieldCollector's dependency on HitsThresholdChecker, and then removed HitsThresholdChecker.

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
             And2Terms2StopWords      189.12      (4.5%)      182.81      (4.2%)   -3.3% ( -11% -    5%) 0.015
                    AndStopWords       67.12      (4.6%)       66.00      (3.8%)   -1.7% (  -9% -    7%) 0.214
                     OrStopWords       70.79      (6.9%)       69.97      (7.2%)   -1.1% ( -14% -   13%) 0.607
                       CountTerm     8165.89      (5.6%)     8090.18      (3.8%)   -0.9% (  -9% -    8%) 0.538
                        Wildcard      133.39      (4.1%)      132.24      (3.3%)   -0.9% (  -7% -    6%) 0.468
              Or2Terms2StopWords      194.66      (5.4%)      193.42      (4.1%)   -0.6% (  -9% -    9%) 0.673
                       OrHighMed      198.12      (4.1%)      197.08      (3.5%)   -0.5% (  -7% -    7%) 0.663
                       And3Terms      180.61      (4.0%)      179.74      (3.9%)   -0.5% (  -8% -    7%) 0.700
                 CountAndHighMed      163.62      (4.5%)      163.28      (3.5%)   -0.2% (  -7% -    8%) 0.872
                          Fuzzy1       78.71      (2.3%)       78.63      (1.8%)   -0.1% (  -4% -    4%) 0.881
                      AndHighMed      160.23      (4.9%)      160.16      (4.4%)   -0.0% (  -8% -    9%) 0.974
                      OrHighHigh      103.71      (4.1%)      103.68      (4.2%)   -0.0% (  -8% -    8%) 0.981
                          OrMany       46.85      (2.4%)       46.85      (2.7%)   -0.0% (  -5% -    5%) 0.996
                  CountOrHighMed      139.92      (4.2%)      139.96      (4.7%)    0.0% (  -8% -    9%) 0.984
                        PKLookup      278.34      (2.4%)      278.47      (2.5%)    0.0% (  -4% -    5%) 0.950
                        Or3Terms      176.21      (3.8%)      176.43      (2.9%)    0.1% (  -6% -    7%) 0.904
                          Fuzzy2       74.40      (2.5%)       74.65      (2.0%)    0.3% (  -4% -    4%) 0.647
                CountAndHighHigh       76.54      (4.8%)       76.82      (4.1%)    0.4% (  -8% -    9%) 0.798
                     AndHighHigh       96.45      (5.3%)       96.82      (4.3%)    0.4% (  -8% -   10%) 0.800
                   OrNotHighHigh      200.33      (3.1%)      201.39      (4.6%)    0.5% (  -6% -    8%) 0.667
                      AndHighLow     1013.59      (2.4%)     1019.06      (2.4%)    0.5% (  -4% -    5%) 0.480
               HighTermTitleSort       72.60      (3.3%)       73.08      (3.5%)    0.7% (  -5% -    7%) 0.540
                         Prefix3      191.90      (5.3%)      193.31      (4.5%)    0.7% (  -8% -   11%) 0.635
                   OrHighNotHigh      198.74      (2.8%)      200.40      (4.1%)    0.8% (  -5% -    7%) 0.452
                          IntNRQ      125.30     (11.9%)      126.54      (9.9%)    1.0% ( -18% -   25%) 0.776
                    OrHighNotMed      281.15      (2.4%)      284.02      (5.0%)    1.0% (  -6% -    8%) 0.407
           HighTermDayOfYearSort      255.46      (3.2%)      258.19      (3.8%)    1.1% (  -5% -    8%) 0.333
                 CountOrHighHigh       90.76      (5.7%)       91.74      (5.1%)    1.1% (  -9% -   12%) 0.529
                    OrNotHighMed      326.08      (3.7%)      329.84      (4.8%)    1.2% (  -7% -   10%) 0.396
                       OrHighLow      786.85      (1.8%)      797.01      (2.7%)    1.3% (  -3% -    5%) 0.076
                    OrHighNotLow      368.40      (2.5%)      373.79      (5.3%)    1.5% (  -6% -    9%) 0.265
            HighTermTitleBDVSort       57.32      (4.3%)       58.16      (4.0%)    1.5% (  -6% -   10%) 0.268
                    OrNotHighLow     1096.17      (3.0%)     1114.86      (3.0%)    1.7% (  -4% -    7%) 0.071
                        HighTerm      338.49      (2.5%)      347.01      (2.5%)    2.5% (  -2% -    7%) 0.001
                         LowTerm      663.36      (3.3%)      682.20      (3.4%)    2.8% (  -3% -    9%) 0.008
                      OrHighRare      246.60      (6.2%)      253.61      (7.7%)    2.8% ( -10% -   17%) 0.198
                         MedTerm      433.32      (3.0%)      445.99      (3.1%)    2.9% (  -3% -    9%) 0.003
                      TermDTSort      196.52      (3.4%)      219.52      (4.9%)   11.7% (   3% -   20%) 0.000
               HighTermMonthSort     1082.56      (2.7%)     1404.25      (3.4%)   29.7% (  22% -   36%) 0.000

Queries sorted by field seem to benefit from the lesser synchronization across search threads.

jpountz avatar Oct 23 '24 13:10 jpountz

Wow, it's been a bigger speedup on nightly benchmarks than on my machine: https://benchmarks.mikemccandless.com/And3Terms.html.

jpountz avatar Oct 29 '24 18:10 jpountz