lucene icon indicating copy to clipboard operation
lucene copied to clipboard

Update TermStates.java to remove deferred lambda in TermStates.get

Open shubhamsrkdev opened this issue 1 month ago • 4 comments

Description

Remove the lambda from TermStates.get(LeafReaderContext). The method now eagerly resolves the term lookup. I am not sure if this change would expressly benefit everyone, feedback here would be greatly appreciated.

Benchmarks

The results looks like noise as the p-value is high for all of them.

               TaskQPS.         baseline   StdDevQPS    candidate     StdDev   Pct-diff             p-value
             MedSloppyPhrase       201.59     (7.2%)      194.49      (8.0%)   -3.5% ( -17% -   12%) 0.141
             LowSloppyPhrase      176.82      (5.6%)      170.78      (7.9%)   -3.4% ( -15% -   10%) 0.114
                  TermDTSort      319.06      (3.6%)      308.68      (7.1%)   -3.3% ( -13% -    7%) 0.068
 BrowseRandomLabelSSDVFacets        3.31     (12.4%)        3.24     (10.6%)   -2.2% ( -22% -   23%) 0.551
       HighTermDayOfYearSort      359.86      (3.3%)      352.33      (5.8%)   -2.1% ( -10% -    7%) 0.159
        HighIntervalsOrdered       66.60      (8.1%)       65.37      (9.9%)   -1.9% ( -18% -   17%) 0.517
         LowIntervalsOrdered       88.21      (6.3%)       86.98      (7.5%)   -1.4% ( -14% -   13%) 0.525
                      IntSet      849.67      (4.2%)      839.73      (5.1%)   -1.2% (  -9% -    8%) 0.425
           HighTermMonthSort     1391.79      (3.4%)     1376.85      (1.8%)   -1.1% (  -6% -    4%) 0.212
                    PKLookup      203.28      (2.6%)      201.17      (2.9%)   -1.0% (  -6% -    4%) 0.237
            HighSloppyPhrase       20.25      (3.3%)       20.08      (3.8%)   -0.8% (  -7% -    6%) 0.459
       BrowseMonthTaxoFacets        2.79      (2.1%)        2.78      (1.3%)   -0.7% (  -3% -    2%) 0.209
         MedIntervalsOrdered      160.80      (5.1%)      159.78      (5.9%)   -0.6% ( -11% -   10%) 0.716
                     LowTerm     1625.14      (3.5%)     1614.93      (2.8%)   -0.6% (  -6% -    5%) 0.532
                 MedSpanNear       15.55      (3.2%)       15.46      (3.6%)   -0.6% (  -7% -    6%) 0.603
                     Respell       60.17      (2.3%)       59.88      (2.2%)   -0.5% (  -4% -    4%) 0.500
                    Wildcard      316.12      (3.4%)      314.60      (3.9%)   -0.5% (  -7% -    7%) 0.678
                   MedPhrase       56.20      (1.3%)       55.94      (1.3%)   -0.5% (  -2% -    2%) 0.264
           HighTermTitleSort      129.14      (3.3%)      128.56      (2.8%)   -0.4% (  -6% -    5%) 0.638
                   OrHighMed      416.92      (2.2%)      415.43      (1.4%)   -0.4% (  -3% -    3%) 0.548
                   LowPhrase       42.86      (1.5%)       42.72      (2.1%)   -0.3% (  -3% -    3%) 0.578
       BrowseMonthSSDVFacets        4.41      (7.0%)        4.40      (7.0%)   -0.2% ( -13% -   14%) 0.929
                HighSpanNear       29.41      (2.4%)       29.37      (2.3%)   -0.1% (  -4% -    4%) 0.858
                   OrHighLow     1054.41      (2.9%)     1053.07      (2.4%)   -0.1% (  -5% -    5%) 0.879
               OrHighNotHigh      530.73      (4.5%)      530.13      (7.0%)   -0.1% ( -11% -   11%) 0.952
                  AndHighMed      634.06      (2.0%)      634.04      (1.9%)   -0.0% (  -3% -    4%) 0.996
                 LowSpanNear      351.51      (1.8%)      351.51      (2.2%)    0.0% (  -3% -    4%) 0.998
                  HighPhrase      157.09      (1.3%)      157.11      (1.6%)    0.0% (  -2% -    2%) 0.982
                  OrHighHigh      298.09      (3.1%)      298.21      (1.8%)    0.0% (  -4% -    5%) 0.958
    AndHighHighDayTaxoFacets       17.20      (1.6%)       17.23      (1.4%)    0.2% (  -2% -    3%) 0.753
     AndHighMedDayTaxoFacets      146.02      (0.9%)      146.31      (0.9%)    0.2% (  -1% -    2%) 0.506
        HighTermTitleBDVSort       47.22      (1.9%)       47.32      (1.9%)    0.2% (  -3% -    4%) 0.723
                      Fuzzy1       78.08      (2.7%)       78.31      (2.3%)    0.3% (  -4% -    5%) 0.710
                      Fuzzy2       64.77      (2.7%)       65.00      (2.2%)    0.4% (  -4% -    5%) 0.648
                     Prefix3      465.74      (3.1%)      467.41      (2.4%)    0.4% (  -5% -    6%) 0.687
        MedTermDayTaxoFacets       31.46      (1.0%)       31.59      (1.1%)    0.4% (  -1% -    2%) 0.216
                OrNotHighMed      490.15      (5.5%)      492.51      (6.0%)    0.5% ( -10% -   12%) 0.791
                OrNotHighLow     1128.09      (2.1%)     1133.61      (3.1%)    0.5% (  -4% -    5%) 0.556
                     MedTerm     1044.63      (4.7%)     1052.38      (2.2%)    0.7% (  -5% -    8%) 0.523
                 AndHighHigh      300.64      (4.1%)      303.14      (1.2%)    0.8% (  -4% -    6%) 0.381
               OrNotHighHigh      429.76      (5.3%)      433.35      (6.4%)    0.8% ( -10% -   13%) 0.653
      OrHighMedDayTaxoFacets       12.51      (2.4%)       12.62      (2.0%)    0.9% (  -3% -    5%) 0.194
                  AndHighLow     1293.91      (4.1%)     1305.93      (2.8%)    0.9% (  -5% -    8%) 0.399
                       range     3640.14      (6.4%)     3674.49      (6.8%)    0.9% ( -11% -   15%) 0.651
                    HighTerm      948.31      (4.4%)      959.25      (2.5%)    1.2% (  -5% -    8%) 0.308
                OrHighNotMed      748.02      (4.4%)      758.29      (4.1%)    1.4% (  -6% -   10%) 0.310
                      IntNRQ     1051.87      (2.1%)     1066.42      (2.5%)    1.4% (  -3% -    6%) 0.062
 BrowseRandomLabelTaxoFacets        2.40      (1.7%)        2.43      (5.8%)    1.4% (  -5% -    9%) 0.303
   BrowseDayOfYearTaxoFacets        3.21      (4.9%)        3.26      (7.8%)    1.6% ( -10% -   14%) 0.439
        BrowseDateTaxoFacets        3.18      (5.0%)        3.24      (8.2%)    1.7% ( -10% -   15%) 0.423
                OrHighNotLow      931.69      (4.6%)      953.01      (4.2%)    2.3% (  -6% -   11%) 0.101
        BrowseDateSSDVFacets        0.88      (9.3%)        0.91      (7.3%)    3.1% ( -12% -   21%) 0.240
   BrowseDayOfYearSSDVFacets        4.52     (11.8%)        4.72     (17.0%)    4.3% ( -21% -   37%) 0.348

shubhamsrkdev avatar Nov 27 '25 16:11 shubhamsrkdev

This was added for prefetching: https://github.com/apache/lucene/pull/13359

How does this perform in scenarios where prefetching is beneficial (e.g low memory)?

benwtrent avatar Dec 01 '25 15:12 benwtrent

This was added for prefetching: #13359

How does this perform in scenarios where prefetching is beneficial (e.g low memory)?

Thanks @benwtrent - is there a standard way to test this? How much would we consider low memory?

shubhamsrkdev avatar Dec 04 '25 23:12 shubhamsrkdev

@benwtrent this change gives us significant enough performance boost, I think it is because BooleanWeight has optimizations for clauses which return null scorerSupplier https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/BooleanWeight.java#L293

But looks like it is bad for low memory scenarios, so I wonder if propagating isLoaded result from #prefetch https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/store/MemorySegmentIndexInput.java#L332 to TermStates to decide if we want to use lambda or call termExistsSupplier.get() right away could be the way?

epotyom avatar Dec 05 '25 18:12 epotyom

this change gives us significant enough performance boost

I honestly don't fully understand all the code here. I am simply pointing out that this change was made with a particular thing in mind, and that thing seems completely ignored (which we shouldn't do).

But looks like it is bad for low memory scenarios

I am not sure it is?

I am pushing back on this change simply because it is effectively a revert of a previous commit and we need a good reason for that (e.g. do we need prefetching? Are we wanting to handling it differently?)

benwtrent avatar Dec 05 '25 21:12 benwtrent

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

github-actions[bot] avatar Dec 20 '25 00:12 github-actions[bot]