lucene icon indicating copy to clipboard operation
lucene copied to clipboard

LUCENE-10376: Roll up the loop in vint/vlong in DataInput

Open gf2121 opened this issue 3 years ago • 11 comments

https://issues.apache.org/jira/browse/LUCENE-10376

#11412

gf2121 avatar Jan 13 '22 09:01 gf2121

There are some more copies of this stuff in BufferedIndexInput and BlockPackedReaderIterator

rmuir avatar Jan 13 '22 10:01 rmuir

There is ByteArrayDataInput too.

jpountz avatar Jan 13 '22 13:01 jpountz

Thanks ! I've fixed these places and will do a benchmark based on the newest code tonight.

gf2121 avatar Jan 13 '22 16:01 gf2121

Great that the original code was still there in a comment. 🤗 This makes easy to review.

I don't fully understand the variant that supports negative values. Do we still need it as the default one again supports negative values. The code looks slightly different with the additional check at end.

In general we now longer can detect invalid vlongs/vints when reading. Should we maybe add a check at end? I am not sure if this can be done without additional branching. My idea would be to use the loop till one before last and add the code at end. But this makes method longer again, possibly breaking inlining.

If this works well we should really close the mmap one.

uschindler avatar Jan 13 '22 17:01 uschindler

Thanks @uschindler .

I don't fully understand the variant that supports negative values. Do we still need it as the default one again supports negative values. The code looks slightly different with the additional check at end.

This is specialized becauseDataInput's vlong uses the 10th byte to express the negative/postive while the implementation in BlockPackedReaderIterator uses 8th bit of 9th byte (b9 & 0x80) to express.

In general we now longer can detect invalid vlongs/vints when reading. Should we maybe add a check at end? I am not sure if this can be done without additional branching. My idea would be to use the loop till one before last and add the code at end. But this makes method longer again, possibly breaking inlining.

I'd like to try to add some check for it if this version of code can work well.

gf2121 avatar Jan 13 '22 17:01 gf2121

Here is the report (30 JVM * 200 repeat):

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
            HighTermTitleBDVSort       34.07     (21.0%)       32.13     (12.4%)   -5.7% ( -32% -   35%) 0.201
               HighTermMonthSort       33.30     (14.3%)       32.05     (12.6%)   -3.7% ( -26% -   26%) 0.282
                      TermDTSort       46.29     (14.0%)       44.99      (8.7%)   -2.8% ( -22% -   23%) 0.348
           HighTermDayOfYearSort       49.67     (14.6%)       48.40      (7.6%)   -2.6% ( -21% -   23%) 0.393
             MedIntervalsOrdered       17.72      (9.7%)       17.29      (9.8%)   -2.5% ( -20% -   18%) 0.330
             LowIntervalsOrdered       88.12     (10.3%)       86.02     (10.6%)   -2.4% ( -21% -   20%) 0.376
                    OrNotHighLow      915.59      (3.9%)      894.14      (4.1%)   -2.3% (  -9% -    5%) 0.022
            HighIntervalsOrdered        7.67      (7.7%)        7.50      (7.7%)   -2.3% ( -16% -   14%) 0.251
            MedTermDayTaxoFacets       20.63      (5.4%)       20.17      (5.3%)   -2.2% ( -12% -    8%) 0.107
                        Wildcard       45.85      (9.1%)       44.84      (7.5%)   -2.2% ( -17% -   15%) 0.303
                      AndHighLow      519.98      (5.8%)      510.23      (5.7%)   -1.9% ( -12% -   10%) 0.204
           BrowseMonthSSDVFacets        4.98     (20.0%)        4.90     (12.1%)   -1.5% ( -28% -   38%) 0.728
                       OrHighLow      472.12      (4.8%)      465.47      (4.3%)   -1.4% ( -10% -    8%) 0.231
          OrHighMedDayTaxoFacets        4.44      (5.8%)        4.38      (6.3%)   -1.4% ( -12% -   11%) 0.381
         AndHighMedDayTaxoFacets       16.83      (2.3%)       16.61      (3.3%)   -1.3% (  -6% -    4%) 0.073
                    OrNotHighMed      734.01      (3.0%)      725.45      (3.3%)   -1.2% (  -7% -    5%) 0.156
                       MedPhrase       66.57      (4.5%)       65.81      (4.2%)   -1.1% (  -9% -    7%) 0.312
        AndHighHighDayTaxoFacets        8.66      (2.5%)        8.57      (3.2%)   -1.1% (  -6% -    4%) 0.140
                   OrHighNotHigh      990.99      (3.5%)      980.89      (3.7%)   -1.0% (  -7% -    6%) 0.270
                        HighTerm     1382.00      (3.5%)     1369.51      (3.7%)   -0.9% (  -7% -    6%) 0.337
                HighSloppyPhrase        9.56      (6.3%)        9.48      (5.8%)   -0.9% ( -12% -   11%) 0.566
                        PKLookup      207.67      (5.3%)      205.90      (5.6%)   -0.9% ( -11% -   10%) 0.545
                   OrNotHighHigh      831.60      (3.5%)      824.52      (3.9%)   -0.9% (  -7% -    6%) 0.368
                    OrHighNotLow     1299.72      (3.3%)     1289.81      (3.7%)   -0.8% (  -7% -    6%) 0.395
                    OrHighNotMed     1002.10      (3.4%)      994.79      (4.1%)   -0.7% (  -7% -    7%) 0.455
                     MedSpanNear        3.83      (4.3%)        3.80      (3.9%)   -0.7% (  -8% -    7%) 0.527
     BrowseRandomLabelTaxoFacets        4.31     (11.8%)        4.28     (11.5%)   -0.6% ( -21% -   25%) 0.834
            BrowseDateTaxoFacets        4.92     (14.2%)        4.89     (14.6%)   -0.6% ( -25% -   32%) 0.867
                         MedTerm     1548.17      (3.4%)     1538.83      (3.5%)   -0.6% (  -7% -    6%) 0.495
                         LowTerm     1659.17      (2.0%)     1649.23      (1.6%)   -0.6% (  -4% -    3%) 0.204
                 LowSloppyPhrase        2.88      (4.4%)        2.86      (4.9%)   -0.6% (  -9% -    9%) 0.626
       BrowseDayOfYearTaxoFacets        4.98     (14.4%)        4.95     (14.9%)   -0.6% ( -26% -   33%) 0.880
                 MedSloppyPhrase        3.28      (4.7%)        3.26      (5.1%)   -0.6% (  -9% -    9%) 0.664
                    HighSpanNear       21.69      (5.1%)       21.57      (4.6%)   -0.5% (  -9% -    9%) 0.663
                      HighPhrase       14.62      (4.6%)       14.54      (4.3%)   -0.5% (  -9% -    8%) 0.644
                       LowPhrase       61.62      (3.5%)       61.29      (2.9%)   -0.5% (  -6% -    6%) 0.527
                     LowSpanNear      126.62      (4.0%)      126.00      (4.0%)   -0.5% (  -8% -    7%) 0.635
                         Respell       34.76      (3.0%)       34.85      (2.0%)    0.3% (  -4% -    5%) 0.704
                       OrHighMed       55.19      (5.2%)       55.40      (3.8%)    0.4% (  -8% -    9%) 0.744
                      OrHighHigh       13.42      (3.8%)       13.49      (4.4%)    0.5% (  -7% -    9%) 0.650
                         Prefix3       45.53     (10.9%)       45.76     (10.8%)    0.5% ( -19% -   24%) 0.862
                          Fuzzy1       59.51      (2.0%)       59.90      (2.0%)    0.7% (  -3% -    4%) 0.209
                      AndHighMed      138.53      (6.0%)      139.54      (3.6%)    0.7% (  -8% -   10%) 0.569
                          Fuzzy2       80.88      (3.1%)       81.66      (2.2%)    1.0% (  -4% -    6%) 0.170
                     AndHighHigh       26.04      (5.3%)       26.30      (3.8%)    1.0% (  -7% -   10%) 0.400
     BrowseRandomLabelSSDVFacets        3.19      (5.7%)        3.25      (6.4%)    1.8% (  -9% -   14%) 0.254
       BrowseDayOfYearSSDVFacets        4.31      (6.1%)        4.43      (6.9%)    2.8% (  -9% -   16%) 0.092
           BrowseMonthTaxoFacets        5.17     (15.0%)        5.35     (16.2%)    3.4% ( -24% -   40%) 0.395
                          IntNRQ       24.73     (17.2%)       26.15     (20.2%)    5.7% ( -27% -   52%) 0.236

gf2121 avatar Jan 14 '22 05:01 gf2121

FYI here are some of my thoughts based on all these benchmark reports:

  • Rolling up loops for vint/vlong seems not bring a significant speed up (neither significant regression), IMO we should move on because the bug will no longer occur in java 17 ?

  • The mmap approach is showing a stable speed up, and "too many codes" seems not the major reason that prevents inline, so maybe (as @uschindler said) it is prevented by the try catch block? I'll suggest continue on the mmap approach too for the speed up.

gf2121 avatar Jan 14 '22 07:01 gf2121

Thanks. So unrolling brings no difference and as the bug is gone, we can replace all of those implementations.

About the ByteBufferIndexInput for MMapDirectory: This is really bad, we should maybe check the assembly code generated by Hotspot to figure out why it is not inlined.

So let's keep the other issue open to figure out more. +1 to commit this. We should also backport to 9.x, but then we should benchmark on Java 11, too.

uschindler avatar Jan 14 '22 13:01 uschindler

Here is the bechmark result on Java 11:

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
           BrowseMonthSSDVFacets        6.28     (17.6%)        6.02     (16.9%)   -4.2% ( -32% -   36%) 0.441
                          IntNRQ       30.10     (26.3%)       29.12     (26.6%)   -3.3% ( -44% -   67%) 0.696
                      AndHighLow      683.35      (3.5%)      662.78      (2.9%)   -3.0% (  -9% -    3%) 0.003
            MedTermDayTaxoFacets        8.68      (6.2%)        8.44      (4.2%)   -2.8% ( -12% -    8%) 0.093
                    OrNotHighLow     1256.07      (2.4%)     1222.13      (2.9%)   -2.7% (  -7% -    2%) 0.001
           BrowseMonthTaxoFacets        5.65     (15.1%)        5.50     (13.6%)   -2.6% ( -27% -   30%) 0.568
            HighIntervalsOrdered        2.78      (5.4%)        2.71      (5.6%)   -2.4% ( -12% -    8%) 0.158
       BrowseDayOfYearTaxoFacets        5.73      (7.3%)        5.60      (6.3%)   -2.2% ( -14% -   12%) 0.296
            BrowseDateTaxoFacets        5.69      (6.9%)        5.56      (6.2%)   -2.2% ( -14% -   11%) 0.296
                         Respell       40.59      (2.9%)       39.73      (1.9%)   -2.1% (  -6% -    2%) 0.006
                        PKLookup      199.57      (3.3%)      195.58      (4.0%)   -2.0% (  -9% -    5%) 0.084
         AndHighMedDayTaxoFacets        8.83      (4.0%)        8.67      (3.6%)   -1.9% (  -9% -    5%) 0.121
          OrHighMedDayTaxoFacets        4.98      (6.4%)        4.89      (4.6%)   -1.7% ( -11% -    9%) 0.340
        AndHighHighDayTaxoFacets       13.33      (2.9%)       13.11      (2.7%)   -1.6% (  -7% -    4%) 0.074
                    HighSpanNear        9.89      (3.2%)        9.75      (3.4%)   -1.5% (  -7% -    5%) 0.160
                          Fuzzy2       75.58      (2.5%)       74.57      (2.0%)   -1.3% (  -5% -    3%) 0.058
             LowIntervalsOrdered       59.03      (2.9%)       58.35      (3.4%)   -1.1% (  -7% -    5%) 0.253
               HighTermMonthSort       55.30     (17.8%)       54.71      (9.6%)   -1.1% ( -24% -   32%) 0.814
                          Fuzzy1      134.69      (2.7%)      133.33      (1.9%)   -1.0% (  -5% -    3%) 0.171
                         Prefix3       45.51     (10.7%)       45.05     (11.3%)   -1.0% ( -20% -   23%) 0.772
                       OrHighLow      361.23      (3.5%)      358.18      (3.7%)   -0.8% (  -7% -    6%) 0.454
     BrowseRandomLabelTaxoFacets        4.93      (4.5%)        4.89      (5.2%)   -0.8% ( -10% -    9%) 0.612
                        Wildcard       65.14     (15.1%)       64.88     (17.9%)   -0.4% ( -29% -   38%) 0.940
     BrowseRandomLabelSSDVFacets        3.58      (3.3%)        3.57      (4.0%)   -0.4% (  -7% -    7%) 0.758
                    OrNotHighMed      985.92      (2.9%)      982.55      (4.1%)   -0.3% (  -7% -    6%) 0.760
                    OrHighNotMed     1237.04      (3.2%)     1233.74      (4.6%)   -0.3% (  -7% -    7%) 0.832
             MedIntervalsOrdered       33.04      (1.9%)       32.97      (2.5%)   -0.2% (  -4% -    4%) 0.772
                     LowSpanNear       46.22      (3.3%)       46.15      (2.9%)   -0.2% (  -6% -    6%) 0.874
                       LowPhrase       25.32      (2.2%)       25.28      (2.5%)   -0.2% (  -4% -    4%) 0.837
                   OrNotHighHigh      905.71      (3.4%)      904.75      (5.1%)   -0.1% (  -8% -    8%) 0.938
                     MedSpanNear        4.76      (4.1%)        4.76      (3.4%)    0.1% (  -7% -    7%) 0.925
                   OrHighNotHigh      921.87      (3.5%)      923.33      (5.2%)    0.2% (  -8% -    9%) 0.910
                    OrHighNotLow     1980.39      (2.9%)     1984.53      (4.1%)    0.2% (  -6% -    7%) 0.853
                         LowTerm     1512.89      (2.9%)     1517.16      (3.1%)    0.3% (  -5% -    6%) 0.763
                        HighTerm     1730.61      (3.4%)     1736.20      (4.6%)    0.3% (  -7% -    8%) 0.800
                         MedTerm     1651.08      (3.6%)     1656.84      (4.5%)    0.3% (  -7% -    8%) 0.787
                HighSloppyPhrase        6.00      (5.0%)        6.03      (7.0%)    0.5% ( -10% -   13%) 0.798
            HighTermTitleBDVSort       86.33     (14.0%)       86.87     (10.6%)    0.6% ( -20% -   29%) 0.873
                       MedPhrase       20.39      (1.9%)       20.53      (2.3%)    0.7% (  -3% -    4%) 0.302
                      HighPhrase       11.20      (2.2%)       11.28      (2.8%)    0.7% (  -4% -    5%) 0.358
                     AndHighHigh       20.43      (3.6%)       20.64      (4.5%)    1.0% (  -6% -    9%) 0.435
       BrowseDayOfYearSSDVFacets        5.40     (11.2%)        5.46     (13.5%)    1.0% ( -21% -   28%) 0.793
                      AndHighMed       63.51      (4.1%)       64.18      (4.2%)    1.1% (  -6% -    9%) 0.416
                 MedSloppyPhrase       10.69      (4.6%)       10.82      (7.2%)    1.2% ( -10% -   13%) 0.525
                       OrHighMed      111.38      (3.9%)      112.77      (4.3%)    1.2% (  -6% -    9%) 0.341
                      OrHighHigh       24.64      (3.2%)       25.02      (4.2%)    1.6% (  -5% -    9%) 0.188
                 LowSloppyPhrase       17.96      (4.7%)       18.31      (6.9%)    2.0% (  -9% -   14%) 0.288
           HighTermDayOfYearSort       44.12     (12.3%)       45.34     (10.1%)    2.8% ( -17% -   28%) 0.437
                      TermDTSort       22.12     (19.1%)       22.83     (16.6%)    3.2% ( -27% -   48%) 0.570

Java version:

java version "11.0.6" 2020-01-14 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.6+8-LTS)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.6+8-LTS, mixed mode)

gf2121 avatar Jan 16 '22 08:01 gf2121

Thanks.

Did you check for other ReadVint variants? There may be more. Maybe use the type hierarchy functions to investigate all child classes of DataInput.

I'm using IntelliJ IDEA and it can help detect child classes that override the readVInt/readVLong. I also greped long readVLong to avoid similar fork (but not override) like what we do in BlockPackedReaderIterator. It seems there is no other ReadVint variants.

readVInt

image

image

readVLong

image

image

gf2121 avatar Jan 16 '22 18:01 gf2121

I would also take this one.

uschindler avatar Jan 17 '22 13:01 uschindler

Sorry, I forgot about this PR. Thanks for merging!

uschindler avatar Oct 24 '22 16:10 uschindler

Here is the nightly benchmark result of 2022.10.24 (commits diff): https://home.apache.org/~mikemccand/lucenebench/2022.10.24.18.02.38.html

Benefited Tasks (p-value < 0.01)

  • PKLookUp: https://home.apache.org/~mikemccand/lucenebench/PKLookup.html
  • AndHighOrMedMed: https://home.apache.org/~mikemccand/lucenebench/AndHighOrMedMed.html
  • AndMedOrHighHigh: https://home.apache.org/~mikemccand/lucenebench/AndMedOrHighHigh.html
  • TermMonthSort: https://home.apache.org/~mikemccand/lucenebench/TermMonthSort.html

Regression Tasks (p-value < 0.01)

  • Fuzzy1: https://home.apache.org/~mikemccand/lucenebench/Fuzzy1.html
  • Fuzzy2: https://home.apache.org/~mikemccand/lucenebench/Fuzzy2.html

gf2121 avatar Oct 28 '22 14:10 gf2121