lucene
lucene copied to clipboard
LUCENE-10376: Roll up the loop in vint/vlong in DataInput
https://issues.apache.org/jira/browse/LUCENE-10376
#11412
There are some more copies of this stuff in BufferedIndexInput and BlockPackedReaderIterator
There is ByteArrayDataInput too.
Thanks ! I've fixed these places and will do a benchmark based on the newest code tonight.
Great that the original code was still there in a comment. 🤗 This makes easy to review.
I don't fully understand the variant that supports negative values. Do we still need it as the default one again supports negative values. The code looks slightly different with the additional check at end.
In general we now longer can detect invalid vlongs/vints when reading. Should we maybe add a check at end? I am not sure if this can be done without additional branching. My idea would be to use the loop till one before last and add the code at end. But this makes method longer again, possibly breaking inlining.
If this works well we should really close the mmap one.
Thanks @uschindler .
I don't fully understand the variant that supports negative values. Do we still need it as the default one again supports negative values. The code looks slightly different with the additional check at end.
This is specialized becauseDataInput
's vlong uses the 10th byte to express the negative/postive while the implementation in BlockPackedReaderIterator
uses 8th bit of 9th byte (b9 & 0x80) to express.
In general we now longer can detect invalid vlongs/vints when reading. Should we maybe add a check at end? I am not sure if this can be done without additional branching. My idea would be to use the loop till one before last and add the code at end. But this makes method longer again, possibly breaking inlining.
I'd like to try to add some check for it if this version of code can work well.
Here is the report (30 JVM * 200 repeat):
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
HighTermTitleBDVSort 34.07 (21.0%) 32.13 (12.4%) -5.7% ( -32% - 35%) 0.201
HighTermMonthSort 33.30 (14.3%) 32.05 (12.6%) -3.7% ( -26% - 26%) 0.282
TermDTSort 46.29 (14.0%) 44.99 (8.7%) -2.8% ( -22% - 23%) 0.348
HighTermDayOfYearSort 49.67 (14.6%) 48.40 (7.6%) -2.6% ( -21% - 23%) 0.393
MedIntervalsOrdered 17.72 (9.7%) 17.29 (9.8%) -2.5% ( -20% - 18%) 0.330
LowIntervalsOrdered 88.12 (10.3%) 86.02 (10.6%) -2.4% ( -21% - 20%) 0.376
OrNotHighLow 915.59 (3.9%) 894.14 (4.1%) -2.3% ( -9% - 5%) 0.022
HighIntervalsOrdered 7.67 (7.7%) 7.50 (7.7%) -2.3% ( -16% - 14%) 0.251
MedTermDayTaxoFacets 20.63 (5.4%) 20.17 (5.3%) -2.2% ( -12% - 8%) 0.107
Wildcard 45.85 (9.1%) 44.84 (7.5%) -2.2% ( -17% - 15%) 0.303
AndHighLow 519.98 (5.8%) 510.23 (5.7%) -1.9% ( -12% - 10%) 0.204
BrowseMonthSSDVFacets 4.98 (20.0%) 4.90 (12.1%) -1.5% ( -28% - 38%) 0.728
OrHighLow 472.12 (4.8%) 465.47 (4.3%) -1.4% ( -10% - 8%) 0.231
OrHighMedDayTaxoFacets 4.44 (5.8%) 4.38 (6.3%) -1.4% ( -12% - 11%) 0.381
AndHighMedDayTaxoFacets 16.83 (2.3%) 16.61 (3.3%) -1.3% ( -6% - 4%) 0.073
OrNotHighMed 734.01 (3.0%) 725.45 (3.3%) -1.2% ( -7% - 5%) 0.156
MedPhrase 66.57 (4.5%) 65.81 (4.2%) -1.1% ( -9% - 7%) 0.312
AndHighHighDayTaxoFacets 8.66 (2.5%) 8.57 (3.2%) -1.1% ( -6% - 4%) 0.140
OrHighNotHigh 990.99 (3.5%) 980.89 (3.7%) -1.0% ( -7% - 6%) 0.270
HighTerm 1382.00 (3.5%) 1369.51 (3.7%) -0.9% ( -7% - 6%) 0.337
HighSloppyPhrase 9.56 (6.3%) 9.48 (5.8%) -0.9% ( -12% - 11%) 0.566
PKLookup 207.67 (5.3%) 205.90 (5.6%) -0.9% ( -11% - 10%) 0.545
OrNotHighHigh 831.60 (3.5%) 824.52 (3.9%) -0.9% ( -7% - 6%) 0.368
OrHighNotLow 1299.72 (3.3%) 1289.81 (3.7%) -0.8% ( -7% - 6%) 0.395
OrHighNotMed 1002.10 (3.4%) 994.79 (4.1%) -0.7% ( -7% - 7%) 0.455
MedSpanNear 3.83 (4.3%) 3.80 (3.9%) -0.7% ( -8% - 7%) 0.527
BrowseRandomLabelTaxoFacets 4.31 (11.8%) 4.28 (11.5%) -0.6% ( -21% - 25%) 0.834
BrowseDateTaxoFacets 4.92 (14.2%) 4.89 (14.6%) -0.6% ( -25% - 32%) 0.867
MedTerm 1548.17 (3.4%) 1538.83 (3.5%) -0.6% ( -7% - 6%) 0.495
LowTerm 1659.17 (2.0%) 1649.23 (1.6%) -0.6% ( -4% - 3%) 0.204
LowSloppyPhrase 2.88 (4.4%) 2.86 (4.9%) -0.6% ( -9% - 9%) 0.626
BrowseDayOfYearTaxoFacets 4.98 (14.4%) 4.95 (14.9%) -0.6% ( -26% - 33%) 0.880
MedSloppyPhrase 3.28 (4.7%) 3.26 (5.1%) -0.6% ( -9% - 9%) 0.664
HighSpanNear 21.69 (5.1%) 21.57 (4.6%) -0.5% ( -9% - 9%) 0.663
HighPhrase 14.62 (4.6%) 14.54 (4.3%) -0.5% ( -9% - 8%) 0.644
LowPhrase 61.62 (3.5%) 61.29 (2.9%) -0.5% ( -6% - 6%) 0.527
LowSpanNear 126.62 (4.0%) 126.00 (4.0%) -0.5% ( -8% - 7%) 0.635
Respell 34.76 (3.0%) 34.85 (2.0%) 0.3% ( -4% - 5%) 0.704
OrHighMed 55.19 (5.2%) 55.40 (3.8%) 0.4% ( -8% - 9%) 0.744
OrHighHigh 13.42 (3.8%) 13.49 (4.4%) 0.5% ( -7% - 9%) 0.650
Prefix3 45.53 (10.9%) 45.76 (10.8%) 0.5% ( -19% - 24%) 0.862
Fuzzy1 59.51 (2.0%) 59.90 (2.0%) 0.7% ( -3% - 4%) 0.209
AndHighMed 138.53 (6.0%) 139.54 (3.6%) 0.7% ( -8% - 10%) 0.569
Fuzzy2 80.88 (3.1%) 81.66 (2.2%) 1.0% ( -4% - 6%) 0.170
AndHighHigh 26.04 (5.3%) 26.30 (3.8%) 1.0% ( -7% - 10%) 0.400
BrowseRandomLabelSSDVFacets 3.19 (5.7%) 3.25 (6.4%) 1.8% ( -9% - 14%) 0.254
BrowseDayOfYearSSDVFacets 4.31 (6.1%) 4.43 (6.9%) 2.8% ( -9% - 16%) 0.092
BrowseMonthTaxoFacets 5.17 (15.0%) 5.35 (16.2%) 3.4% ( -24% - 40%) 0.395
IntNRQ 24.73 (17.2%) 26.15 (20.2%) 5.7% ( -27% - 52%) 0.236
FYI here are some of my thoughts based on all these benchmark reports:
-
Rolling up loops for vint/vlong seems not bring a significant speed up (neither significant regression), IMO we should move on because the bug will no longer occur in java 17 ?
-
The mmap approach is showing a stable speed up, and "too many codes" seems not the major reason that prevents inline, so maybe (as @uschindler said) it is prevented by the try catch block? I'll suggest continue on the mmap approach too for the speed up.
Thanks. So unrolling brings no difference and as the bug is gone, we can replace all of those implementations.
About the ByteBufferIndexInput for MMapDirectory: This is really bad, we should maybe check the assembly code generated by Hotspot to figure out why it is not inlined.
So let's keep the other issue open to figure out more. +1 to commit this. We should also backport to 9.x, but then we should benchmark on Java 11, too.
Here is the bechmark result on Java 11:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
BrowseMonthSSDVFacets 6.28 (17.6%) 6.02 (16.9%) -4.2% ( -32% - 36%) 0.441
IntNRQ 30.10 (26.3%) 29.12 (26.6%) -3.3% ( -44% - 67%) 0.696
AndHighLow 683.35 (3.5%) 662.78 (2.9%) -3.0% ( -9% - 3%) 0.003
MedTermDayTaxoFacets 8.68 (6.2%) 8.44 (4.2%) -2.8% ( -12% - 8%) 0.093
OrNotHighLow 1256.07 (2.4%) 1222.13 (2.9%) -2.7% ( -7% - 2%) 0.001
BrowseMonthTaxoFacets 5.65 (15.1%) 5.50 (13.6%) -2.6% ( -27% - 30%) 0.568
HighIntervalsOrdered 2.78 (5.4%) 2.71 (5.6%) -2.4% ( -12% - 8%) 0.158
BrowseDayOfYearTaxoFacets 5.73 (7.3%) 5.60 (6.3%) -2.2% ( -14% - 12%) 0.296
BrowseDateTaxoFacets 5.69 (6.9%) 5.56 (6.2%) -2.2% ( -14% - 11%) 0.296
Respell 40.59 (2.9%) 39.73 (1.9%) -2.1% ( -6% - 2%) 0.006
PKLookup 199.57 (3.3%) 195.58 (4.0%) -2.0% ( -9% - 5%) 0.084
AndHighMedDayTaxoFacets 8.83 (4.0%) 8.67 (3.6%) -1.9% ( -9% - 5%) 0.121
OrHighMedDayTaxoFacets 4.98 (6.4%) 4.89 (4.6%) -1.7% ( -11% - 9%) 0.340
AndHighHighDayTaxoFacets 13.33 (2.9%) 13.11 (2.7%) -1.6% ( -7% - 4%) 0.074
HighSpanNear 9.89 (3.2%) 9.75 (3.4%) -1.5% ( -7% - 5%) 0.160
Fuzzy2 75.58 (2.5%) 74.57 (2.0%) -1.3% ( -5% - 3%) 0.058
LowIntervalsOrdered 59.03 (2.9%) 58.35 (3.4%) -1.1% ( -7% - 5%) 0.253
HighTermMonthSort 55.30 (17.8%) 54.71 (9.6%) -1.1% ( -24% - 32%) 0.814
Fuzzy1 134.69 (2.7%) 133.33 (1.9%) -1.0% ( -5% - 3%) 0.171
Prefix3 45.51 (10.7%) 45.05 (11.3%) -1.0% ( -20% - 23%) 0.772
OrHighLow 361.23 (3.5%) 358.18 (3.7%) -0.8% ( -7% - 6%) 0.454
BrowseRandomLabelTaxoFacets 4.93 (4.5%) 4.89 (5.2%) -0.8% ( -10% - 9%) 0.612
Wildcard 65.14 (15.1%) 64.88 (17.9%) -0.4% ( -29% - 38%) 0.940
BrowseRandomLabelSSDVFacets 3.58 (3.3%) 3.57 (4.0%) -0.4% ( -7% - 7%) 0.758
OrNotHighMed 985.92 (2.9%) 982.55 (4.1%) -0.3% ( -7% - 6%) 0.760
OrHighNotMed 1237.04 (3.2%) 1233.74 (4.6%) -0.3% ( -7% - 7%) 0.832
MedIntervalsOrdered 33.04 (1.9%) 32.97 (2.5%) -0.2% ( -4% - 4%) 0.772
LowSpanNear 46.22 (3.3%) 46.15 (2.9%) -0.2% ( -6% - 6%) 0.874
LowPhrase 25.32 (2.2%) 25.28 (2.5%) -0.2% ( -4% - 4%) 0.837
OrNotHighHigh 905.71 (3.4%) 904.75 (5.1%) -0.1% ( -8% - 8%) 0.938
MedSpanNear 4.76 (4.1%) 4.76 (3.4%) 0.1% ( -7% - 7%) 0.925
OrHighNotHigh 921.87 (3.5%) 923.33 (5.2%) 0.2% ( -8% - 9%) 0.910
OrHighNotLow 1980.39 (2.9%) 1984.53 (4.1%) 0.2% ( -6% - 7%) 0.853
LowTerm 1512.89 (2.9%) 1517.16 (3.1%) 0.3% ( -5% - 6%) 0.763
HighTerm 1730.61 (3.4%) 1736.20 (4.6%) 0.3% ( -7% - 8%) 0.800
MedTerm 1651.08 (3.6%) 1656.84 (4.5%) 0.3% ( -7% - 8%) 0.787
HighSloppyPhrase 6.00 (5.0%) 6.03 (7.0%) 0.5% ( -10% - 13%) 0.798
HighTermTitleBDVSort 86.33 (14.0%) 86.87 (10.6%) 0.6% ( -20% - 29%) 0.873
MedPhrase 20.39 (1.9%) 20.53 (2.3%) 0.7% ( -3% - 4%) 0.302
HighPhrase 11.20 (2.2%) 11.28 (2.8%) 0.7% ( -4% - 5%) 0.358
AndHighHigh 20.43 (3.6%) 20.64 (4.5%) 1.0% ( -6% - 9%) 0.435
BrowseDayOfYearSSDVFacets 5.40 (11.2%) 5.46 (13.5%) 1.0% ( -21% - 28%) 0.793
AndHighMed 63.51 (4.1%) 64.18 (4.2%) 1.1% ( -6% - 9%) 0.416
MedSloppyPhrase 10.69 (4.6%) 10.82 (7.2%) 1.2% ( -10% - 13%) 0.525
OrHighMed 111.38 (3.9%) 112.77 (4.3%) 1.2% ( -6% - 9%) 0.341
OrHighHigh 24.64 (3.2%) 25.02 (4.2%) 1.6% ( -5% - 9%) 0.188
LowSloppyPhrase 17.96 (4.7%) 18.31 (6.9%) 2.0% ( -9% - 14%) 0.288
HighTermDayOfYearSort 44.12 (12.3%) 45.34 (10.1%) 2.8% ( -17% - 28%) 0.437
TermDTSort 22.12 (19.1%) 22.83 (16.6%) 3.2% ( -27% - 48%) 0.570
Java version:
java version "11.0.6" 2020-01-14 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.6+8-LTS)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.6+8-LTS, mixed mode)
Thanks.
Did you check for other ReadVint variants? There may be more. Maybe use the type hierarchy functions to investigate all child classes of DataInput.
I'm using IntelliJ IDEA and it can help detect child classes that override the readVInt/readVLong. I also greped long readVLong
to avoid similar fork (but not override) like what we do in BlockPackedReaderIterator
. It seems there is no other ReadVint variants.
readVInt
readVLong
I would also take this one.
Sorry, I forgot about this PR. Thanks for merging!
Here is the nightly benchmark result of 2022.10.24 (commits diff): https://home.apache.org/~mikemccand/lucenebench/2022.10.24.18.02.38.html
Benefited Tasks (p-value < 0.01)
- PKLookUp: https://home.apache.org/~mikemccand/lucenebench/PKLookup.html
- AndHighOrMedMed: https://home.apache.org/~mikemccand/lucenebench/AndHighOrMedMed.html
- AndMedOrHighHigh: https://home.apache.org/~mikemccand/lucenebench/AndMedOrHighHigh.html
- TermMonthSort: https://home.apache.org/~mikemccand/lucenebench/TermMonthSort.html
Regression Tasks (p-value < 0.01)
- Fuzzy1: https://home.apache.org/~mikemccand/lucenebench/Fuzzy1.html
- Fuzzy2: https://home.apache.org/~mikemccand/lucenebench/Fuzzy2.html