LUCENE-10366: Override #readVInt and #readVLong for ByteBufferDataInput to avoid the abstraction confusion of #readByte.
https://issues.apache.org/jira/browse/LUCENE-10366 #11402
Today, we do not rewrite #readVInt and #readVLong for ByteBufferIndexInput. By default, the logic will call #readByte several times, and we need to check whether ByteBuffer is valid every time. This may not be necessary as we just need a final check.
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
BrowseDayOfYearSSDVFacets 16.74 (17.3%) 15.91 (12.3%) -5.0% ( -29% - 29%) 0.295
MedTermDayTaxoFacets 27.01 (6.9%) 26.56 (5.9%) -1.7% ( -13% - 11%) 0.402
Wildcard 111.55 (8.1%) 109.67 (7.6%) -1.7% ( -16% - 15%) 0.499
Respell 58.06 (2.6%) 57.20 (2.6%) -1.5% ( -6% - 3%) 0.074
OrHighMedDayTaxoFacets 8.91 (4.7%) 8.81 (7.2%) -1.1% ( -12% - 11%) 0.557
Fuzzy1 117.17 (3.8%) 116.14 (3.3%) -0.9% ( -7% - 6%) 0.437
Fuzzy2 103.70 (3.2%) 102.82 (4.3%) -0.9% ( -8% - 6%) 0.472
HighIntervalsOrdered 10.11 (7.9%) 10.05 (7.4%) -0.6% ( -14% - 15%) 0.797
HighTermDayOfYearSort 183.18 (8.8%) 182.92 (10.8%) -0.1% ( -18% - 21%) 0.964
AndHighHighDayTaxoFacets 11.44 (3.8%) 11.43 (3.1%) -0.1% ( -6% - 7%) 0.936
Prefix3 161.90 (13.5%) 161.80 (13.3%) -0.1% ( -23% - 30%) 0.989
HighSpanNear 11.43 (4.8%) 11.45 (4.2%) 0.1% ( -8% - 9%) 0.928
PKLookup 220.15 (3.3%) 220.69 (6.2%) 0.2% ( -8% - 10%) 0.874
MedSpanNear 92.60 (4.0%) 93.11 (3.7%) 0.5% ( -6% - 8%) 0.656
TermDTSort 143.26 (9.0%) 144.14 (10.9%) 0.6% ( -17% - 22%) 0.847
MedIntervalsOrdered 63.74 (6.6%) 64.21 (6.1%) 0.8% ( -11% - 14%) 0.707
HighTermTitleBDVSort 99.61 (9.1%) 100.49 (12.4%) 0.9% ( -18% - 24%) 0.796
LowSpanNear 126.43 (3.6%) 127.61 (3.2%) 0.9% ( -5% - 8%) 0.383
LowIntervalsOrdered 12.45 (5.4%) 12.58 (5.2%) 1.0% ( -9% - 12%) 0.535
LowTerm 1767.08 (3.7%) 1788.83 (3.1%) 1.2% ( -5% - 8%) 0.257
HighSloppyPhrase 11.45 (7.0%) 11.61 (7.1%) 1.5% ( -11% - 16%) 0.515
AndHighMedDayTaxoFacets 69.41 (3.7%) 70.46 (2.8%) 1.5% ( -4% - 8%) 0.147
BrowseRandomLabelSSDVFacets 10.85 (6.1%) 11.04 (5.1%) 1.7% ( -9% - 13%) 0.342
MedTerm 2083.04 (5.3%) 2119.48 (5.7%) 1.7% ( -8% - 13%) 0.316
LowSloppyPhrase 148.79 (3.6%) 151.76 (3.2%) 2.0% ( -4% - 9%) 0.062
HighPhrase 98.67 (3.4%) 100.80 (3.5%) 2.2% ( -4% - 9%) 0.048
OrHighNotLow 1371.31 (7.1%) 1400.91 (7.9%) 2.2% ( -12% - 18%) 0.365
BrowseMonthTaxoFacets 16.65 (11.6%) 17.03 (13.1%) 2.2% ( -20% - 30%) 0.565
OrHighNotHigh 1267.37 (6.8%) 1297.42 (8.9%) 2.4% ( -12% - 19%) 0.344
MedSloppyPhrase 39.35 (3.6%) 40.42 (4.2%) 2.7% ( -4% - 10%) 0.028
OrNotHighHigh 1190.01 (6.6%) 1224.72 (7.6%) 2.9% ( -10% - 18%) 0.194
OrHighHigh 37.72 (4.3%) 39.00 (3.4%) 3.4% ( -4% - 11%) 0.005
AndHighHigh 92.46 (4.5%) 95.76 (4.9%) 3.6% ( -5% - 13%) 0.017
OrHighNotMed 1231.31 (6.3%) 1275.65 (7.9%) 3.6% ( -9% - 18%) 0.109
OrHighMed 174.32 (3.8%) 181.43 (2.9%) 4.1% ( -2% - 11%) 0.000
AndHighLow 2761.91 (10.7%) 2885.28 (10.1%) 4.5% ( -14% - 28%) 0.175
MedPhrase 214.87 (4.9%) 224.55 (4.8%) 4.5% ( -4% - 14%) 0.003
LowPhrase 333.03 (3.8%) 348.43 (3.6%) 4.6% ( -2% - 12%) 0.000
HighTermMonthSort 159.92 (9.8%) 167.50 (14.8%) 4.7% ( -18% - 32%) 0.232
OrNotHighMed 973.50 (6.0%) 1022.10 (6.0%) 5.0% ( -6% - 18%) 0.008
BrowseRandomLabelTaxoFacets 13.14 (11.1%) 13.83 (14.0%) 5.3% ( -17% - 34%) 0.186
HighTerm 1682.54 (7.0%) 1786.66 (7.5%) 6.2% ( -7% - 22%) 0.007
AndHighMed 277.66 (3.6%) 295.75 (4.3%) 6.5% ( -1% - 14%) 0.000
BrowseDateTaxoFacets 14.81 (12.9%) 15.78 (17.1%) 6.6% ( -20% - 42%) 0.170
BrowseDayOfYearTaxoFacets 14.90 (13.1%) 15.93 (17.4%) 6.9% ( -20% - 43%) 0.158
OrNotHighLow 1255.52 (7.0%) 1344.26 (7.1%) 7.1% ( -6% - 22%) 0.002
OrHighLow 972.15 (5.3%) 1056.13 (4.8%) 8.6% ( -1% - 19%) 0.000
IntNRQ 99.91 (26.0%) 108.80 (19.2%) 8.9% ( -28% - 73%) 0.219
BrowseMonthSSDVFacets 18.62 (20.7%) 20.45 (25.0%) 9.8% ( -29% - 70%) 0.175
I think there is another issue with this PR, which is that if a vInt is written across two ByteBuffers, then your PR would only ensure the validity of one of these ByteBuffers, while I think it should check both for safety.
Thanks @jpountz ! I did not realize that reading ummapped bytebuffer will make JVM crach, sorry!
we should keep calling ensureValid before reading any byte from the ByteBuffer while your PR does it afterwards. I would also prefer keeping access to the ByteBuffers encapsulated into ByteBufferGuard.
Sorry for my poor english, i'm not sure i've got your point. So to confirm, which one is you are meaning?
-
This approach need to be changed to implement a
ByteBufferGuard#getVIntthat just ensureValid once beforereadVInt(also consider the issue that vint exists across several bytebuffers). -
We need to EnsureValid for each byte read, like what we did before. so this approach need to be given up and some other ways (like using codecs other than vints for some of the tasks) should be tried.
I'm having this question because i have not understood why todays ensurevalid can protect readings on bytebuffer, it seems there should be a case that (thread 1)ensurevalid -> (thread 2)unmap -> (thread 1)read unmapped. Is the Thread.yield() helping solve this case? Will this still work for a heavier operation like readVInt?
I checked some other usage, seeing that for operations like bytebuffer#readBytes we also just checked once. If the length <= 6, DirectByteBuffer will read byte one by one, then it should be similar to the readVInt logic. So based on this i guess ensureValid once can work for vint? I implemented this and tests got passed too.
Thanks @gf2121 the implementation looks correct to me now. Are you still seeing a good speedup with this change?
The problem here is that we just reducing the number of guard checks and at same time raising the risk to segv: ReadVint uses multiple atomic reads and all of those reads can segv.
Sorry to tell this: I would not change anything here, it is too risky. Let's wait for MemorySegmentIndexInput, which has JVM internal checks and a guard is not needed. see #518
Thanks @jpountz @uschindler , this is the benchmark result based on the newest codes
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
HighTermTitleBDVSort 168.66 (13.3%) 163.17 (11.2%) -3.3% ( -24% - 24%) 0.404
BrowseMonthSSDVFacets 20.35 (25.8%) 19.83 (25.1%) -2.6% ( -42% - 65%) 0.751
BrowseMonthTaxoFacets 17.31 (14.9%) 16.96 (14.8%) -2.1% ( -27% - 32%) 0.661
MedTermDayTaxoFacets 48.47 (4.8%) 47.52 (4.1%) -2.0% ( -10% - 7%) 0.159
BrowseDayOfYearSSDVFacets 16.71 (14.5%) 16.41 (14.9%) -1.8% ( -27% - 32%) 0.702
HighIntervalsOrdered 25.70 (10.9%) 25.33 (11.0%) -1.5% ( -21% - 22%) 0.674
MedSloppyPhrase 21.62 (4.8%) 21.38 (3.1%) -1.1% ( -8% - 7%) 0.393
TermDTSort 238.94 (13.0%) 236.98 (12.8%) -0.8% ( -23% - 28%) 0.841
HighSloppyPhrase 39.84 (4.1%) 39.60 (2.9%) -0.6% ( -7% - 6%) 0.588
BrowseDayOfYearTaxoFacets 15.89 (15.4%) 15.83 (16.6%) -0.4% ( -28% - 37%) 0.933
LowTerm 1698.75 (3.5%) 1691.63 (3.4%) -0.4% ( -7% - 6%) 0.702
BrowseDateTaxoFacets 15.72 (15.3%) 15.67 (16.5%) -0.4% ( -27% - 37%) 0.942
OrHighMedDayTaxoFacets 15.19 (4.3%) 15.14 (4.9%) -0.3% ( -9% - 9%) 0.813
Respell 81.95 (2.6%) 81.68 (2.7%) -0.3% ( -5% - 5%) 0.694
PKLookup 227.94 (4.3%) 227.37 (5.3%) -0.2% ( -9% - 9%) 0.870
BrowseRandomLabelSSDVFacets 10.93 (6.6%) 10.92 (5.8%) -0.1% ( -11% - 13%) 0.955
IntNRQ 144.63 (22.4%) 144.48 (22.3%) -0.1% ( -36% - 57%) 0.989
Fuzzy2 88.59 (2.3%) 88.51 (2.2%) -0.1% ( -4% - 4%) 0.897
LowIntervalsOrdered 93.64 (6.0%) 93.72 (5.9%) 0.1% ( -11% - 12%) 0.962
Fuzzy1 137.32 (3.1%) 137.58 (2.7%) 0.2% ( -5% - 6%) 0.833
MedIntervalsOrdered 17.06 (6.3%) 17.09 (6.4%) 0.2% ( -11% - 13%) 0.922
HighTermDayOfYearSort 82.27 (8.1%) 82.50 (11.8%) 0.3% ( -18% - 21%) 0.930
BrowseRandomLabelTaxoFacets 13.63 (12.7%) 13.68 (13.3%) 0.3% ( -22% - 30%) 0.934
AndHighHighDayTaxoFacets 6.39 (2.6%) 6.42 (2.9%) 0.5% ( -4% - 6%) 0.578
LowSloppyPhrase 55.91 (2.7%) 56.43 (1.8%) 0.9% ( -3% - 5%) 0.193
MedTerm 2031.88 (3.7%) 2054.31 (4.9%) 1.1% ( -7% - 10%) 0.419
Prefix3 129.38 (9.3%) 130.94 (8.8%) 1.2% ( -15% - 21%) 0.676
AndHighHigh 67.64 (4.4%) 68.46 (4.5%) 1.2% ( -7% - 10%) 0.390
Wildcard 105.85 (9.5%) 107.23 (8.8%) 1.3% ( -15% - 21%) 0.652
HighSpanNear 12.80 (3.8%) 13.01 (4.0%) 1.7% ( -5% - 9%) 0.177
HighTermMonthSort 130.57 (9.9%) 132.98 (11.1%) 1.8% ( -17% - 25%) 0.579
MedSpanNear 73.79 (3.4%) 75.30 (2.7%) 2.0% ( -3% - 8%) 0.035
OrHighHigh 53.48 (5.1%) 54.77 (4.4%) 2.4% ( -6% - 12%) 0.106
LowSpanNear 16.05 (3.5%) 16.47 (3.6%) 2.6% ( -4% - 10%) 0.018
OrHighNotHigh 1162.70 (3.6%) 1194.55 (4.4%) 2.7% ( -5% - 11%) 0.030
OrNotHighMed 1428.26 (3.0%) 1468.20 (4.8%) 2.8% ( -4% - 10%) 0.027
HighTerm 2587.85 (4.5%) 2662.11 (5.2%) 2.9% ( -6% - 13%) 0.061
OrHighMed 182.60 (4.8%) 188.40 (4.3%) 3.2% ( -5% - 12%) 0.026
OrNotHighHigh 1079.06 (3.2%) 1114.06 (4.6%) 3.2% ( -4% - 11%) 0.010
OrHighNotMed 1278.47 (3.8%) 1320.86 (4.4%) 3.3% ( -4% - 12%) 0.011
AndHighMed 170.34 (2.9%) 176.45 (3.7%) 3.6% ( -2% - 10%) 0.001
OrHighNotLow 932.47 (3.6%) 976.66 (5.1%) 4.7% ( -3% - 14%) 0.001
HighPhrase 168.16 (3.0%) 176.27 (3.0%) 4.8% ( -1% - 11%) 0.000
LowPhrase 48.60 (2.6%) 50.98 (2.5%) 4.9% ( 0% - 10%) 0.000
AndHighMedDayTaxoFacets 58.86 (2.6%) 62.10 (2.6%) 5.5% ( 0% - 11%) 0.000
MedPhrase 145.00 (3.1%) 153.94 (2.3%) 6.2% ( 0% - 11%) 0.000
OrHighLow 1078.49 (3.5%) 1149.09 (3.1%) 6.5% ( 0% - 13%) 0.000
AndHighLow 1149.40 (4.3%) 1275.69 (2.9%) 11.0% ( 3% - 18%) 0.000
OrNotHighLow 1765.89 (6.2%) 2019.83 (4.4%) 14.4% ( 3% - 26%) 0.000
Sorry to tell this: I would not change anything here, it is too risky. Let's wait for MemorySegmentIndexInput, which has JVM internal checks and a guard is not needed.
OK! I'll close this then.
The benchmark above shows no significant changes and it looks like ran with wrong JVM options (defaults by Mike were bad, fixed recently). It should run with tiered compilation enabled then the guard checks should go away.
Yes i did not run with tiered compilation, but in fact i'm running with the newest luceneutil. It seems we used to add this param but removed again a few days ago? See this commit: https://github.com/mikemccand/luceneutil/commit/f48b53858fa664c8d1f9637935add0ac42f28252
The param must go away or change - to +. Mike's commit fixed the tiered compilation problem and I was not sure if you have used his latest commit.
To get better predicatable results that are not so noisy (especially for those low-level changes) you should have a huge dataset (not only wikiMedium) and also run it with more queries/run. What you see here is mostly noise.
@uschindler Thank you for the benchmark guidance!
I made a new change in this commit that ensure valid before each call of ByteBuffer#get. And I run a benchmark on wikimediumall (16GB) with 20 rounds of JVM and each round repeat tasks 200 times (each jvm last 10 min and total time took 6h+). This benchmark should be similar to what we did in https://github.com/apache/lucene/pull/518. And here is the result:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
BrowseMonthTaxoFacets 5.80 (14.4%) 5.50 (13.6%) -5.1% ( -29% - 26%) 0.246
HighTermTitleBDVSort 35.69 (30.2%) 34.46 (18.5%) -3.4% ( -40% - 64%) 0.664
Prefix3 194.68 (12.3%) 188.12 (13.2%) -3.4% ( -25% - 25%) 0.403
HighTermMonthSort 49.11 (28.0%) 47.66 (15.0%) -2.9% ( -35% - 55%) 0.678
OrHighHigh 16.37 (4.5%) 15.99 (4.0%) -2.4% ( -10% - 6%) 0.078
TermDTSort 22.81 (23.4%) 22.30 (12.8%) -2.3% ( -31% - 44%) 0.704
BrowseDayOfYearSSDVFacets 4.43 (10.4%) 4.34 (7.1%) -2.0% ( -17% - 17%) 0.473
HighIntervalsOrdered 3.92 (9.6%) 3.85 (8.4%) -1.9% ( -18% - 17%) 0.514
Wildcard 95.75 (14.1%) 94.10 (14.6%) -1.7% ( -26% - 31%) 0.704
BrowseRandomLabelTaxoFacets 4.55 (12.8%) 4.50 (12.2%) -1.1% ( -23% - 27%) 0.788
OrHighMed 47.41 (4.7%) 46.98 (4.1%) -0.9% ( -9% - 8%) 0.516
BrowseDateTaxoFacets 5.25 (16.3%) 5.21 (15.4%) -0.8% ( -27% - 37%) 0.881
BrowseDayOfYearTaxoFacets 5.31 (16.5%) 5.27 (15.6%) -0.7% ( -28% - 37%) 0.893
OrHighMedDayTaxoFacets 1.03 (5.0%) 1.02 (5.0%) -0.6% ( -10% - 9%) 0.700
MedTermDayTaxoFacets 15.74 (5.8%) 15.67 (3.8%) -0.5% ( -9% - 9%) 0.772
Fuzzy1 76.87 (2.4%) 76.53 (2.1%) -0.4% ( -4% - 4%) 0.542
PKLookup 204.50 (5.0%) 203.69 (4.8%) -0.4% ( -9% - 9%) 0.798
Fuzzy2 71.89 (2.1%) 71.64 (2.0%) -0.4% ( -4% - 3%) 0.586
LowSloppyPhrase 4.00 (4.2%) 3.99 (4.0%) -0.3% ( -8% - 8%) 0.833
Respell 46.07 (1.8%) 45.96 (2.1%) -0.2% ( -4% - 3%) 0.686
LowIntervalsOrdered 7.73 (4.9%) 7.73 (3.7%) -0.0% ( -8% - 9%) 0.988
HighTermDayOfYearSort 43.42 (23.3%) 43.54 (12.8%) 0.3% ( -29% - 47%) 0.965
AndHighHighDayTaxoFacets 6.99 (3.0%) 7.02 (2.6%) 0.5% ( -4% - 6%) 0.544
BrowseRandomLabelSSDVFacets 3.18 (6.5%) 3.21 (7.6%) 1.1% ( -12% - 16%) 0.637
HighSpanNear 1.70 (5.8%) 1.72 (4.9%) 1.1% ( -9% - 12%) 0.524
LowSpanNear 5.93 (3.4%) 6.00 (3.4%) 1.1% ( -5% - 8%) 0.312
MedSloppyPhrase 45.85 (2.9%) 46.42 (2.7%) 1.2% ( -4% - 6%) 0.161
MedSpanNear 23.88 (4.0%) 24.26 (3.8%) 1.6% ( -5% - 9%) 0.198
LowTerm 1699.14 (4.1%) 1729.13 (2.9%) 1.8% ( -4% - 9%) 0.114
BrowseMonthSSDVFacets 4.67 (4.1%) 4.76 (12.0%) 2.0% ( -13% - 18%) 0.483
MedPhrase 47.09 (2.2%) 48.24 (1.5%) 2.4% ( -1% - 6%) 0.000
HighSloppyPhrase 14.85 (2.6%) 15.22 (2.2%) 2.5% ( -2% - 7%) 0.001
MedIntervalsOrdered 7.07 (3.6%) 7.29 (2.5%) 3.2% ( -2% - 9%) 0.001
LowPhrase 12.80 (2.4%) 13.27 (1.7%) 3.6% ( 0% - 7%) 0.000
AndHighHigh 35.29 (3.1%) 36.71 (5.0%) 4.0% ( -4% - 12%) 0.002
AndHighMed 87.69 (2.4%) 91.35 (3.9%) 4.2% ( -2% - 10%) 0.000
OrHighNotHigh 1514.29 (4.5%) 1578.30 (4.0%) 4.2% ( -4% - 13%) 0.002
MedTerm 1838.55 (4.7%) 1917.70 (4.1%) 4.3% ( -4% - 13%) 0.002
OrNotHighMed 1182.37 (3.8%) 1237.57 (2.7%) 4.7% ( -1% - 11%) 0.000
OrHighNotLow 1218.07 (6.1%) 1279.72 (4.8%) 5.1% ( -5% - 17%) 0.004
AndHighMedDayTaxoFacets 20.90 (3.4%) 21.98 (2.0%) 5.2% ( 0% - 11%) 0.000
OrNotHighHigh 943.89 (5.8%) 999.13 (4.8%) 5.9% ( -4% - 17%) 0.001
HighTerm 1349.44 (5.8%) 1431.38 (5.3%) 6.1% ( -4% - 18%) 0.001
OrHighNotMed 926.10 (5.8%) 989.92 (5.6%) 6.9% ( -4% - 19%) 0.000
HighPhrase 290.29 (6.0%) 318.57 (3.1%) 9.7% ( 0% - 20%) 0.000
IntNRQ 32.61 (25.8%) 35.87 (16.2%) 10.0% ( -25% - 70%) 0.142
OrHighLow 547.59 (6.0%) 604.39 (3.2%) 10.4% ( 1% - 20%) 0.000
OrNotHighLow 886.23 (5.1%) 980.83 (2.2%) 10.7% ( 3% - 18%) 0.000
AndHighLow 608.29 (6.3%) 693.67 (3.5%) 14.0% ( 4% - 25%) 0.000
So it seems like the point that brings a speed up is not reducing number of valid check but something else ? Anyway, this approach is no longer raising the risk of segv and benchmark result seems positive so i reopened the PR to see if this will make sense to you now :)
Hi @gf2121, thanks for taking the time to test more. It looks really like the guard checks are not affecting the whole thing, but as always we don't really know where the optimization effects are coming from. For such low level changes it often also varies between different JVM version (there are differences between JDK 11 and JDK 17). Actually the code SHOULD not make any difference, but it has.
I see that you carefully also save the position before the try block and restore it on boundary crossing exception. What I don't like: ByteBufferGuard is just a hack and should not contain any program code except guarding. So I am not happy of including readVInt code there.
Have you tried in just moving the Guard#readVInt() code into ByteBufferIndexInput#readVInt() inside its try block calling guard.readByte() -- all guard methods should be inlined by JVM very early? This is where it belongs! Nevertheless I wonder why Hotspot is not able to apply the optimization. The only thing we spare is the try/catch, but maybe thats the issue here that prevents it from being inlined into readVInt.
If that helps we can also try the same on the MemorySegment one from #518.
I think about this:
@Override
public final int readVInt() throws IOException {
try {
// using #position instead of #mark here as #position is a final method
int pos = curBuf.position();
try {
byte b = guard.get(curBuf);
if (b >= 0) return b;
int i = b & 0x7F;
b = guard.get(curBuf);
i |= (b & 0x7F) << 7;
if (b >= 0) return i;
b = guard.get(curBuf);
i |= (b & 0x7F) << 14;
if (b >= 0) return i;
b = guard.get(curBuf);
i |= (b & 0x7F) << 21;
if (b >= 0) return i;
b = guard.get(curBuf);
// Warning: the next ands use 0x0F / 0xF0 - beware copy/paste errors:
i |= (b & 0x0F) << 28;
if ((b & 0xF0) == 0) return i;
throw new IOException("Invalid vInt detected (too many bits)");
} catch (
@SuppressWarnings("unused")
BufferUnderflowException e) {
curBuf.position(pos);
return super.readVInt();
}
} catch (
@SuppressWarnings("unused")
NullPointerException npe) {
throw new AlreadyClosedException("Already closed: " + this);
}
}
I wonder if one reason why this helps is because this method is so large that it is prevented from inlining sub function calls. The JVM bug that affected vint/vlong reads no longer affects any of the JVM versions we support to my knowledge, maybe we should consider rolling up these loops again (assuming it helps performance).
I wonder if one reason why this helps is because this method is so large that it is prevented from inlining sub function calls. The JVM bug that affected vint/vlong reads no longer affects any of the JVM versions we support to my knowledge, maybe we should consider rolling up these loops again (assuming it helps performance).
We only loose the last check on last iteration (the "copypaste" statement).
We should maybe convert all of them back to loops in DataInput base class and remove the specializations and do more tests. I think the change here is then completely obsolete.
+1 let's make a separate change to look at re-rolling the loop back. the JVM bug in question doesn't affect this version of lucene (or probably the last one either?), it was from ancient java days: I think java 1.7.x?
Thanks all! I've finished benchmarks (20 JVM * 200 repeat):
Read byte from guard
Here is the code
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
HighTermTitleBDVSort 60.79 (18.2%) 56.94 (14.0%) -6.3% ( -32% - 31%) 0.217
IntNRQ 207.39 (11.4%) 198.01 (16.6%) -4.5% ( -29% - 26%) 0.316
PKLookup 206.16 (5.4%) 197.27 (6.1%) -4.3% ( -15% - 7%) 0.018
BrowseDayOfYearTaxoFacets 4.93 (14.5%) 4.78 (12.8%) -2.9% ( -26% - 28%) 0.506
BrowseRandomLabelTaxoFacets 4.24 (10.9%) 4.12 (7.9%) -2.9% ( -19% - 17%) 0.340
BrowseDateTaxoFacets 4.87 (14.3%) 4.74 (12.6%) -2.8% ( -26% - 28%) 0.508
OrHighMedDayTaxoFacets 2.59 (4.8%) 2.55 (4.9%) -1.6% ( -10% - 8%) 0.310
HighTermMonthSort 78.30 (16.3%) 77.27 (12.7%) -1.3% ( -26% - 33%) 0.777
HighIntervalsOrdered 5.97 (8.5%) 5.90 (9.8%) -1.1% ( -17% - 18%) 0.708
MedTermDayTaxoFacets 18.18 (3.3%) 18.00 (4.0%) -1.0% ( -8% - 6%) 0.377
BrowseMonthSSDVFacets 4.71 (6.8%) 4.67 (5.6%) -0.9% ( -12% - 12%) 0.662
OrHighHigh 17.15 (4.3%) 17.00 (4.4%) -0.8% ( -9% - 8%) 0.538
Respell 39.54 (2.3%) 39.21 (3.0%) -0.8% ( -5% - 4%) 0.334
HighSpanNear 7.88 (5.6%) 7.84 (6.0%) -0.6% ( -11% - 11%) 0.755
BrowseRandomLabelSSDVFacets 3.18 (6.2%) 3.16 (6.1%) -0.6% ( -12% - 12%) 0.776
Fuzzy2 64.38 (2.9%) 64.32 (3.4%) -0.1% ( -6% - 6%) 0.935
BrowseMonthTaxoFacets 5.14 (16.6%) 5.14 (15.2%) -0.0% ( -27% - 38%) 0.996
Fuzzy1 103.02 (2.8%) 103.11 (3.1%) 0.1% ( -5% - 6%) 0.930
LowTerm 1856.79 (2.4%) 1858.93 (2.2%) 0.1% ( -4% - 4%) 0.875
LowIntervalsOrdered 54.54 (7.0%) 54.64 (8.9%) 0.2% ( -14% - 17%) 0.945
AndHighHighDayTaxoFacets 12.76 (2.4%) 12.80 (2.2%) 0.3% ( -4% - 5%) 0.694
BrowseDayOfYearSSDVFacets 4.29 (6.1%) 4.30 (6.0%) 0.3% ( -11% - 13%) 0.859
Wildcard 45.26 (12.3%) 45.47 (11.5%) 0.5% ( -20% - 27%) 0.898
OrHighMed 72.00 (4.6%) 72.37 (4.1%) 0.5% ( -7% - 9%) 0.712
HighSloppyPhrase 5.53 (3.4%) 5.56 (4.4%) 0.5% ( -7% - 8%) 0.672
MedSloppyPhrase 33.17 (2.8%) 33.36 (3.5%) 0.6% ( -5% - 7%) 0.565
MedTerm 1513.55 (4.0%) 1528.19 (3.0%) 1.0% ( -5% - 8%) 0.388
Prefix3 89.81 (11.0%) 90.93 (10.1%) 1.3% ( -17% - 25%) 0.707
MedSpanNear 7.77 (5.2%) 7.89 (5.5%) 1.5% ( -8% - 12%) 0.369
LowSloppyPhrase 42.76 (1.6%) 43.44 (2.6%) 1.6% ( -2% - 5%) 0.023
OrNotHighMed 1267.71 (3.0%) 1289.27 (2.5%) 1.7% ( -3% - 7%) 0.051
MedIntervalsOrdered 26.64 (4.4%) 27.10 (5.4%) 1.7% ( -7% - 12%) 0.271
OrHighNotLow 1233.37 (3.8%) 1255.68 (2.9%) 1.8% ( -4% - 8%) 0.095
OrHighNotMed 1217.40 (3.9%) 1239.56 (2.9%) 1.8% ( -4% - 9%) 0.096
OrNotHighHigh 1125.83 (3.5%) 1152.51 (2.6%) 2.4% ( -3% - 8%) 0.016
TermDTSort 57.78 (15.7%) 59.15 (19.5%) 2.4% ( -28% - 44%) 0.671
AndHighMedDayTaxoFacets 44.66 (2.9%) 45.86 (2.0%) 2.7% ( -2% - 7%) 0.001
LowSpanNear 21.46 (3.4%) 22.04 (3.2%) 2.7% ( -3% - 9%) 0.009
OrHighNotHigh 989.24 (3.8%) 1017.06 (2.7%) 2.8% ( -3% - 9%) 0.007
HighTermDayOfYearSort 30.92 (15.3%) 31.82 (13.9%) 2.9% ( -22% - 37%) 0.531
HighTerm 864.50 (4.9%) 892.97 (4.0%) 3.3% ( -5% - 12%) 0.020
AndHighHigh 34.00 (4.0%) 35.21 (4.9%) 3.6% ( -5% - 12%) 0.011
MedPhrase 51.35 (4.1%) 53.23 (3.7%) 3.7% ( -3% - 11%) 0.003
AndHighMed 59.83 (4.5%) 62.31 (4.1%) 4.1% ( -4% - 13%) 0.002
LowPhrase 71.34 (3.6%) 74.34 (3.0%) 4.2% ( -2% - 11%) 0.000
OrHighLow 389.91 (5.2%) 413.76 (3.2%) 6.1% ( -2% - 15%) 0.000
HighPhrase 146.45 (6.5%) 155.68 (5.1%) 6.3% ( -4% - 19%) 0.001
OrNotHighLow 752.41 (6.5%) 831.54 (2.5%) 10.5% ( 1% - 20%) 0.000
AndHighLow 794.98 (6.8%) 888.70 (3.6%) 11.8% ( 1% - 23%) 0.000
Roll up VInt / Vlong loop
Here is the code
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
BrowseMonthTaxoFacets 5.17 (15.9%) 5.00 (12.1%) -3.4% ( -27% - 29%) 0.446
OrNotHighLow 1010.74 (4.0%) 978.71 (4.6%) -3.2% ( -11% - 5%) 0.021
HighPhrase 171.95 (3.6%) 166.92 (4.6%) -2.9% ( -10% - 5%) 0.025
AndHighLow 594.12 (4.2%) 577.24 (5.4%) -2.8% ( -11% - 7%) 0.064
OrHighLow 540.46 (4.1%) 526.17 (5.4%) -2.6% ( -11% - 7%) 0.083
OrHighMedDayTaxoFacets 6.01 (5.3%) 5.88 (3.9%) -2.2% ( -10% - 7%) 0.136
AndHighMedDayTaxoFacets 14.78 (2.6%) 14.51 (2.1%) -1.8% ( -6% - 2%) 0.013
MedPhrase 142.26 (2.9%) 139.67 (3.1%) -1.8% ( -7% - 4%) 0.058
LowPhrase 21.22 (2.8%) 20.85 (3.1%) -1.8% ( -7% - 4%) 0.061
AndHighHighDayTaxoFacets 4.31 (4.5%) 4.24 (3.2%) -1.7% ( -8% - 6%) 0.158
BrowseDayOfYearTaxoFacets 4.70 (17.3%) 4.63 (12.9%) -1.3% ( -26% - 34%) 0.787
BrowseDateTaxoFacets 4.65 (16.9%) 4.59 (12.9%) -1.2% ( -26% - 34%) 0.803
MedSloppyPhrase 34.40 (2.9%) 34.02 (4.0%) -1.1% ( -7% - 5%) 0.318
MedTermDayTaxoFacets 13.85 (6.7%) 13.70 (4.5%) -1.0% ( -11% - 10%) 0.563
BrowseRandomLabelTaxoFacets 4.16 (12.7%) 4.11 (9.7%) -1.0% ( -20% - 24%) 0.772
LowSloppyPhrase 5.77 (2.2%) 5.72 (3.3%) -0.9% ( -6% - 4%) 0.307
LowSpanNear 53.67 (3.6%) 53.22 (3.9%) -0.8% ( -8% - 6%) 0.481
HighSpanNear 2.66 (4.8%) 2.63 (5.4%) -0.8% ( -10% - 9%) 0.616
MedIntervalsOrdered 25.88 (9.4%) 25.68 (9.5%) -0.8% ( -17% - 20%) 0.797
OrHighNotHigh 1043.34 (3.7%) 1037.43 (4.4%) -0.6% ( -8% - 7%) 0.658
HighSloppyPhrase 1.47 (3.4%) 1.46 (4.2%) -0.6% ( -7% - 7%) 0.645
MedSpanNear 11.52 (3.5%) 11.46 (4.3%) -0.5% ( -7% - 7%) 0.685
OrNotHighHigh 1615.92 (3.4%) 1608.09 (3.6%) -0.5% ( -7% - 6%) 0.663
BrowseRandomLabelSSDVFacets 3.11 (6.0%) 3.10 (4.4%) -0.2% ( -10% - 10%) 0.881
LowIntervalsOrdered 4.06 (8.9%) 4.06 (8.9%) -0.2% ( -16% - 19%) 0.957
OrHighNotMed 1188.76 (3.8%) 1187.46 (4.4%) -0.1% ( -7% - 8%) 0.933
OrNotHighMed 1220.26 (3.1%) 1219.23 (3.7%) -0.1% ( -6% - 6%) 0.938
AndHighMed 115.92 (3.6%) 116.03 (3.3%) 0.1% ( -6% - 7%) 0.928
Fuzzy1 111.98 (3.2%) 112.15 (3.5%) 0.1% ( -6% - 7%) 0.889
HighIntervalsOrdered 5.14 (7.5%) 5.15 (7.3%) 0.2% ( -13% - 16%) 0.937
OrHighNotLow 1222.80 (4.1%) 1226.76 (4.7%) 0.3% ( -8% - 9%) 0.817
TermDTSort 51.02 (14.1%) 51.21 (18.9%) 0.4% ( -28% - 38%) 0.944
HighTerm 1570.53 (3.7%) 1578.45 (4.4%) 0.5% ( -7% - 8%) 0.693
BrowseDayOfYearSSDVFacets 4.26 (3.9%) 4.28 (9.1%) 0.5% ( -12% - 14%) 0.811
AndHighHigh 40.61 (4.1%) 40.83 (4.1%) 0.5% ( -7% - 9%) 0.681
MedTerm 2002.17 (3.6%) 2013.12 (4.3%) 0.5% ( -7% - 8%) 0.659
Respell 67.74 (3.8%) 68.14 (3.3%) 0.6% ( -6% - 8%) 0.594
LowTerm 1633.26 (2.8%) 1643.86 (2.6%) 0.6% ( -4% - 6%) 0.444
OrHighMed 52.80 (3.7%) 53.17 (3.8%) 0.7% ( -6% - 8%) 0.551
HighTermDayOfYearSort 61.86 (10.2%) 62.31 (13.0%) 0.7% ( -20% - 26%) 0.846
Fuzzy2 33.05 (3.1%) 33.31 (3.1%) 0.8% ( -5% - 7%) 0.420
OrHighHigh 40.94 (4.1%) 41.27 (3.6%) 0.8% ( -6% - 8%) 0.513
PKLookup 204.53 (5.7%) 206.21 (4.9%) 0.8% ( -9% - 12%) 0.623
BrowseMonthSSDVFacets 4.65 (4.4%) 4.74 (11.8%) 2.0% ( -13% - 19%) 0.488
HighTermTitleBDVSort 50.18 (11.7%) 51.93 (18.5%) 3.5% ( -23% - 38%) 0.475
Prefix3 83.02 (12.3%) 86.16 (12.9%) 3.8% ( -19% - 33%) 0.343
Wildcard 61.79 (5.6%) 64.22 (7.4%) 3.9% ( -8% - 17%) 0.058
HighTermMonthSort 50.81 (16.0%) 53.56 (16.4%) 5.4% ( -23% - 44%) 0.291
IntNRQ 56.84 (31.8%) 62.62 (26.8%) 10.2% ( -36% - 100%) 0.275
In addition, this is the java version used in all these benchmarks:
java version "17.0.1" 2021-10-19 LTS Java(TM) SE Runtime Environment (build 17.0.1+12-LTS-39) Java HotSpot(TM) 64-Bit Server VM (build 17.0.1+12-LTS-39, mixed mode, sharing)
Interesting, so it looks like rolling the loop back actually helps a bit. Can you open a new JIRA/PR for this change specifically @gf2121 ?
Thanks @jpountz . I opened https://issues.apache.org/jira/browse/LUCENE-10376 (https://github.com/apache/lucene/pull/602)
It can be seen that IntNRQ is getting a 10% speed up but i'm not very sure if it is a noise since the p-value is high and In several rounds of benchmark here i can see the result of IntNRQ is unstable:
- speed up 10% in first benchmark
- keep even in second benchmark
- speed up 10% again in third benchmark
- regress 5% in forth benchmark
On the other hand, the stable speed up of some of tasks (AndHighLow, OrHighLow) can not be seen any more in the loop vint approach. So i'm a bit tempted to continue on this approach(LUCENE-10366), what do you think?
To test if it is the try-catch block preventing inline, I made another experiment: call readByte() instead of guard.get(curBuf) to read byte, like this (Yes it is just a copy of DataInput#readVInt now).
public int readVInt() throws IOException {
byte b = readByte();
if (b >= 0) return b;
int i = b & 0x7F;
b = readByte();
i |= (b & 0x7F) << 7;
if (b >= 0) return i;
b = readByte();
i |= (b & 0x7F) << 14;
if (b >= 0) return i;
b = readByte();
i |= (b & 0x7F) << 21;
if (b >= 0) return i;
b = readByte();
// Warning: the next ands use 0x0F / 0xF0 - beware copy/paste errors:
i |= (b & 0x0F) << 28;
if ((b & 0xF0) == 0) return i;
throw new IOException("Invalid vInt detected (too many bits)");
}
And i'm still seeing a speed up:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
HighTermMonthSort 17.77 (20.4%) 16.91 (14.0%) -4.9% ( -32% - 37%) 0.378
HighTermTitleBDVSort 53.61 (15.2%) 51.18 (12.8%) -4.5% ( -28% - 27%) 0.308
BrowseMonthTaxoFacets 5.27 (16.7%) 5.12 (16.5%) -3.0% ( -31% - 36%) 0.569
HighTermDayOfYearSort 53.12 (15.6%) 51.56 (12.2%) -2.9% ( -26% - 29%) 0.506
IntNRQ 532.53 (1.8%) 523.43 (9.1%) -1.7% ( -12% - 9%) 0.411
BrowseRandomLabelTaxoFacets 4.33 (13.4%) 4.30 (10.5%) -0.8% ( -21% - 26%) 0.843
MedTermDayTaxoFacets 13.80 (5.4%) 13.73 (5.5%) -0.5% ( -10% - 10%) 0.765
TermDTSort 88.41 (12.6%) 88.12 (12.9%) -0.3% ( -22% - 28%) 0.935
BrowseDayOfYearTaxoFacets 4.96 (18.2%) 4.95 (15.4%) -0.1% ( -28% - 40%) 0.989
Prefix3 250.60 (14.1%) 250.43 (13.9%) -0.1% ( -24% - 32%) 0.987
BrowseDateTaxoFacets 4.90 (17.9%) 4.90 (15.1%) 0.0% ( -27% - 40%) 0.999
BrowseMonthSSDVFacets 4.69 (6.8%) 4.70 (6.5%) 0.2% ( -12% - 14%) 0.940
Fuzzy1 60.49 (1.9%) 60.67 (2.2%) 0.3% ( -3% - 4%) 0.655
Wildcard 53.25 (6.0%) 53.46 (6.5%) 0.4% ( -11% - 13%) 0.845
Respell 51.43 (2.3%) 51.66 (2.4%) 0.4% ( -4% - 5%) 0.544
OrHighMed 63.35 (3.2%) 63.73 (5.2%) 0.6% ( -7% - 9%) 0.652
PKLookup 206.79 (4.3%) 208.11 (4.5%) 0.6% ( -7% - 9%) 0.648
LowIntervalsOrdered 11.64 (5.5%) 11.72 (5.7%) 0.7% ( -9% - 12%) 0.708
HighIntervalsOrdered 2.85 (7.6%) 2.87 (8.1%) 0.7% ( -14% - 17%) 0.790
Fuzzy2 66.34 (1.9%) 66.88 (2.7%) 0.8% ( -3% - 5%) 0.271
HighSpanNear 6.61 (3.4%) 6.67 (4.1%) 1.0% ( -6% - 8%) 0.395
MedIntervalsOrdered 8.24 (5.1%) 8.34 (5.6%) 1.2% ( -8% - 12%) 0.467
HighSloppyPhrase 6.70 (6.4%) 6.78 (6.4%) 1.2% ( -10% - 14%) 0.543
BrowseDayOfYearSSDVFacets 4.22 (4.7%) 4.27 (4.7%) 1.3% ( -7% - 11%) 0.379
MedSloppyPhrase 33.87 (4.5%) 34.34 (4.4%) 1.4% ( -7% - 10%) 0.327
OrHighMedDayTaxoFacets 5.08 (5.1%) 5.16 (4.4%) 1.6% ( -7% - 11%) 0.302
BrowseRandomLabelSSDVFacets 3.15 (8.2%) 3.20 (7.3%) 1.6% ( -12% - 18%) 0.519
LowTerm 2114.49 (2.6%) 2152.48 (2.7%) 1.8% ( -3% - 7%) 0.031
AndHighHighDayTaxoFacets 4.41 (3.2%) 4.49 (3.0%) 1.8% ( -4% - 8%) 0.064
AndHighHigh 30.57 (4.5%) 31.14 (7.4%) 1.9% ( -9% - 14%) 0.342
OrHighHigh 35.60 (3.0%) 36.32 (3.8%) 2.0% ( -4% - 9%) 0.062
LowSpanNear 23.21 (2.6%) 23.73 (2.9%) 2.3% ( -3% - 8%) 0.010
AndHighMed 65.70 (4.9%) 67.45 (7.9%) 2.7% ( -9% - 16%) 0.200
MedSpanNear 77.65 (6.1%) 79.96 (6.9%) 3.0% ( -9% - 17%) 0.151
LowSloppyPhrase 36.66 (2.7%) 37.76 (2.3%) 3.0% ( -2% - 8%) 0.000
MedTerm 2602.20 (3.3%) 2692.40 (3.7%) 3.5% ( -3% - 10%) 0.002
OrHighNotLow 1157.68 (4.7%) 1207.92 (4.7%) 4.3% ( -4% - 14%) 0.004
MedPhrase 72.16 (3.0%) 75.49 (2.4%) 4.6% ( 0% - 10%) 0.000
OrHighNotMed 1074.08 (4.3%) 1126.38 (4.1%) 4.9% ( -3% - 13%) 0.000
OrNotHighMed 1093.93 (2.9%) 1157.33 (2.7%) 5.8% ( 0% - 11%) 0.000
LowPhrase 274.34 (3.2%) 290.37 (1.5%) 5.8% ( 1% - 10%) 0.000
HighPhrase 281.24 (3.3%) 298.54 (2.0%) 6.2% ( 0% - 11%) 0.000
OrHighLow 321.86 (2.7%) 342.74 (3.1%) 6.5% ( 0% - 12%) 0.000
OrNotHighHigh 1016.04 (4.5%) 1084.94 (4.3%) 6.8% ( -1% - 16%) 0.000
OrHighNotHigh 785.90 (5.9%) 845.39 (5.1%) 7.6% ( -3% - 19%) 0.000
HighTerm 1421.49 (5.5%) 1529.92 (5.8%) 7.6% ( -3% - 20%) 0.000
AndHighMedDayTaxoFacets 28.01 (3.3%) 30.41 (2.6%) 8.6% ( 2% - 14%) 0.000
OrNotHighLow 667.21 (4.4%) 736.53 (2.5%) 10.4% ( 3% - 18%) 0.000
AndHighLow 493.65 (6.5%) 561.58 (4.3%) 13.8% ( 2% - 26%) 0.000
This indeed sounds crazy. I suspect if this is because DataInput#readVInt calls DataInput#readByte and the abstraction layer makes JVM confused. While ByteBufferIndexInput#readVInt directly calls ByteBufferIndexInput#readByte (which is declared as final) ?
I got a flame graph on the benchmark and found some vtable stub there (which seems to justify this guess) :

As it can be the same as super, I tried to reuse super logic (call super.xxx). still seeing the speed up:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
BrowseMonthTaxoFacets 5.07 (16.4%) 4.91 (16.0%) -3.2% ( -30% - 35%) 0.538
HighSpanNear 2.19 (4.4%) 2.17 (3.9%) -1.1% ( -9% - 7%) 0.385
HighSloppyPhrase 4.31 (5.2%) 4.28 (4.5%) -0.9% ( -10% - 9%) 0.571
MedSloppyPhrase 23.29 (5.5%) 23.12 (5.1%) -0.7% ( -10% - 10%) 0.669
MedTermDayTaxoFacets 5.43 (5.4%) 5.39 (4.1%) -0.7% ( -9% - 9%) 0.657
IntNRQ 35.17 (21.7%) 34.97 (21.5%) -0.6% ( -35% - 54%) 0.933
OrHighMedDayTaxoFacets 1.38 (6.2%) 1.37 (5.4%) -0.5% ( -11% - 11%) 0.784
Fuzzy1 113.72 (3.5%) 113.41 (3.3%) -0.3% ( -6% - 6%) 0.804
Fuzzy2 79.20 (3.3%) 78.99 (3.2%) -0.3% ( -6% - 6%) 0.795
Respell 54.75 (3.1%) 54.78 (3.0%) 0.0% ( -5% - 6%) 0.962
MedIntervalsOrdered 5.08 (8.3%) 5.08 (7.4%) 0.1% ( -14% - 17%) 0.979
PKLookup 204.91 (5.8%) 205.45 (6.6%) 0.3% ( -11% - 13%) 0.894
AndHighHighDayTaxoFacets 1.59 (5.8%) 1.60 (5.0%) 0.5% ( -9% - 11%) 0.792
BrowseDayOfYearSSDVFacets 4.20 (3.6%) 4.22 (3.9%) 0.5% ( -6% - 8%) 0.656
HighIntervalsOrdered 2.17 (8.3%) 2.18 (7.4%) 0.7% ( -13% - 17%) 0.791
Prefix3 74.40 (8.1%) 75.25 (7.9%) 1.1% ( -13% - 18%) 0.652
LowTerm 1924.36 (1.9%) 1947.29 (1.8%) 1.2% ( -2% - 4%) 0.040
BrowseRandomLabelTaxoFacets 4.16 (13.3%) 4.21 (11.2%) 1.3% ( -20% - 29%) 0.742
LowIntervalsOrdered 6.14 (5.4%) 6.24 (4.5%) 1.8% ( -7% - 12%) 0.258
LowSpanNear 36.68 (2.8%) 37.34 (2.6%) 1.8% ( -3% - 7%) 0.037
MedSpanNear 10.85 (2.7%) 11.06 (2.6%) 1.9% ( -3% - 7%) 0.025
BrowseDateTaxoFacets 4.76 (16.6%) 4.86 (14.3%) 2.0% ( -24% - 39%) 0.679
BrowseDayOfYearTaxoFacets 4.82 (16.8%) 4.91 (14.7%) 2.0% ( -25% - 40%) 0.683
LowSloppyPhrase 79.36 (2.7%) 81.09 (2.0%) 2.2% ( -2% - 7%) 0.004
TermDTSort 45.15 (11.3%) 46.17 (16.4%) 2.3% ( -22% - 33%) 0.612
HighTermDayOfYearSort 66.04 (10.3%) 67.57 (15.7%) 2.3% ( -21% - 31%) 0.580
MedTerm 1784.03 (2.6%) 1830.98 (2.8%) 2.6% ( -2% - 8%) 0.002
OrHighMed 50.97 (4.5%) 52.45 (4.4%) 2.9% ( -5% - 12%) 0.039
BrowseRandomLabelSSDVFacets 3.17 (11.6%) 3.27 (11.3%) 3.2% ( -17% - 29%) 0.370
OrHighNotLow 1390.67 (2.9%) 1436.11 (2.9%) 3.3% ( -2% - 9%) 0.000
AndHighMed 99.02 (3.9%) 102.35 (5.2%) 3.4% ( -5% - 12%) 0.020
AndHighMedDayTaxoFacets 26.52 (3.5%) 27.41 (2.4%) 3.4% ( -2% - 9%) 0.000
OrHighNotMed 1155.48 (2.9%) 1195.26 (2.9%) 3.4% ( -2% - 9%) 0.000
MedPhrase 25.46 (2.4%) 26.37 (2.6%) 3.6% ( -1% - 8%) 0.000
LowPhrase 81.21 (2.5%) 84.18 (2.6%) 3.7% ( -1% - 9%) 0.000
OrHighNotHigh 1685.00 (2.6%) 1750.81 (2.6%) 3.9% ( -1% - 9%) 0.000
HighTermMonthSort 73.01 (11.5%) 76.01 (15.7%) 4.1% ( -20% - 35%) 0.344
OrNotHighMed 978.17 (2.5%) 1018.54 (2.8%) 4.1% ( -1% - 9%) 0.000
HighTerm 1457.41 (4.0%) 1521.90 (3.8%) 4.4% ( -3% - 12%) 0.000
OrHighHigh 38.49 (4.6%) 40.24 (3.8%) 4.5% ( -3% - 13%) 0.001
OrNotHighHigh 848.77 (3.1%) 891.09 (3.3%) 5.0% ( -1% - 11%) 0.000
HighTermTitleBDVSort 56.20 (16.5%) 59.16 (19.5%) 5.3% ( -26% - 49%) 0.356
HighPhrase 125.88 (3.3%) 134.24 (3.2%) 6.6% ( 0% - 13%) 0.000
BrowseMonthSSDVFacets 4.57 (2.9%) 4.87 (19.0%) 6.7% ( -14% - 29%) 0.121
OrHighLow 433.74 (4.9%) 468.94 (4.3%) 8.1% ( -1% - 18%) 0.000
AndHighHigh 74.06 (5.0%) 80.29 (4.0%) 8.4% ( 0% - 18%) 0.000
Wildcard 113.47 (13.7%) 125.12 (10.1%) 10.3% ( -11% - 39%) 0.007
OrNotHighLow 782.47 (5.2%) 865.27 (2.7%) 10.6% ( 2% - 19%) 0.000
AndHighLow 339.38 (5.7%) 376.62 (4.9%) 11.0% ( 0% - 22%) 0.000
To ensure the benchmark is warmed up, I also run a benchmark that repeat AndHighLow task 20000 times (which was usually seen a good speed up in benchmarks before). Here is the report:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
AndHighLow 767.41 (9.7%) 898.67 (1.5%) 17.1% ( 5% - 31%) 0.000
I will try out the same on MemorySegmentIndexInput (#518).
I was also thinking about the following: Why not make readVInt/readVLong final in the DataInput class? This should be possible after the refactoring in #602 !
We should also make all other readZigZag and other DataInput utility methods that read specific data formats final. In most cases it makes no sense to implement them for performance reasons.
Thanks @uschindler !
I was also thinking about the following: Why not make readVInt/readVLong final in the DataInput class? This should be possible after the refactoring in #602 !
It seems like that it is the abstraction layer of #readByte (not #readVInt) making JVM confused. I'm not sure if making readVInt final can work, but i'd like to give it a try.
Although it is unrelated, but could you make that one final for consistency? https://github.com/apache/lucene/blob/97d669bcdb5e302a4fd79c648d5f621d90a9ac3b/lucene/core/src/java/org/apache/lucene/store/ByteBufferIndexInput.java#L140
readFloats() is final but not readLongs()
Here is the report of making vint final (50 repeat * 20 JVM):
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
HighTermTitleBDVSort 66.42 (20.7%) 61.57 (13.7%) -7.3% ( -34% - 34%) 0.188
IntNRQ 31.66 (30.7%) 30.04 (34.0%) -5.1% ( -53% - 85%) 0.615
HighTermMonthSort 50.98 (19.0%) 48.37 (14.2%) -5.1% ( -32% - 34%) 0.334
TermDTSort 23.29 (18.5%) 22.35 (15.4%) -4.0% ( -31% - 36%) 0.453
HighIntervalsOrdered 7.29 (10.2%) 7.00 (8.0%) -3.9% ( -20% - 15%) 0.174
HighTermDayOfYearSort 82.93 (22.1%) 79.98 (10.9%) -3.6% ( -29% - 37%) 0.519
MedIntervalsOrdered 8.13 (7.5%) 7.84 (5.1%) -3.5% ( -14% - 9%) 0.080
BrowseDateTaxoFacets 4.95 (16.0%) 4.82 (17.4%) -2.6% ( -31% - 36%) 0.619
BrowseDayOfYearTaxoFacets 5.01 (16.5%) 4.89 (17.7%) -2.5% ( -31% - 37%) 0.639
LowIntervalsOrdered 27.57 (5.6%) 26.97 (4.3%) -2.2% ( -11% - 8%) 0.169
BrowseRandomLabelSSDVFacets 3.14 (6.1%) 3.08 (7.0%) -1.9% ( -14% - 11%) 0.358
MedSloppyPhrase 13.28 (5.7%) 13.08 (5.6%) -1.5% ( -12% - 10%) 0.400
HighSloppyPhrase 3.26 (4.3%) 3.23 (4.2%) -1.0% ( -8% - 7%) 0.474
Respell 53.28 (2.3%) 52.78 (3.5%) -0.9% ( -6% - 4%) 0.310
BrowseRandomLabelTaxoFacets 4.29 (12.2%) 4.26 (13.4%) -0.8% ( -23% - 28%) 0.846
LowSloppyPhrase 13.00 (4.7%) 12.92 (4.8%) -0.6% ( -9% - 9%) 0.707
MedTermDayTaxoFacets 15.28 (4.8%) 15.20 (5.0%) -0.5% ( -9% - 9%) 0.741
AndHighLow 612.76 (6.3%) 610.25 (5.6%) -0.4% ( -11% - 12%) 0.828
Fuzzy1 99.32 (3.0%) 99.02 (3.2%) -0.3% ( -6% - 6%) 0.751
OrHighMedDayTaxoFacets 2.50 (5.4%) 2.50 (5.2%) -0.1% ( -10% - 11%) 0.937
AndHighHighDayTaxoFacets 3.31 (3.9%) 3.31 (3.6%) 0.0% ( -7% - 7%) 0.971
OrHighNotLow 1738.10 (5.5%) 1739.49 (3.7%) 0.1% ( -8% - 9%) 0.957
OrNotHighLow 947.48 (6.2%) 949.16 (5.0%) 0.2% ( -10% - 12%) 0.920
OrHighNotMed 1028.55 (5.1%) 1030.45 (4.4%) 0.2% ( -8% - 10%) 0.903
Wildcard 118.98 (11.3%) 119.38 (10.3%) 0.3% ( -19% - 24%) 0.921
AndHighMedDayTaxoFacets 34.54 (2.9%) 34.65 (2.5%) 0.3% ( -4% - 5%) 0.689
HighTerm 1470.90 (5.5%) 1476.02 (3.3%) 0.3% ( -8% - 9%) 0.809
Fuzzy2 98.76 (3.0%) 99.17 (2.9%) 0.4% ( -5% - 6%) 0.662
LowSpanNear 39.95 (3.0%) 40.15 (3.7%) 0.5% ( -6% - 7%) 0.626
AndHighHigh 21.48 (5.6%) 21.59 (4.6%) 0.5% ( -9% - 11%) 0.740
BrowseDayOfYearSSDVFacets 4.20 (6.7%) 4.23 (5.9%) 0.5% ( -11% - 14%) 0.786
BrowseMonthSSDVFacets 4.64 (6.5%) 4.67 (6.8%) 0.6% ( -11% - 14%) 0.778
LowTerm 1732.57 (3.5%) 1744.74 (2.6%) 0.7% ( -5% - 7%) 0.472
OrHighNotHigh 840.71 (4.9%) 846.83 (4.1%) 0.7% ( -7% - 10%) 0.611
MedTerm 1659.04 (5.1%) 1672.60 (3.9%) 0.8% ( -7% - 10%) 0.566
AndHighMed 58.10 (4.1%) 58.60 (3.8%) 0.9% ( -6% - 9%) 0.491
LowPhrase 523.50 (3.0%) 528.02 (2.2%) 0.9% ( -4% - 6%) 0.300
OrHighLow 590.49 (4.9%) 595.81 (4.1%) 0.9% ( -7% - 10%) 0.529
HighSpanNear 0.77 (5.9%) 0.78 (5.8%) 1.0% ( -10% - 13%) 0.574
OrNotHighMed 1071.83 (3.9%) 1084.40 (3.7%) 1.2% ( -6% - 9%) 0.326
MedSpanNear 41.11 (4.4%) 41.62 (4.7%) 1.2% ( -7% - 10%) 0.386
OrHighMed 74.79 (5.0%) 75.73 (4.9%) 1.3% ( -8% - 11%) 0.423
HighPhrase 278.97 (4.2%) 282.56 (2.6%) 1.3% ( -5% - 8%) 0.248
OrHighHigh 44.44 (5.8%) 45.01 (5.5%) 1.3% ( -9% - 13%) 0.470
OrNotHighHigh 1490.57 (4.3%) 1510.81 (3.3%) 1.4% ( -5% - 9%) 0.263
MedPhrase 91.14 (4.3%) 92.44 (2.6%) 1.4% ( -5% - 8%) 0.203
BrowseMonthTaxoFacets 5.34 (15.2%) 5.52 (28.6%) 3.4% ( -35% - 55%) 0.639
Prefix3 26.17 (21.3%) 27.12 (18.9%) 3.6% ( -30% - 55%) 0.568
OK, thanks, so let's keep the hack.
Hi @jpountz, if you agree I'd merge this in and backport!
Uwe
+1
@gf2121 Do you have an email I can use to contact you? Please send me an email at [email protected] if you don't want to leave it on GitHub.
Hi @jpountz. This is my email: [email protected]