lucene
lucene copied to clipboard
LUCENE-9335: [Discussion Only] Add BMM scorer and use it for pure disjunction term query
Implement BMM algorithm from "Optimizing Top-k Document Retrieval Strategies for Block-Max Indexes" by Dimopoulos, Nepomnyachiy and Suel.
The BMM implementation passes all existing tests run by ./gradlew check
as well as luceneutil benchmark
luceneutil benchmark result with wikimedium5m
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
OrHighLow 537.97 (6.2%) 256.35 (2.8%) -52.3% ( -57% - -46%) 0.000
OrHighHigh 46.17 (2.4%) 22.41 (2.4%) -51.5% ( -54% - -47%) 0.000
OrHighMed 174.93 (3.3%) 87.04 (2.2%) -50.2% ( -54% - -46%) 0.000
Fuzzy2 81.84 (15.8%) 59.19 (8.7%) -27.7% ( -45% - -3%) 0.000
Fuzzy1 92.74 (7.4%) 75.87 (6.5%) -18.2% ( -29% - -4%) 0.000
OrHighNotLow 817.07 (4.2%) 790.10 (6.6%) -3.3% ( -13% - 7%) 0.060
LowTerm 1634.11 (3.3%) 1602.43 (4.3%) -1.9% ( -9% - 5%) 0.107
Prefix3 411.01 (2.4%) 403.38 (2.0%) -1.9% ( -6% - 2%) 0.008
MedTerm 1579.45 (5.0%) 1553.86 (4.7%) -1.6% ( -10% - 8%) 0.292
OrHighNotHigh 649.97 (4.4%) 641.91 (6.8%) -1.2% ( -11% - 10%) 0.496
HighPhrase 134.74 (3.0%) 133.12 (2.7%) -1.2% ( -6% - 4%) 0.184
HighTerm 1580.99 (3.7%) 1563.39 (4.8%) -1.1% ( -9% - 7%) 0.409
OrNotHighMed 748.48 (5.6%) 740.18 (6.2%) -1.1% ( -12% - 11%) 0.551
Wildcard 128.78 (2.4%) 127.73 (2.2%) -0.8% ( -5% - 3%) 0.271
OrNotHighLow 1013.49 (4.1%) 1008.11 (3.5%) -0.5% ( -7% - 7%) 0.656
MedPhrase 232.36 (2.1%) 231.17 (3.3%) -0.5% ( -5% - 5%) 0.558
AndHighHigh 100.09 (2.7%) 99.60 (3.2%) -0.5% ( -6% - 5%) 0.604
IntNRQ 118.54 (1.2%) 117.97 (1.8%) -0.5% ( -3% - 2%) 0.332
LowPhrase 82.96 (3.5%) 82.57 (3.1%) -0.5% ( -6% - 6%) 0.646
LowSloppyPhrase 68.53 (3.7%) 68.28 (3.2%) -0.4% ( -7% - 6%) 0.744
HighTermTitleBDVSort 302.06 (12.3%) 301.05 (15.3%) -0.3% ( -24% - 31%) 0.939
AndHighMed 405.77 (3.2%) 404.65 (3.2%) -0.3% ( -6% - 6%) 0.784
PKLookup 217.46 (2.9%) 217.05 (2.6%) -0.2% ( -5% - 5%) 0.829
BrowseMonthSSDVFacets 31.87 (1.7%) 31.81 (1.8%) -0.2% ( -3% - 3%) 0.752
Respell 105.24 (2.4%) 105.14 (2.5%) -0.1% ( -4% - 4%) 0.904
HighIntervalsOrdered 27.43 (2.3%) 27.45 (2.5%) 0.1% ( -4% - 5%) 0.902
MedSpanNear 175.50 (2.3%) 175.75 (2.5%) 0.1% ( -4% - 5%) 0.848
HighSloppyPhrase 29.61 (4.6%) 29.66 (3.4%) 0.2% ( -7% - 8%) 0.895
OrNotHighHigh 723.11 (4.7%) 724.68 (4.4%) 0.2% ( -8% - 9%) 0.880
LowSpanNear 108.75 (1.6%) 109.21 (2.0%) 0.4% ( -3% - 4%) 0.465
MedSloppyPhrase 90.19 (3.8%) 90.59 (3.0%) 0.4% ( -6% - 7%) 0.682
HighSpanNear 51.05 (3.7%) 51.31 (3.4%) 0.5% ( -6% - 7%) 0.646
BrowseDateTaxoFacets 11.05 (3.7%) 11.14 (3.3%) 0.8% ( -5% - 8%) 0.487
BrowseDayOfYearTaxoFacets 11.06 (3.7%) 11.16 (3.2%) 0.8% ( -5% - 8%) 0.446
BrowseMonthTaxoFacets 13.14 (3.0%) 13.27 (2.7%) 1.0% ( -4% - 6%) 0.295
BrowseDayOfYearSSDVFacets 27.88 (5.0%) 28.24 (2.0%) 1.3% ( -5% - 8%) 0.287
HighTermDayOfYearSort 184.68 (8.9%) 187.15 (9.1%) 1.3% ( -15% - 21%) 0.638
HighTermMonthSort 243.63 (11.9%) 247.11 (10.6%) 1.4% ( -18% - 27%) 0.690
OrHighNotMed 722.87 (4.3%) 733.49 (4.5%) 1.5% ( -7% - 10%) 0.293
TermDTSort 256.82 (13.0%) 261.88 (14.6%) 2.0% ( -22% - 33%) 0.651
AndHighLow 1106.79 (4.4%) 1144.40 (4.1%) 3.4% ( -4% - 12%) 0.012
2 luceneutil runs with wikimedium5m results:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
OrHighMed 196.58 (3.5%) 134.01 (2.5%) -31.8% ( -36% - -26%) 0.000
OrHighHigh 43.32 (2.0%) 31.45 (2.0%) -27.4% ( -30% - -23%) 0.000
Fuzzy1 94.71 (14.2%) 89.31 (16.9%) -5.7% ( -32% - 29%) 0.249
OrHighLow 549.71 (4.7%) 521.55 (5.8%) -5.1% ( -14% - 5%) 0.002
HighTermDayOfYearSort 284.66 (11.4%) 274.13 (11.4%) -3.7% ( -23% - 21%) 0.304
HighTerm 1392.83 (4.6%) 1352.76 (4.7%) -2.9% ( -11% - 6%) 0.050
HighPhrase 199.03 (2.8%) 193.93 (3.0%) -2.6% ( -8% - 3%) 0.006
TermDTSort 200.40 (10.9%) 196.04 (13.2%) -2.2% ( -23% - 24%) 0.570
OrNotHighLow 1137.55 (3.8%) 1114.38 (4.8%) -2.0% ( -10% - 6%) 0.138
OrNotHighMed 874.47 (5.8%) 857.02 (5.3%) -2.0% ( -12% - 9%) 0.259
HighTermMonthSort 285.86 (16.9%) 280.37 (15.4%) -1.9% ( -29% - 36%) 0.707
OrHighNotLow 797.17 (5.2%) 782.04 (5.5%) -1.9% ( -11% - 9%) 0.263
AndHighMed 259.03 (3.2%) 254.45 (4.8%) -1.8% ( -9% - 6%) 0.169
LowTerm 1619.44 (3.4%) 1592.71 (4.4%) -1.7% ( -9% - 6%) 0.184
MedTerm 1448.09 (3.7%) 1432.36 (4.6%) -1.1% ( -9% - 7%) 0.413
LowPhrase 530.26 (3.0%) 524.77 (3.8%) -1.0% ( -7% - 6%) 0.343
Wildcard 237.52 (1.7%) 235.18 (2.1%) -1.0% ( -4% - 2%) 0.104
PKLookup 226.73 (2.8%) 224.67 (2.5%) -0.9% ( -6% - 4%) 0.277
AndHighHigh 167.49 (3.4%) 166.40 (2.7%) -0.6% ( -6% - 5%) 0.505
MedSloppyPhrase 52.92 (1.9%) 52.60 (2.2%) -0.6% ( -4% - 3%) 0.339
MedSpanNear 128.06 (2.0%) 127.34 (2.3%) -0.6% ( -4% - 3%) 0.405
AndHighLow 1016.64 (3.9%) 1011.41 (4.2%) -0.5% ( -8% - 7%) 0.689
Prefix3 333.77 (3.9%) 332.40 (4.0%) -0.4% ( -8% - 7%) 0.742
LowSpanNear 45.55 (2.6%) 45.37 (2.6%) -0.4% ( -5% - 4%) 0.636
HighTermTitleBDVSort 152.08 (15.6%) 151.52 (14.8%) -0.4% ( -26% - 35%) 0.938
HighIntervalsOrdered 23.28 (1.9%) 23.20 (2.4%) -0.4% ( -4% - 3%) 0.597
HighSloppyPhrase 276.19 (2.4%) 275.46 (2.8%) -0.3% ( -5% - 5%) 0.749
BrowseMonthTaxoFacets 13.43 (2.7%) 13.40 (3.0%) -0.2% ( -5% - 5%) 0.827
OrNotHighHigh 730.45 (6.8%) 729.04 (5.1%) -0.2% ( -11% - 12%) 0.919
IntNRQ 137.68 (16.3%) 137.51 (16.6%) -0.1% ( -28% - 39%) 0.981
OrHighNotHigh 629.90 (4.9%) 630.02 (5.0%) 0.0% ( -9% - 10%) 0.990
LowSloppyPhrase 20.78 (1.2%) 20.79 (1.7%) 0.1% ( -2% - 3%) 0.895
Respell 117.77 (2.5%) 117.85 (2.4%) 0.1% ( -4% - 5%) 0.930
BrowseDayOfYearSSDVFacets 28.73 (2.6%) 28.75 (2.6%) 0.1% ( -4% - 5%) 0.930
MedPhrase 51.36 (3.1%) 51.41 (2.1%) 0.1% ( -4% - 5%) 0.902
HighSpanNear 3.58 (2.2%) 3.58 (2.8%) 0.2% ( -4% - 5%) 0.835
Fuzzy2 20.57 (4.2%) 20.66 (3.5%) 0.5% ( -6% - 8%) 0.710
OrHighNotMed 753.94 (6.4%) 758.23 (5.1%) 0.6% ( -10% - 12%) 0.756
BrowseDateTaxoFacets 11.29 (3.1%) 11.35 (3.0%) 0.6% ( -5% - 6%) 0.547
BrowseMonthSSDVFacets 31.97 (6.9%) 32.21 (5.4%) 0.8% ( -10% - 13%) 0.698
BrowseDayOfYearTaxoFacets 11.30 (3.0%) 11.38 (3.1%) 0.8% ( -5% - 7%) 0.427
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
OrHighHigh 77.68 (3.2%) 36.44 (3.3%) -53.1% ( -57% - -48%) 0.000
OrHighMed 265.36 (4.9%) 148.05 (2.9%) -44.2% ( -49% - -38%) 0.000
OrHighLow 563.62 (6.2%) 392.94 (3.6%) -30.3% ( -37% - -21%) 0.000
OrHighNotHigh 647.72 (6.3%) 625.42 (7.9%) -3.4% ( -16% - 11%) 0.126
IntNRQ 422.98 (3.5%) 410.61 (6.9%) -2.9% ( -12% - 7%) 0.091
OrHighNotLow 840.72 (7.0%) 818.98 (4.4%) -2.6% ( -13% - 9%) 0.162
HighTerm 1464.33 (4.7%) 1428.42 (5.3%) -2.5% ( -11% - 7%) 0.119
MedPhrase 486.70 (4.1%) 478.83 (3.3%) -1.6% ( -8% - 6%) 0.171
OrHighNotMed 738.23 (5.9%) 726.30 (7.4%) -1.6% ( -14% - 12%) 0.444
OrNotHighHigh 713.95 (4.5%) 703.94 (5.2%) -1.4% ( -10% - 8%) 0.358
BrowseDateTaxoFacets 10.78 (3.6%) 10.63 (4.1%) -1.4% ( -8% - 6%) 0.260
MedTerm 1709.93 (4.8%) 1686.84 (5.8%) -1.4% ( -11% - 9%) 0.422
LowTerm 1836.19 (5.4%) 1811.93 (6.6%) -1.3% ( -12% - 11%) 0.491
BrowseDayOfYearTaxoFacets 10.80 (3.8%) 10.66 (4.0%) -1.3% ( -8% - 6%) 0.309
HighPhrase 40.45 (2.7%) 39.99 (2.5%) -1.1% ( -6% - 4%) 0.160
BrowseMonthTaxoFacets 13.11 (2.6%) 12.98 (3.2%) -1.0% ( -6% - 4%) 0.254
HighSloppyPhrase 32.04 (5.6%) 31.70 (4.9%) -1.0% ( -10% - 10%) 0.535
TermDTSort 431.47 (12.5%) 427.83 (13.2%) -0.8% ( -23% - 28%) 0.836
Wildcard 289.98 (7.5%) 287.96 (5.3%) -0.7% ( -12% - 13%) 0.735
PKLookup 217.64 (3.8%) 216.31 (3.2%) -0.6% ( -7% - 6%) 0.580
HighIntervalsOrdered 4.21 (2.6%) 4.20 (3.0%) -0.4% ( -5% - 5%) 0.638
BrowseDayOfYearSSDVFacets 27.75 (2.6%) 27.64 (2.3%) -0.4% ( -5% - 4%) 0.589
AndHighMed 227.13 (2.3%) 226.33 (3.9%) -0.4% ( -6% - 5%) 0.729
AndHighHigh 110.85 (2.4%) 110.49 (3.1%) -0.3% ( -5% - 5%) 0.709
Prefix3 235.98 (3.2%) 235.41 (2.2%) -0.2% ( -5% - 5%) 0.787
LowSpanNear 45.89 (1.7%) 45.78 (1.6%) -0.2% ( -3% - 3%) 0.663
LowSloppyPhrase 24.25 (3.7%) 24.21 (3.5%) -0.2% ( -7% - 7%) 0.867
HighSpanNear 50.67 (3.8%) 50.59 (4.0%) -0.2% ( -7% - 7%) 0.902
LowPhrase 576.08 (4.7%) 575.26 (5.2%) -0.1% ( -9% - 10%) 0.928
OrNotHighMed 692.19 (6.3%) 691.63 (5.1%) -0.1% ( -10% - 12%) 0.965
Fuzzy2 30.66 (4.3%) 30.64 (2.9%) -0.1% ( -7% - 7%) 0.949
MedSpanNear 23.95 (2.0%) 23.95 (2.2%) -0.0% ( -4% - 4%) 0.992
MedSloppyPhrase 36.54 (5.7%) 36.57 (4.2%) 0.1% ( -9% - 10%) 0.962
BrowseMonthSSDVFacets 31.13 (5.1%) 31.20 (4.8%) 0.2% ( -9% - 10%) 0.888
HighTermTitleBDVSort 89.16 (8.9%) 89.37 (8.7%) 0.2% ( -15% - 19%) 0.934
Respell 83.91 (3.1%) 84.11 (2.7%) 0.2% ( -5% - 6%) 0.798
OrNotHighLow 1073.98 (6.5%) 1076.76 (6.1%) 0.3% ( -11% - 13%) 0.897
AndHighLow 905.72 (4.4%) 908.07 (5.2%) 0.3% ( -8% - 10%) 0.864
Fuzzy1 81.24 (12.0%) 81.73 (9.6%) 0.6% ( -18% - 25%) 0.861
HighTermDayOfYearSort 292.09 (15.6%) 294.83 (14.1%) 0.9% ( -24% - 36%) 0.842
HighTermMonthSort 71.72 (11.6%) 74.49 (17.1%) 3.9% ( -22% - 36%) 0.403
2 luceneutil runs with wikimedium5m results:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
OrHighMed 199.09 (3.5%) 105.27 (2.5%) -47.1% ( -51% - -42%) 0.000
OrHighLow 536.68 (6.0%) 309.93 (3.2%) -42.3% ( -48% - -35%) 0.000
OrHighHigh 34.53 (2.3%) 23.21 (3.8%) -32.8% ( -38% - -27%) 0.000
LowTerm 1830.50 (4.8%) 1784.41 (5.1%) -2.5% ( -11% - 7%) 0.109
HighPhrase 243.73 (2.6%) 237.72 (2.9%) -2.5% ( -7% - 3%) 0.005
OrNotHighLow 1251.92 (3.8%) 1236.67 (4.1%) -1.2% ( -8% - 6%) 0.328
OrHighNotMed 760.26 (4.4%) 751.82 (4.7%) -1.1% ( -9% - 8%) 0.443
HighTerm 1384.99 (4.2%) 1370.00 (4.2%) -1.1% ( -9% - 7%) 0.415
LowPhrase 80.03 (3.2%) 79.19 (3.5%) -1.1% ( -7% - 5%) 0.321
LowSpanNear 194.21 (2.3%) 192.63 (2.4%) -0.8% ( -5% - 4%) 0.281
OrNotHighHigh 687.96 (6.5%) 682.76 (4.9%) -0.8% ( -11% - 11%) 0.679
MedPhrase 435.41 (3.1%) 432.56 (3.8%) -0.7% ( -7% - 6%) 0.554
MedSpanNear 143.42 (3.0%) 142.58 (2.8%) -0.6% ( -6% - 5%) 0.520
HighTermMonthSort 153.33 (13.2%) 152.43 (11.4%) -0.6% ( -22% - 27%) 0.880
HighSpanNear 46.53 (2.7%) 46.31 (2.4%) -0.5% ( -5% - 4%) 0.568
PKLookup 222.89 (3.2%) 221.95 (3.3%) -0.4% ( -6% - 6%) 0.683
TermDTSort 210.61 (15.6%) 209.73 (15.1%) -0.4% ( -26% - 35%) 0.932
HighTermDayOfYearSort 261.28 (8.7%) 260.58 (8.2%) -0.3% ( -15% - 18%) 0.921
MedTerm 1647.86 (3.4%) 1644.52 (5.7%) -0.2% ( -8% - 9%) 0.891
MedSloppyPhrase 30.70 (2.8%) 30.65 (2.4%) -0.2% ( -5% - 5%) 0.838
BrowseMonthSSDVFacets 31.90 (2.1%) 31.86 (2.2%) -0.1% ( -4% - 4%) 0.870
BrowseDayOfYearTaxoFacets 11.13 (2.9%) 11.13 (2.7%) -0.0% ( -5% - 5%) 0.958
LowSloppyPhrase 72.26 (1.9%) 72.26 (1.5%) -0.0% ( -3% - 3%) 0.985
AndHighLow 1093.17 (4.7%) 1093.14 (5.8%) -0.0% ( -10% - 11%) 0.999
BrowseMonthTaxoFacets 13.15 (2.4%) 13.15 (2.6%) 0.0% ( -4% - 5%) 0.995
BrowseDateTaxoFacets 11.10 (2.8%) 11.11 (2.6%) 0.0% ( -5% - 5%) 0.956
Prefix3 435.49 (5.6%) 435.75 (4.6%) 0.1% ( -9% - 10%) 0.971
HighTermTitleBDVSort 349.10 (15.4%) 349.53 (15.9%) 0.1% ( -26% - 37%) 0.980
IntNRQ 196.54 (1.2%) 196.83 (1.4%) 0.1% ( -2% - 2%) 0.726
HighIntervalsOrdered 39.29 (1.7%) 39.35 (1.7%) 0.2% ( -3% - 3%) 0.779
Respell 86.26 (2.8%) 86.41 (2.2%) 0.2% ( -4% - 5%) 0.828
OrNotHighMed 630.88 (5.5%) 632.63 (4.2%) 0.3% ( -9% - 10%) 0.859
Wildcard 219.64 (3.3%) 220.47 (3.0%) 0.4% ( -5% - 6%) 0.700
AndHighMed 446.48 (3.2%) 448.84 (2.5%) 0.5% ( -5% - 6%) 0.566
HighSloppyPhrase 48.95 (5.3%) 49.29 (4.3%) 0.7% ( -8% - 10%) 0.653
Fuzzy1 77.46 (5.8%) 78.08 (6.6%) 0.8% ( -10% - 13%) 0.683
OrHighNotHigh 646.82 (5.2%) 653.41 (5.8%) 1.0% ( -9% - 12%) 0.558
BrowseDayOfYearSSDVFacets 27.71 (6.9%) 28.04 (5.3%) 1.2% ( -10% - 14%) 0.543
AndHighHigh 153.06 (3.3%) 155.03 (4.2%) 1.3% ( -6% - 9%) 0.283
OrHighNotLow 783.18 (5.7%) 802.15 (5.5%) 2.4% ( -8% - 14%) 0.173
Fuzzy2 82.84 (18.2%) 86.90 (16.6%) 4.9% ( -25% - 48%) 0.373
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
OrHighMed 217.30 (3.7%) 124.89 (1.6%) -42.5% ( -46% - -38%) 0.000
OrHighHigh 66.63 (2.2%) 49.45 (1.5%) -25.8% ( -28% - -22%) 0.000
OrHighLow 225.66 (2.9%) 210.48 (4.3%) -6.7% ( -13% - 0%) 0.000
OrNotHighHigh 828.22 (5.6%) 803.12 (5.7%) -3.0% ( -13% - 8%) 0.090
OrHighNotHigh 788.67 (5.7%) 766.95 (5.1%) -2.8% ( -12% - 8%) 0.107
HighTermTitleBDVSort 233.10 (10.2%) 227.07 (8.9%) -2.6% ( -19% - 18%) 0.393
HighTermMonthSort 399.02 (12.3%) 389.61 (13.3%) -2.4% ( -24% - 26%) 0.560
MedTerm 1607.99 (7.4%) 1570.56 (3.6%) -2.3% ( -12% - 9%) 0.206
OrNotHighMed 702.25 (4.6%) 687.85 (5.8%) -2.1% ( -11% - 8%) 0.213
HighTermDayOfYearSort 258.38 (14.9%) 253.13 (13.3%) -2.0% ( -26% - 30%) 0.649
Prefix3 499.62 (5.8%) 490.49 (5.9%) -1.8% ( -12% - 10%) 0.323
LowTerm 1675.66 (4.5%) 1645.62 (4.4%) -1.8% ( -10% - 7%) 0.203
Wildcard 148.88 (2.8%) 146.54 (3.8%) -1.6% ( -7% - 5%) 0.138
OrHighNotLow 752.35 (5.4%) 741.10 (7.0%) -1.5% ( -13% - 11%) 0.446
MedSloppyPhrase 35.95 (3.7%) 35.59 (3.1%) -1.0% ( -7% - 6%) 0.354
MedPhrase 738.72 (5.1%) 731.71 (5.3%) -0.9% ( -10% - 9%) 0.564
LowSloppyPhrase 75.04 (3.4%) 74.41 (3.1%) -0.8% ( -7% - 5%) 0.414
TermDTSort 444.12 (14.6%) 440.54 (14.0%) -0.8% ( -25% - 32%) 0.858
OrNotHighLow 903.20 (3.9%) 898.10 (4.3%) -0.6% ( -8% - 7%) 0.662
PKLookup 217.06 (3.8%) 216.21 (3.6%) -0.4% ( -7% - 7%) 0.739
MedSpanNear 124.58 (3.3%) 124.14 (1.7%) -0.3% ( -5% - 4%) 0.676
HighPhrase 84.33 (2.9%) 84.05 (3.0%) -0.3% ( -6% - 5%) 0.718
LowPhrase 37.84 (3.7%) 37.72 (3.7%) -0.3% ( -7% - 7%) 0.782
Fuzzy2 64.65 (13.7%) 64.47 (11.6%) -0.3% ( -22% - 29%) 0.943
Fuzzy1 57.26 (8.4%) 57.13 (7.4%) -0.2% ( -14% - 16%) 0.930
AndHighHigh 126.99 (2.2%) 126.72 (3.0%) -0.2% ( -5% - 5%) 0.804
HighIntervalsOrdered 13.60 (2.8%) 13.59 (2.5%) -0.1% ( -5% - 5%) 0.916
Respell 102.67 (2.9%) 102.59 (2.4%) -0.1% ( -5% - 5%) 0.930
LowSpanNear 63.07 (3.0%) 63.16 (2.3%) 0.2% ( -4% - 5%) 0.857
HighSloppyPhrase 91.17 (3.5%) 91.31 (3.3%) 0.2% ( -6% - 7%) 0.887
AndHighMed 284.25 (2.0%) 285.44 (2.6%) 0.4% ( -4% - 5%) 0.568
HighTerm 1258.74 (4.9%) 1264.45 (3.7%) 0.5% ( -7% - 9%) 0.741
OrHighNotMed 699.99 (6.1%) 703.29 (7.3%) 0.5% ( -12% - 14%) 0.824
BrowseDayOfYearTaxoFacets 10.94 (3.8%) 11.00 (3.5%) 0.5% ( -6% - 8%) 0.653
BrowseMonthTaxoFacets 13.06 (2.4%) 13.13 (2.6%) 0.5% ( -4% - 5%) 0.505
BrowseDateTaxoFacets 10.92 (3.8%) 10.98 (3.6%) 0.6% ( -6% - 8%) 0.625
AndHighLow 1026.60 (4.0%) 1032.70 (4.5%) 0.6% ( -7% - 9%) 0.659
BrowseMonthSSDVFacets 31.41 (5.4%) 31.63 (2.4%) 0.7% ( -6% - 9%) 0.595
IntNRQ 293.66 (3.3%) 295.77 (2.0%) 0.7% ( -4% - 6%) 0.407
HighSpanNear 147.03 (3.1%) 148.42 (1.8%) 0.9% ( -3% - 6%) 0.243
BrowseDayOfYearSSDVFacets 27.66 (5.2%) 28.13 (1.7%) 1.7% ( -4% - 9%) 0.163
I played with your branch in order to try to speed things up a bit and I'm getting a bit closer to our current WAND implementation: https://github.com/jpountz/lucene/tree/LUCENE-9335-BMMScorer. Hopefully it's not too buggy. On the latest benchmark I ran it was 15% faster than WANScorer on your OrMedMedMedMedMed task.
Oh wow thanks Adrien for trying it out! It's quite a speed up from -20 ~ 40% to +15%!! I also run it with the full benchmark queries and see it's making + 20% - 90% speed up for OrHighHigh
, with the cost of -10 ~ 20% slow down for OrHighLow
and OrHighMed
, but it seems like a good net speedup given the latter twos already have much faster QPS compared to OrHighHigh
?
I took a quick look at your implementation and will study it more in the next few days, but right now I can see a few optimizations you applied and suggested earlier that can explain the speed up:
- The use of faster methods on the priority queue
updateTop
,topList
etc - The use of maxScore instead of
score()
to pick competitive doc from the simplified iterator - Pushing more scorers into
nonEssentialScorers
inrepartitionLists
The last two are optimization techniques not mentioned in the paper I think?
By the way, please feel free to push directly into this PR branch as well, as I think your changes are in pretty good shape?
The last two are optimization techniques not mentioned in the paper I think?
To be honest I didn't read the paper recently so it's possible I diverged a bit from it.
(2) feels natural to avoid doing useless score computations, though it might only work well when score upper bounds are very close to the actual scores. Maybe we should test on wikibig instead of wikimedium to get better confidence that this change makes things better.
Regarding (3) does it actually push more scorers into nonEssentialScorers
? I thought I just reorganized the existing logic a bit. If it pushes more scorers into nonEssentialScorers
it's probably a bug. :)
but it seems like a good net speedup given the latter twos already have much faster QPS compared to OrHighHigh?
Agreed, we already made similar choices in the past. It's probably still worth playing with a wider variety of queries, e.g. queries with many terms that have mixed frequencies, e.g. something like OrHighHighMedMedLowLow, or disjunctions within conjunctions/conjunctions within disjunctions.
Maybe we can also try to port similar changes to the bulk scorer to see if it yields even greater benefits?
I cherry-picked your commit and pushed to this branch / PR to further explore the changes and their effect, hope that's ok.
(2) feels natural to avoid doing useless score computations, though it might only work well when score upper bounds are very close to the actual scores. Maybe we should test on wikibig instead of wikimedium to get better confidence that this change makes things better.
Regarding (3) does it actually push more scorers into nonEssentialScorers? I thought I just reorganized the existing logic a bit. If it pushes more scorers into nonEssentialScorers it's probably a bug. :)
I did some explorations to see how these two affect QPS by removing them here https://github.com/apache/lucene/pull/101/commits/2835055979fa6a739972d2abf30888e855b7683c, and restore the optimizations in later commits (the luceneutil results from OrMedMedMedMedMed for these changes are added into git commit messages). From a few runs it seems that these two accounted for about 15% improvement (from -7% to +7%), although for (3) after some more thought I think the different implementations should actually be the same, since the scorers were already sorted by maxScore before they were partitioned. The changes to iterator to avoid unneeded score computation should probably account for the rest of the improvement.
I also tried to run wikibigall
as well, which seems to require enwiki-20100302-pages-articles-lines.txt
but it's not downloaded by the util. It appears the archive should be coming from http://home.apache.org/~mikemccand/enwiki-20100302-pages-articles-lines.txt.bz2, but it's giving 404 now. Is there a new place for this archive to live now that I can download from?
Maybe we can also try to port similar changes to the bulk scorer to see if it yields even greater benefits?
Yes I would love to do that! Will work on that next and try out more different queries.
I cherry-picked your commit and pushed to this branch / PR to further explore the changes and their effect, hope that's ok.
Of course!
I also tried to run wikibigall as well, which seems to require enwiki-20100302-pages-articles-lines.txt but it's not downloaded by the util. It appears the archive should be coming from http://home.apache.org/~mikemccand/enwiki-20100302-pages-articles-lines.txt.bz2, but it's giving 404 now.
Hmm good question, I had downloaded this file years ago, I'm not sure where to find it nowadays. @mikemccand Do you know where to find it? Otherwise I'll upload mine somewhere.
I also tried to run wikibigall as well, which seems to require enwiki-20100302-pages-articles-lines.txt but it's not downloaded by the util. It appears the archive should be coming from http://home.apache.org/~mikemccand/enwiki-20100302-pages-articles-lines.txt.bz2, but it's giving 404 now.
Hmm good question, I had downloaded this file years ago, I'm not sure where to find it nowadays. @mikemccand Do you know where to find it? Otherwise I'll upload mine somewhere.
Egads, it is indeed missing! I will re-upload it. Hmm, I do not seem to have that exact file locally cached. Confusing ;) I have the -1kb
version (medium sized docs), but not the big
docs. @jpountz could you please post somewhere, maybe your home.apache.org/~jpountz
, using sftp
? Then I'll download and copy it up to /~mikemccand
.
The nightly benchmarks uses the binary form of wikibigall
, to reduce thread bottleneck when reading/parsing documents to index. Hmm it is sampled from a different date (01/15/2011) ... OK I am uploading that one to https://home.apache.org/~mikemccand/enwiki-20110115-lines.bin (ETA ~20 minutes more).
BTW there is a luceneutil
issue to re-sample the Wikipedia export, but, alas, it is snagged up because the latest enwiki
download, after converting XML -> text, is SMALLER than the sample we pulled 11 years ago! Which I could not yet explain ...
@mikemccand I'll try to do it overnight as I have a terrible uplink. FWIW the file I have locally is enwiki-20130102-lines.txt
, not the enwiki-20100302-pages-articles-lines.txt
file that luceneutil refers to.
FWIW the file I have locally is enwiki-20130102-lines.txt, not the enwiki-20100302-pages-articles-lines.txt file that luceneutil refers to.
Aha! I have that one locally, compressed with .lzma
-- I'll upload it.
OK it's copying now to https://home.apache.org/~mikemccand/enwiki-20130102-lines.txt.lzma! ETA ~15 minutes.
Wonderful, thanks @mikemccand !
Thanks @mikemccand and @jpountz for the uploads!
The nightly benchmarks uses the binary form of wikibigall, to reduce thread bottleneck when reading/parsing documents to index. Hmm it is sampled from a different date (01/15/2011) ... OK I am uploading that one to https://home.apache.org/~mikemccand/enwiki-20110115-lines.bin (ETA ~20 minutes more).
I'm assuming we should now run nightlyBench.py
instead of wikibigall
, since the latter depends on enwiki-20100302-pages-articles-lines.txt
but it's no longer available?
Thanks @mikemccand and @jpountz for the uploads!
The nightly benchmarks uses the binary form of wikibigall, to reduce thread bottleneck when reading/parsing documents to index. Hmm it is sampled from a different date (01/15/2011) ... OK I am uploading that one to https://home.apache.org/~mikemccand/enwiki-20110115-lines.bin (ETA ~20 minutes more).
I'm assuming we should now run
nightlyBench.py
instead ofwikibigall
, since the latter depends onenwiki-20100302-pages-articles-lines.txt
but it's no longer available?
Actually, I'm still not quite sure how to run it with enwiki-20110115-lines.bin
, since it's not referred to in luceneutil code I think? I'll dig around in nightly benchmark job tomorrow to find out.
Actually you don't need nightlyBench.py, you can use the standard python script. I think the following should work to test out on larger documents:
- Download https://home.apache.org/~mikemccand/enwiki-20130102-lines.txt.lzma and put it under the same directory as the other data files like
enwiki-20120502-lines-1k.txt
. - Decompress it with
unlzma enwiki-20130102-lines.txt.lzma
. - Add the following to your
localconstants.py
:
WIKI_BIG_DOCS_LINE_FILE = '%s/data/enwiki-20130102-lines.txt' % BASE_DIR
WIKI_BIG_DOCS_COUNT = 6647577
WIKI_BIG_TASKS_FILE = '%s/tasks/wikinightly.tasks' % BENCH_BASE_DIR
- Run the benchmark with
-source wikibig10k
to make sure everything works. - Finally run the benchmark with
-source wikibigall
.
Thanks Adrien for the suggestion! Let me give it a try.
I'm able to run wikibigall with the above configuration (and a few small changes to localrun.py
to take in vector dictionary and file), and get the following results:
wikibigall run 1
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
AndMedOrHighHigh 25.47 (2.5%) 21.21 (4.1%) -16.7% ( -22% - -10%) 0.000
OrHighMed 72.05 (2.5%) 60.17 (4.1%) -16.5% ( -22% - -10%) 0.000
Fuzzy2 31.51 (8.4%) 29.27 (9.5%) -7.1% ( -23% - 11%) 0.012
Fuzzy1 59.00 (8.4%) 56.38 (9.8%) -4.4% ( -20% - 14%) 0.122
TermDayOfYearSort 48.68 (14.6%) 46.86 (8.9%) -3.7% ( -23% - 23%) 0.328
TermMonthSort 61.44 (9.3%) 60.23 (9.8%) -2.0% ( -19% - 18%) 0.514
TermTitleSort 110.13 (8.9%) 108.54 (9.2%) -1.4% ( -18% - 18%) 0.616
TermDTSort 56.74 (9.5%) 56.04 (10.2%) -1.2% ( -19% - 20%) 0.692
TermGroup10K 15.95 (2.6%) 15.75 (3.2%) -1.2% ( -6% - 4%) 0.183
TermGroup100 19.34 (3.5%) 19.11 (3.4%) -1.2% ( -7% - 5%) 0.276
TermBGroup1M 17.92 (2.7%) 17.72 (3.8%) -1.1% ( -7% - 5%) 0.280
TermGroup1M 19.06 (2.5%) 18.86 (3.1%) -1.1% ( -6% - 4%) 0.234
TermBGroup1M1P 43.93 (3.8%) 43.66 (4.4%) -0.6% ( -8% - 7%) 0.633
SpanNear 24.55 (2.5%) 24.46 (2.4%) -0.4% ( -5% - 4%) 0.620
SloppyPhrase 2.99 (8.2%) 2.98 (8.0%) -0.3% ( -15% - 17%) 0.897
IntNRQ 139.92 (1.2%) 139.61 (1.7%) -0.2% ( -3% - 2%) 0.627
IntervalsOrdered 2.02 (3.7%) 2.02 (3.6%) -0.1% ( -7% - 7%) 0.957
BrowseMonthSSDVFacets 19.28 (1.0%) 19.27 (0.9%) -0.0% ( -1% - 1%) 0.888
BrowseDayOfYearSSDVFacets 17.61 (1.9%) 17.61 (1.8%) 0.0% ( -3% - 3%) 0.989
Respell 48.40 (2.6%) 48.59 (3.1%) 0.4% ( -5% - 6%) 0.660
PKLookup 211.17 (3.4%) 212.28 (3.6%) 0.5% ( -6% - 7%) 0.631
BrowseDayOfYearTaxoFacets 7.30 (6.4%) 7.34 (5.8%) 0.6% ( -10% - 13%) 0.766
Prefix3 120.15 (6.1%) 120.95 (4.6%) 0.7% ( -9% - 12%) 0.694
Phrase 54.76 (3.6%) 55.13 (4.3%) 0.7% ( -6% - 8%) 0.586
BrowseDateTaxoFacets 7.62 (6.7%) 7.67 (6.1%) 0.7% ( -11% - 14%) 0.718
Wildcard 72.89 (2.8%) 73.46 (2.2%) 0.8% ( -4% - 5%) 0.331
BrowseMonthTaxoFacets 8.42 (7.0%) 8.49 (6.4%) 0.8% ( -11% - 15%) 0.694
TermDateFacets 8.56 (7.4%) 8.64 (7.3%) 0.9% ( -12% - 16%) 0.702
VectorSearch 954.06 (3.7%) 967.28 (2.5%) 1.4% ( -4% - 7%) 0.167
AndHighHigh 32.04 (3.1%) 32.65 (4.1%) 1.9% ( -5% - 9%) 0.100
AndHighMed 83.72 (1.8%) 85.38 (2.9%) 2.0% ( -2% - 6%) 0.009
AndHighOrMedMed 44.41 (2.2%) 45.34 (4.2%) 2.1% ( -4% - 8%) 0.049
Term 1132.83 (5.2%) 1157.74 (5.0%) 2.2% ( -7% - 13%) 0.173
OrHighHigh 11.35 (2.9%) 16.95 (4.1%) 49.4% ( 41% - 58%) 0.000
wikibigall run 2
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
TermDayOfYearSort 59.31 (11.3%) 56.25 (9.5%) -5.2% ( -23% - 17%) 0.116
TermTitleSort 175.57 (10.3%) 169.31 (9.3%) -3.6% ( -21% - 17%) 0.252
TermMonthSort 61.73 (10.5%) 59.73 (9.2%) -3.2% ( -20% - 18%) 0.300
TermBGroup1M 11.87 (3.2%) 11.57 (2.6%) -2.6% ( -8% - 3%) 0.005
TermGroup10K 15.60 (2.8%) 15.26 (3.3%) -2.2% ( -8% - 4%) 0.023
TermGroup100 23.39 (2.6%) 22.94 (3.0%) -1.9% ( -7% - 3%) 0.030
TermGroup1M 12.16 (2.6%) 11.94 (2.9%) -1.8% ( -7% - 3%) 0.036
TermDTSort 78.08 (6.4%) 76.75 (5.4%) -1.7% ( -12% - 10%) 0.361
Fuzzy1 28.49 (5.6%) 28.08 (5.8%) -1.4% ( -12% - 10%) 0.419
TermBGroup1M1P 42.98 (2.9%) 42.49 (3.3%) -1.1% ( -7% - 5%) 0.247
IntNRQ 136.00 (3.4%) 134.92 (3.5%) -0.8% ( -7% - 6%) 0.473
AndHighHigh 22.60 (3.9%) 22.42 (2.4%) -0.8% ( -6% - 5%) 0.455
TermDateFacets 8.38 (7.0%) 8.32 (6.6%) -0.7% ( -13% - 13%) 0.738
Prefix3 33.76 (3.1%) 33.55 (2.8%) -0.6% ( -6% - 5%) 0.502
Wildcard 88.25 (3.0%) 87.72 (2.3%) -0.6% ( -5% - 4%) 0.479
BrowseDayOfYearSSDVFacets 17.39 (1.4%) 17.31 (1.5%) -0.4% ( -3% - 2%) 0.363
Term 1144.71 (8.2%) 1139.98 (8.5%) -0.4% ( -15% - 17%) 0.876
VectorSearch 807.48 (4.7%) 804.27 (4.9%) -0.4% ( -9% - 9%) 0.794
IntervalsOrdered 0.82 (5.3%) 0.82 (5.4%) -0.3% ( -10% - 11%) 0.872
PKLookup 207.38 (4.8%) 206.99 (5.5%) -0.2% ( -10% - 10%) 0.910
Respell 48.27 (3.6%) 48.18 (3.6%) -0.2% ( -7% - 7%) 0.878
BrowseDateTaxoFacets 7.58 (6.8%) 7.57 (6.5%) -0.2% ( -12% - 14%) 0.933
BrowseDayOfYearTaxoFacets 7.26 (6.3%) 7.24 (6.2%) -0.2% ( -11% - 13%) 0.932
SloppyPhrase 5.55 (5.7%) 5.54 (5.8%) -0.1% ( -10% - 12%) 0.946
BrowseMonthTaxoFacets 8.38 (6.7%) 8.38 (6.6%) -0.1% ( -12% - 14%) 0.969
SpanNear 2.12 (3.5%) 2.12 (3.9%) 0.0% ( -7% - 7%) 0.993
BrowseMonthSSDVFacets 19.52 (5.7%) 19.53 (5.7%) 0.0% ( -10% - 12%) 0.985
Phrase 59.48 (3.8%) 59.72 (3.9%) 0.4% ( -6% - 8%) 0.735
AndHighMed 60.30 (4.6%) 60.57 (2.3%) 0.4% ( -6% - 7%) 0.705
Fuzzy2 53.28 (11.3%) 53.93 (10.6%) 1.2% ( -18% - 26%) 0.727
AndMedOrHighHigh 27.58 (3.1%) 27.93 (3.4%) 1.3% ( -5% - 8%) 0.209
AndHighOrMedMed 24.38 (3.1%) 25.06 (3.3%) 2.8% ( -3% - 9%) 0.006
OrHighMed 45.08 (4.3%) 57.44 (6.9%) 27.4% ( 15% - 40%) 0.000
OrHighHigh 11.22 (4.8%) 16.16 (9.8%) 44.0% ( 28% - 61%) 0.000
wikibigall run 3
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
Fuzzy2 51.26 (10.6%) 47.91 (11.4%) -6.5% ( -25% - 17%) 0.060
TermDTSort 83.68 (11.6%) 79.69 (7.7%) -4.8% ( -21% - 16%) 0.126
Fuzzy1 60.65 (6.6%) 57.96 (8.5%) -4.4% ( -18% - 11%) 0.065
TermMonthSort 61.55 (15.5%) 59.70 (8.4%) -3.0% ( -23% - 24%) 0.447
TermTitleSort 88.13 (15.1%) 85.68 (8.4%) -2.8% ( -22% - 24%) 0.471
TermDayOfYearSort 47.32 (8.7%) 46.58 (10.0%) -1.5% ( -18% - 18%) 0.602
Wildcard 37.71 (9.9%) 37.32 (9.1%) -1.0% ( -18% - 19%) 0.729
Prefix3 164.10 (12.8%) 162.91 (11.5%) -0.7% ( -22% - 27%) 0.850
TermGroup100 23.62 (3.0%) 23.50 (2.9%) -0.5% ( -6% - 5%) 0.617
TermBGroup1M 11.90 (2.7%) 11.86 (3.1%) -0.3% ( -5% - 5%) 0.714
Term 1044.58 (8.0%) 1041.84 (6.5%) -0.3% ( -13% - 15%) 0.910
TermGroup10K 12.18 (2.9%) 12.15 (2.7%) -0.2% ( -5% - 5%) 0.808
IntervalsOrdered 3.91 (2.7%) 3.91 (2.7%) -0.1% ( -5% - 5%) 0.934
BrowseDayOfYearSSDVFacets 17.47 (2.3%) 17.47 (2.0%) -0.0% ( -4% - 4%) 0.994
VectorSearch 851.38 (5.2%) 851.53 (5.6%) 0.0% ( -10% - 11%) 0.992
SpanNear 24.59 (2.2%) 24.61 (2.3%) 0.1% ( -4% - 4%) 0.923
TermGroup1M 15.44 (2.7%) 15.45 (2.9%) 0.1% ( -5% - 5%) 0.936
BrowseMonthSSDVFacets 19.19 (1.4%) 19.22 (1.5%) 0.2% ( -2% - 3%) 0.746
TermBGroup1M1P 18.19 (4.7%) 18.24 (4.6%) 0.3% ( -8% - 9%) 0.858
IntNRQ 136.29 (3.4%) 136.75 (3.5%) 0.3% ( -6% - 7%) 0.755
Respell 40.87 (5.4%) 41.04 (4.8%) 0.4% ( -9% - 11%) 0.793
Phrase 24.71 (3.1%) 24.87 (2.7%) 0.7% ( -4% - 6%) 0.472
PKLookup 205.99 (5.9%) 208.11 (6.4%) 1.0% ( -10% - 14%) 0.596
BrowseDayOfYearTaxoFacets 7.04 (7.8%) 7.11 (7.5%) 1.0% ( -13% - 17%) 0.665
BrowseDateTaxoFacets 7.36 (7.9%) 7.45 (7.7%) 1.1% ( -13% - 18%) 0.646
SloppyPhrase 1.17 (10.0%) 1.19 (9.8%) 1.2% ( -16% - 23%) 0.705
AndHighMed 45.25 (3.5%) 45.82 (3.8%) 1.3% ( -5% - 8%) 0.274
BrowseMonthTaxoFacets 8.13 (7.9%) 8.25 (7.9%) 1.4% ( -13% - 18%) 0.569
AndHighHigh 23.27 (3.6%) 23.60 (4.0%) 1.4% ( -5% - 9%) 0.227
TermDateFacets 11.08 (9.9%) 11.27 (10.0%) 1.7% ( -16% - 23%) 0.582
AndHighOrMedMed 24.91 (3.8%) 25.72 (4.0%) 3.3% ( -4% - 11%) 0.008
AndMedOrHighHigh 10.00 (4.2%) 10.61 (4.2%) 6.1% ( -2% - 15%) 0.000
OrHighMed 59.17 (5.5%) 67.57 (4.5%) 14.2% ( 3% - 25%) 0.000
OrHighHigh 17.06 (4.9%) 24.26 (5.2%) 42.2% ( 30% - 55%) 0.000
I'll run it for the other two implementations as well and post the results in the other PR.
Hi @jpountz, upon further reviewing the result above, I found one issue for this particular scorer implementation (but not for the other BulkScorer based implementations). Currently this BMM scorer would also be used for nested boolean queries such as +body:interview +(body:at body:united)
(AndMedOrHighHigh), but in the jira ticket you had suggested to use BMM for top-level (flat?) boolean query only. Do you think this will need to be fixed? I took a quick look at the code and it seems the query and weight interfaces do not provide an API to tell whether they are top-level or not currently, so this may involve some changes in BooleanWeight
code I think.
On the other hand, from the results above it looks like 2 out of 3 instances, having BMM scorer to run for nested queries doesn’t hurt its performance (the AndMedOrHighHigh
one). So I’m wondering maybe it’s ok to leave it as is, as BMM may still give a boost for the pure disjunction part of the query? The one result that does show negative impact to AndMedOrHighHigh
also shows impact to OrHighMed
, so it’s a bit strange and may need further looking into to see the cause.
in the jira ticket you had suggested to use BMM for top-level (flat?) boolean query only. Do you think this will need to be fixed?
I opened this JIRA ticket because it felt like we could do better for top-level disjunctions, but if BMM appears to work better most of the time, we could just move to it all the time.
The one result that does show negative impact to AndMedOrHighHigh also shows impact to OrHighMed, so it’s a bit strange and may need further looking into to see the cause.
Yeah, I suspect there will always be cases when BMW will perform better than BMM or vice-versa, sometimes for subtle reasons.
in the jira ticket you had suggested to use BMM for top-level (flat?) boolean query only. Do you think this will need to be fixed?
I opened this JIRA ticket because it felt like we could do better for top-level disjunctions, but if BMM appears to work better most of the time, we could just move to it all the time.
The one result that does show negative impact to AndMedOrHighHigh also shows impact to OrHighMed, so it’s a bit strange and may need further looking into to see the cause.
Yeah, I suspect there will always be cases when BMW will perform better than BMM or vice-versa, sometimes for subtle reasons.
Makes sense! I'll not attempt to fix it for now then.
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!