LUCENE-10061: Implements dynamic pruning support for CombinedFieldsQuery
Description
This PR is a work in progress implementing basic dynamic pruning support for CombinedFieldsQuery.
#11099
Tests
Added a randomized test verifying that the top 100 results match between top-score collection (with pruning) and complete scoring.
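The invariant that test enforces can be sketched outside Lucene (plain Java with hypothetical names, not the actual Lucene test harness): pruning is only allowed to skip documents that cannot beat the current minimum competitive score, so the pruned top-k must equal the exhaustive top-k.

```java
import java.util.*;

public class TopKEquivalence {

  // Exhaustive: score every doc, sort by score desc (ties: lower doc id first), take top k.
  static List<Integer> exhaustiveTopK(float[] scores, int k) {
    Integer[] docs = new Integer[scores.length];
    for (int i = 0; i < docs.length; i++) docs[i] = i;
    Arrays.sort(docs, (a, b) -> {
      int c = Float.compare(scores[b], scores[a]);
      return c != 0 ? c : Integer.compare(a, b);
    });
    return new ArrayList<>(Arrays.asList(docs).subList(0, Math.min(k, docs.length)));
  }

  // "Pruned": keep a size-k heap and skip any doc that cannot beat the current
  // minimum competitive score, which is all dynamic pruning is allowed to do.
  static List<Integer> prunedTopK(float[] scores, int k) {
    PriorityQueue<Integer> heap = new PriorityQueue<>((a, b) -> {
      int c = Float.compare(scores[a], scores[b]); // least competitive entry on top
      return c != 0 ? c : Integer.compare(b, a);
    });
    float minCompetitive = Float.NEGATIVE_INFINITY;
    for (int doc = 0; doc < scores.length; doc++) {
      if (scores[doc] <= minCompetitive) continue; // "pruned" away
      heap.add(doc);
      if (heap.size() > k) {
        heap.poll();
        minCompetitive = scores[heap.peek()];
      }
    }
    List<Integer> result = new ArrayList<>(heap);
    result.sort((a, b) -> {
      int c = Float.compare(scores[b], scores[a]);
      return c != 0 ? c : Integer.compare(a, b);
    });
    return result;
  }

  public static void main(String[] args) {
    Random random = new Random(42);
    for (int iter = 0; iter < 100; iter++) {
      float[] scores = new float[200];
      for (int i = 0; i < scores.length; i++) scores[i] = random.nextFloat();
      if (!exhaustiveTopK(scores, 100).equals(prunedTopK(scores, 100))) {
        throw new AssertionError("top-k mismatch at iter " + iter);
      }
    }
    System.out.println("top-k results match");
  }
}
```

The randomized test in the PR does the same comparison, but against real indexed documents and the actual scorers.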
Perf test results for commit 2ba435e5c83f870be9566
Run 1:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighHigh 3.53 (3.3%) 2.92 (5.3%) -17.2% ( -25% - -8%) 0.000
PKLookup 108.13 (7.7%) 119.85 (8.2%) 10.8% ( -4% - 28%) 0.000
CFQHighLow 14.88 (3.9%) 16.69 (12.5%) 12.2% ( -3% - 29%) 0.000
CFQHighMed 21.11 (4.1%) 25.87 (11.8%) 22.6% ( 6% - 40%) 0.000
Run 2:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighHigh 6.64 (3.1%) 5.63 (10.2%) -15.2% ( -27% - -1%) 0.000
CFQHighLow 8.35 (2.8%) 8.05 (15.0%) -3.6% ( -20% - 14%) 0.297
CFQHighMed 24.51 (5.3%) 24.90 (19.9%) 1.6% ( -22% - 28%) 0.733
PKLookup 110.06 (10.0%) 128.54 (7.9%) 16.8% ( -1% - 38%) 0.000
Run 3:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighMed 13.01 (2.9%) 9.82 (7.8%) -24.5% ( -34% - -14%) 0.000
PKLookup 107.85 (8.1%) 111.79 (11.2%) 3.7% ( -14% - 24%) 0.236
CFQHighHigh 4.83 (2.6%) 5.06 (8.6%) 4.7% ( -6% - 16%) 0.018
CFQHighLow 14.95 (3.0%) 18.31 (19.0%) 22.5% ( 0% - 45%) 0.000
Run 4:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighMed 11.11 (2.9%) 6.69 (4.1%) -39.7% ( -45% - -33%) 0.000
CFQHighLow 27.55 (3.8%) 25.46 (11.0%) -7.6% ( -21% - 7%) 0.003
CFQHighHigh 5.25 (3.2%) 4.96 (6.1%) -5.7% ( -14% - 3%) 0.000
PKLookup 107.61 (6.7%) 121.19 (4.6%) 12.6% ( 1% - 25%) 0.000
Perf test results from commit 1a71469bf7afa3
Run 1
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighHighHigh 4.64 (6.5%) 3.30 (4.7%) -29.0% ( -37% - -19%) 0.000
CFQHighHigh 11.09 (6.0%) 9.61 (6.0%) -13.3% ( -23% - -1%) 0.000
PKLookup 103.38 (4.4%) 108.04 (4.3%) 4.5% ( -4% - 13%) 0.001
CFQHighMedLow 10.58 (6.1%) 12.30 (8.7%) 16.2% ( 1% - 33%) 0.000
CFQHighMed 10.70 (7.4%) 15.51 (11.2%) 44.9% ( 24% - 68%) 0.000
CFQHighLowLow 8.18 (8.2%) 12.87 (11.6%) 57.3% ( 34% - 84%) 0.000
CFQHighLow 14.57 (7.5%) 30.81 (15.1%) 111.4% ( 82% - 144%) 0.000
Run 2
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighHighHigh 5.33 (5.7%) 4.02 (7.7%) -24.4% ( -35% - -11%) 0.000
CFQHighLowLow 17.14 (6.2%) 13.06 (5.4%) -23.8% ( -33% - -13%) 0.000
CFQHighMed 17.37 (5.8%) 14.38 (7.7%) -17.2% ( -29% - -3%) 0.000
PKLookup 103.57 (5.5%) 108.84 (5.9%) 5.1% ( -6% - 17%) 0.005
CFQHighMedLow 11.25 (7.2%) 12.70 (9.0%) 12.9% ( -3% - 31%) 0.000
CFQHighHigh 5.00 (6.2%) 7.54 (12.1%) 51.0% ( 30% - 73%) 0.000
CFQHighLow 21.60 (5.2%) 34.57 (14.1%) 60.0% ( 38% - 83%) 0.000
Run 3
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighHighHigh 5.40 (6.9%) 4.06 (5.1%) -24.8% ( -34% - -13%) 0.000
CFQHighMedLow 7.64 (7.4%) 5.79 (6.3%) -24.2% ( -35% - -11%) 0.000
CFQHighHigh 11.11 (7.0%) 9.60 (5.9%) -13.6% ( -24% - 0%) 0.000
CFQHighLowLow 21.21 (7.6%) 21.22 (6.6%) 0.0% ( -13% - 15%) 0.993
PKLookup 103.15 (5.9%) 107.60 (6.9%) 4.3% ( -8% - 18%) 0.034
CFQHighLow 21.85 (8.1%) 34.18 (13.5%) 56.4% ( 32% - 84%) 0.000
CFQHighMed 12.07 (8.4%) 19.98 (16.7%) 65.5% ( 37% - 98%) 0.000
Run 4
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighHigh 8.50 (5.8%) 6.85 (5.2%) -19.5% ( -28% - -8%) 0.000
CFQHighMedLow 10.89 (5.7%) 8.96 (5.4%) -17.8% ( -27% - -7%) 0.000
CFQHighMed 8.41 (5.8%) 7.74 (5.6%) -7.9% ( -18% - 3%) 0.000
CFQHighHighHigh 3.45 (6.7%) 3.38 (5.3%) -2.0% ( -13% - 10%) 0.287
CFQHighLowLow 7.82 (6.4%) 8.20 (7.5%) 4.8% ( -8% - 20%) 0.030
PKLookup 103.50 (5.0%) 110.69 (5.4%) 6.9% ( -3% - 18%) 0.000
CFQHighLow 11.46 (6.0%) 13.16 (6.7%) 14.8% ( 1% - 29%) 0.000
I re-ran the perf tests after https://github.com/mikemccand/luceneutil/commit/0550148b67f82d446e07bd0b4fdbde24f1d6228d was merged.
Results from python3 src/python/localrun.py -source combinedFieldsBig:
Run 1:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighHigh 3.69 (1.8%) 2.49 (6.2%) -32.5% ( -39% - -24%) 0.000
CFQHighMed 4.95 (2.1%) 4.19 (5.9%) -15.5% ( -22% - -7%) 0.000
PKLookup 125.72 (4.5%) 126.86 (10.3%) 0.9% ( -13% - 16%) 0.719
CFQHighLow 19.92 (2.2%) 20.80 (9.5%) 4.4% ( -7% - 16%) 0.043
Run 2:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighHigh 3.61 (2.8%) 2.48 (2.9%) -31.4% ( -36% - -26%) 0.000
PKLookup 116.67 (7.1%) 123.97 (5.5%) 6.3% ( -5% - 20%) 0.002
CFQHighMed 4.97 (3.6%) 5.29 (5.5%) 6.6% ( -2% - 16%) 0.000
CFQHighLow 11.96 (4.5%) 13.99 (6.5%) 17.0% ( 5% - 29%) 0.000
Run 3:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighHigh 3.51 (4.2%) 2.44 (6.5%) -30.5% ( -39% - -20%) 0.000
PKLookup 105.72 (11.9%) 108.81 (11.2%) 2.9% ( -18% - 29%) 0.424
CFQHighMed 10.85 (4.2%) 11.60 (11.4%) 6.9% ( -8% - 23%) 0.011
CFQHighLow 15.11 (5.6%) 16.16 (9.8%) 7.0% ( -7% - 23%) 0.006
Results from python3 src/python/localrun.py -source combinedFieldsUnevenlyWeightedBig:
Run 1:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
PKLookup 93.42 (13.7%) 88.23 (11.7%) -5.6% ( -27% - 23%) 0.168
CFQHighMed 4.69 (10.7%) 5.00 (18.0%) 6.6% ( -20% - 39%) 0.160
CFQHighHigh 4.51 (10.6%) 5.17 (17.7%) 14.6% ( -12% - 48%) 0.002
CFQHighLow 14.13 (8.5%) 23.11 (32.3%) 63.5% ( 20% - 114%) 0.000
Run 2:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighMed 4.77 (4.5%) 4.10 (8.3%) -14.2% ( -25% - -1%) 0.000
PKLookup 98.99 (12.3%) 101.47 (12.5%) 2.5% ( -19% - 31%) 0.522
CFQHighHigh 4.88 (5.3%) 5.98 (11.5%) 22.6% ( 5% - 41%) 0.000
CFQHighLow 11.57 (5.6%) 18.86 (18.8%) 62.9% ( 36% - 92%) 0.000
Run 3:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighHigh 3.55 (5.1%) 2.38 (9.0%) -32.9% ( -44% - -19%) 0.000
PKLookup 101.29 (7.0%) 94.22 (15.4%) -7.0% ( -27% - 16%) 0.065
CFQHighLow 15.43 (5.8%) 16.60 (11.2%) 7.6% ( -8% - 26%) 0.007
CFQHighMed 3.12 (5.1%) 3.83 (15.0%) 22.7% ( 2% - 45%) 0.000
For one of the most negatively impacted queries (-42.0%), CFQHighHigh: at united +combinedFields=titleTokenized^4.0,body^2.0 # freq=2834104 freq=1185528, the JFR CPU profiling result looked like the following:
PERCENT CPU SAMPLES STACK
15.82% 13099 org.apache.lucene.sandbox.search.CombinedFieldQuery$1$1#doMergeImpactsPerField()
11.46% 9487 org.apache.lucene.sandbox.search.MultiNormsLeafSimScorer$MultiFieldNormValues#advanceExact()
4.69% 3883 org.apache.lucene.search.DisiPriorityQueue#downHeap()
3.66% 3027 org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer#score()
1.93% 1598 org.apache.lucene.search.DisjunctionDISIApproximation#advance()
1.92% 1590 org.apache.lucene.sandbox.search.CombinedFieldQuery$CombinedFieldScorer#freq()
1.92% 1588 org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1#collect()
1.77% 1467 org.apache.lucene.sandbox.search.CombinedFieldQuery$WeightedDisiWrapper#freq()
1.68% 1392 org.apache.lucene.search.DisiPriorityQueue#top()
1.60% 1326 org.apache.lucene.search.DisiPriorityQueue#topList()
1.54% 1276 org.apache.lucene.util.PriorityQueue#downHeap()
1.50% 1243 java.lang.Math#round()
1.45% 1201 org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$3#longValue()
1.38% 1145 java.util.HashMap#resize()
1.35% 1115 org.apache.lucene.store.ByteBufferGuard#ensureValid()
1.33% 1100 org.apache.lucene.sandbox.search.CombinedFieldQuery$1$1#mergeImpactsPerField()
1.21% 1001 java.util.HashMap$HashIterator#nextNode()
1.17% 972 java.util.LinkedList#linkLast()
1.13% 934 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader#findFirstGreater()
1.07% 883 org.apache.lucene.sandbox.search.CombinedFieldQuery$CombinedFieldScorer#score()
1.06% 878 org.apache.lucene.store.ByteBufferGuard#getByte()
1.02% 841 org.apache.lucene.sandbox.search.MultiNormsLeafSimScorer#getNormValue()
suggesting that a fair amount of CPU time is spent merging impacts within the same field. I suspect this may happen because the max score is computed too frequently: a frequent term's skip list is "dense", and it is also used to determine upTo for the max score computation:
https://github.com/apache/lucene/blob/2a9adb81df314ffeb92951bbf2d99fecc94fa581/lucene/core/src/java/org/apache/lucene/search/ImpactsDISI.java#L78-L82
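As a rough, self-contained model of that ImpactsDISI behavior (hypothetical Block class, not the Lucene API): each impact block exposes a doc bound (upTo) and a score upper bound, and a target falling in a block whose max score is below the minimum competitive score gets skipped past the block boundary. With dense skip data, the blocks are small, so maxScore (and hence the impact merging that produces it) is recomputed at nearly every advance.

```java
import java.util.*;

public class BlockMaxSkipper {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  // One impact block: docs up to and including upTo, with a score upper bound.
  static class Block {
    final int upTo; final float maxScore;
    Block(int upTo, float maxScore) { this.upTo = upTo; this.maxScore = maxScore; }
  }

  // Return the first candidate >= target not ruled out by block-level max scores;
  // a block whose maxScore is below the minimum competitive score is skipped whole.
  static int advanceTarget(int target, List<Block> blocks, float minCompetitiveScore) {
    while (true) {
      Block current = null;
      for (Block b : blocks) {
        if (target <= b.upTo) { current = b; break; } // blocks sorted by upTo
      }
      if (current == null) return NO_MORE_DOCS;       // ran out of impact data
      if (current.maxScore >= minCompetitiveScore) return target;
      target = current.upTo + 1;                      // jump past the non-competitive block
    }
  }

  public static void main(String[] args) {
    List<Block> blocks = Arrays.asList(
        new Block(99, 1.0f), new Block(199, 3.0f), new Block(299, 0.5f));
    System.out.println(advanceTarget(10, blocks, 2.0f));  // first block not competitive
    System.out.println(advanceTarget(250, blocks, 2.0f)); // nothing competitive remains
  }
}
```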
Hi @jpountz @jimczi, while doing a deeper dive to see how this PR can be improved further, I noticed a difference in the field-weight parameter passed to createMultiNormsLeafSimScorer in CombinedFieldQuery:
CombinedFieldQuery#scorer:
https://github.com/apache/lucene/blob/3b914a4d73eea8923f823cbdb869de39213411dd/lucene/sandbox/src/java/org/apache/lucene/sandbox/search/CombinedFieldQuery.java#L420-L421
CombinedFieldQuery#explain:
https://github.com/apache/lucene/blob/3b914a4d73eea8923f823cbdb869de39213411dd/lucene/sandbox/src/java/org/apache/lucene/sandbox/search/CombinedFieldQuery.java#L387-L389
For the CombinedFieldQuery#scorer one, fields may contain duplicate entries:
https://github.com/apache/lucene/blob/3b914a4d73eea8923f823cbdb869de39213411dd/lucene/sandbox/src/java/org/apache/lucene/sandbox/search/CombinedFieldQuery.java#L404-L414
whereas fieldAndWeights.values() should not contain duplicates.
I feel CombinedFieldQuery#scorer should be updated to use fieldAndWeights.values() as well. What do you think?
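To illustrate the duplication (a hypothetical mini-model, not the actual Lucene code): collecting one entry per (term, field) match repeats a field once per query term that hits it, while a field-keyed map like fieldAndWeights naturally holds each field once.

```java
import java.util.*;

public class FieldDedup {

  // The scorer path collects one entry per (term, field) pair, so a field
  // queried with several terms shows up several times in the list.
  static List<String> perTermFieldList(List<String> terms, Map<String, Float> fieldAndWeights) {
    List<String> fields = new ArrayList<>();
    for (String term : terms) {
      for (String field : fieldAndWeights.keySet()) {
        fields.add(field); // same field added once per term
      }
    }
    return fields;
  }

  public static void main(String[] args) {
    Map<String, Float> fieldAndWeights = new LinkedHashMap<>();
    fieldAndWeights.put("titleTokenized", 4.0f);
    fieldAndWeights.put("body", 2.0f);
    List<String> terms = Arrays.asList("three", "geronimo");

    // Duplicated: [titleTokenized, body, titleTokenized, body]
    System.out.println(perTermFieldList(terms, fieldAndWeights));
    // Deduplicated, the way a field-keyed map exposes it: [titleTokenized, body]
    System.out.println(fieldAndWeights.keySet());
  }
}
```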
For now I have created a spin-off issue https://issues.apache.org/jira/browse/LUCENE-10236 and a separate PR https://github.com/apache/lucene/pull/444 for this. Please let me know if they look good to you.
Thanks for making progress on this and sorry for the slow review cycles.
No worries, and thanks for the review and feedback!
I just had a quick look, and it seems that some computations can be moved up-front, such as deciding which field we use as a lead, since field weights do not change during query execution?
Yup, I can move it out to avoid repeated computation there. Just a quick check though: from the perf results so far (https://github.com/apache/lucene/pull/418#discussion_r754091056 & https://github.com/apache/lucene/pull/418#discussion_r759006970), it seems using the most-weighted field to decide on the leading impacts may not yield the best result, as matches in the most-weighted field may also give a high docId upTo bound. If we were to drop that approach and revert the commit https://github.com/apache/lucene/pull/418/commits/db2446f0337e5860bb8c731df1f6c16281efb6bb that introduced it, then the most-weighted field wouldn't be computed at all. Are you trying to see if the perf tests can be further improved if the most-weighted field only gets computed once?
I wonder if there are things we can do to make the code simpler, such as sharing the logic that combines multiple terms within the same field with SynonymQuery.
Yeah, I had the same thought. I wasn't sure earlier, as SynonymQuery and CombinedFieldsQuery sit in different modules / packages, so extracting the common code out may require a new public specialized API / class (in the core module?). Let me give that a try and see how it looks!
from the perf results so far, it seems using the most weighted field to decide on the leading impacts may not yield the best result Are you trying to see if perf tests can be further improved if most weighted field only gets computed once?
Yes. I liked this approach because it felt like it should work relatively well, since the field with the highest weight should drive scores anyway, and deciding up-front which clause leads impacts felt like it could simplify the logic a bit. But if it doesn't yield better results, let's fall back to the previous approach. Let's maybe just double check with a profiler that the reason this approach performs worse is actually that we get worse score boundaries, and not some avoidable slow code like iterating over or looking up the various hash maps?
wasn't sure earlier as SynonymQuery and CombinedFieldsQuery sit in different modules / packages, so extracting common code out may potentially require a new public specialized API / class (in the core module ?). Let me give that a try and see how it looks!
Maybe we could have a SynonymImpactsSource or something like that as a public @lucene.internal class in core that would provide this logic.
Maybe we could have a SynonymImpactsSource or something like that as a public @lucene.internal class in core that would provide this logic.
Ah yes, using @lucene.internal makes sense! I made a new class to extract out the within-field impacts merging logic in https://github.com/apache/lucene/pull/418/commits/0a9bdccf01d568b0239cf894ff626e47d8333df3, and used it in CombinedFieldsQuery in https://github.com/apache/lucene/pull/418/commits/808fec28ab6d7179974665e7fa069270845811fa. I've also moved up the repeated max-weight field computation in https://github.com/apache/lucene/pull/418/commits/8a7ea9998b189649ccdc75ecdd4b096460abb0af .
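For reference, a heavily simplified sketch of the within-field merging idea (hypothetical class and merge rule, not the actual Lucene implementation): for each norm value seen across the terms, a pessimistic merged frequency is the sum over terms of the best frequency that term can reach at that norm or lower. That gives a valid upper bound on the combined frequency, which is all score pruning needs.

```java
import java.util.*;

public class SynonymImpactsSketch {

  // A (freq, norm) pair, mirroring the shape of per-block impact data.
  static class Impact {
    final int freq; final long norm;
    Impact(int freq, long norm) { this.freq = freq; this.norm = norm; }
  }

  // Merge per-term impacts on the same field into pessimistic (freq, norm) pairs.
  static List<Impact> merge(List<List<Impact>> perTermImpacts) {
    TreeSet<Long> norms = new TreeSet<>();
    for (List<Impact> impacts : perTermImpacts) {
      for (Impact i : impacts) norms.add(i.norm);
    }
    List<Impact> merged = new ArrayList<>();
    for (long norm : norms) {
      int freqSum = 0;
      for (List<Impact> impacts : perTermImpacts) {
        int best = 0; // best freq this term can reach at norm or lower
        for (Impact i : impacts) {
          if (i.norm <= norm) best = Math.max(best, i.freq);
        }
        freqSum += best; // upper bound: each term contributes at most its best freq
      }
      merged.add(new Impact(freqSum, norm));
    }
    return merged;
  }

  public static void main(String[] args) {
    List<List<Impact>> perTerm = Arrays.asList(
        Arrays.asList(new Impact(3, 4)),
        Arrays.asList(new Impact(2, 4), new Impact(5, 8)));
    for (Impact i : merge(perTerm)) {
      System.out.println("freq=" + i.freq + " norm=" + i.norm);
    }
  }
}
```

The real merging in the PR is more involved (it works per level and respects the impact-format invariants), but the upper-bound character of the merge is the same.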
Yes. I liked this approach because it felt like it should work relatively well, since the field with the highest weight should drive scores anyway, and deciding up-front which clause leads impacts felt like it could simplify the logic a bit. But if it doesn't yield better results, let's fall back to the previous approach. Let's maybe just double check with a profiler that the reason this approach performs worse is actually that we get worse score boundaries, and not some avoidable slow code like iterating over or looking up the various hash maps?
I see. The latest two commits https://github.com/apache/lucene/pull/418/commits/808fec28ab6d7179974665e7fa069270845811fa & https://github.com/apache/lucene/pull/418/commits/75c5b046f8af4e73341a26a1828e97e006115a6b used the max-weight field to drive scoring, and overall they showed around a -20% impact on tasks from combinedFieldsUnevenlyWeightedBig so far. Sample CPU JFR results for the candidate and the baseline look like the following:
Candidate CPU JFR:
PERCENT CPU SAMPLES STACK
15.13% 13191 org.apache.lucene.sandbox.search.MultiNormsLeafSimScorer$MultiFieldNormValues#advanceExact()
7.42% 6466 org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer#score()
7.29% 6354 org.apache.lucene.search.DisjunctionDISIApproximation#advance()
6.79% 5921 org.apache.lucene.search.DisiPriorityQueue#downHeap()
4.34% 3785 org.apache.lucene.search.DisiPriorityQueue#topList()
3.97% 3465 org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1#collect()
3.85% 3360 org.apache.lucene.sandbox.search.CombinedFieldQuery$WeightedDisiWrapper#freq()
3.18% 2777 java.lang.Math#round()
3.01% 2624 org.apache.lucene.sandbox.search.CombinedFieldQuery$CombinedFieldScorer#freq()
3.01% 2624 org.apache.lucene.search.DisiPriorityQueue#top()
2.33% 2036 org.apache.lucene.sandbox.search.CombinedFieldQuery$CombinedFieldScorer#score()
2.20% 1917 org.apache.lucene.sandbox.search.MultiNormsLeafSimScorer#getNormValue()
2.07% 1809 org.apache.lucene.util.SmallFloat#longToInt4()
1.81% 1580 org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$3#longValue()
1.78% 1555 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsDocsEnum#advance()
1.77% 1545 org.apache.lucene.store.ByteBufferGuard#ensureValid()
1.51% 1316 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader#findFirstGreater()
1.39% 1210 org.apache.lucene.search.DisiPriorityQueue#updateTop()
1.29% 1124 org.apache.lucene.search.ImpactsDISI#docID()
1.22% 1063 jdk.internal.misc.Unsafe#getByte()
1.19% 1037 org.apache.lucene.store.ByteBufferGuard#getByte()
1.08% 938 org.apache.lucene.search.DisiPriorityQueue#prepend()
0.97% 849 org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
0.86% 750 org.apache.lucene.codecs.MultiLevelSkipListReader#skipTo()
0.82% 717 org.apache.lucene.util.SmallFloat#intToByte4()
0.78% 683 java.lang.Math#toIntExact()
0.62% 540 org.apache.lucene.codecs.lucene90.PForUtil#decode()
0.61% 529 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsDocsEnum#freq()
0.51% 441 org.apache.lucene.search.ImpactsDISI#nextDoc()
0.50% 435 org.apache.lucene.search.ImpactsDISI#advanceTarget()
Baseline CPU JFR
PERCENT CPU SAMPLES STACK
16.38% 13363 org.apache.lucene.sandbox.search.MultiNormsLeafSimScorer$MultiFieldNormValues#advanceExact()
8.67% 7073 org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer#score()
8.05% 6569 org.apache.lucene.search.DisjunctionDISIApproximation#nextDoc()
6.50% 5305 org.apache.lucene.search.DisiPriorityQueue#downHeap()
5.17% 4220 org.apache.lucene.sandbox.search.CombinedFieldQuery$CombinedFieldScorer#freq()
4.40% 3587 org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1#collect()
4.30% 3505 org.apache.lucene.search.DisiPriorityQueue#topList()
3.37% 2751 java.lang.Math#round()
2.77% 2263 org.apache.lucene.search.DisiPriorityQueue#top()
2.64% 2156 org.apache.lucene.sandbox.search.CombinedFieldQuery$WeightedDisiWrapper#freq()
2.62% 2135 org.apache.lucene.sandbox.search.CombinedFieldQuery$CombinedFieldScorer#score()
2.12% 1729 org.apache.lucene.sandbox.search.MultiNormsLeafSimScorer#getNormValue()
2.11% 1723 org.apache.lucene.util.SmallFloat#longToInt4()
2.06% 1678 org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
2.01% 1640 org.apache.lucene.store.ByteBufferGuard#ensureValid()
1.77% 1442 org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$3#longValue()
1.62% 1325 org.apache.lucene.search.DisiPriorityQueue#updateTop()
1.26% 1028 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockDocsEnum#nextDoc()
1.10% 899 java.lang.Math#toIntExact()
1.10% 898 org.apache.lucene.util.SmallFloat#intToByte4()
1.09% 888 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockDocsEnum#freq()
1.06% 864 org.apache.lucene.store.ByteBufferGuard#getByte()
0.80% 652 jdk.internal.misc.Unsafe#getByte()
0.74% 605 org.apache.lucene.codecs.lucene90.PForUtil#decode()
0.68% 552 org.apache.lucene.search.DisiPriorityQueue#prepend()
0.60% 492 org.apache.lucene.codecs.lucene90.PForUtil#decodeAndPrefixSum()
0.55% 451 java.io.RandomAccessFile#readBytes()
0.38% 314 org.apache.lucene.codecs.lucene90.PForUtil#innerPrefixSum32()
0.38% 307 org.apache.lucene.codecs.lucene90.ForUtil#expand8()
0.32% 260 org.apache.lucene.codecs.lucene90.PForUtil#expand32()
As they look very similar, and based on my previous debugging, this may suggest that the computed max score is high and most docs are not being skipped, so the candidate implementation ended up examining the same number of docs as the baseline.
I just added two logging statements in ImpactsDISI to print out the upTo and max score comparison and examine the pruning results:
diff --git a/lucene/core/src/java/org/apache/lucene/search/ImpactsDISI.java b/lucene/core/src/java/org/apache/lucene/search/ImpactsDISI.java
index 843bb4ad57b..3930ae2f50d 100644
--- a/lucene/core/src/java/org/apache/lucene/search/ImpactsDISI.java
+++ b/lucene/core/src/java/org/apache/lucene/search/ImpactsDISI.java
@@ -110,8 +110,10 @@ public final class ImpactsDISI extends DocIdSetIterator {
     assert upTo >= target;
     if (maxScore >= minCompetitiveScore) {
+      System.out.println("Keeping - target: " + target + " upTo: " + upTo + " minCompScore: " + minCompetitiveScore + " maxScore: " + maxScore);
       return target;
     }
+    System.out.println("Skipping - target: " + target + " upTo: " + upTo + " minCompScore: " + minCompetitiveScore + " maxScore: " + maxScore);
     if (upTo == NO_MORE_DOCS) {
       return NO_MORE_DOCS;
and the following query was used:
CombinedFieldQuery query =
new CombinedFieldQuery.Builder()
.addField("titleTokenized", (float) 20.0)
.addField("body", (float) 1.0)
.addTerm(new BytesRef("three"))
.addTerm(new BytesRef("geronimo"))
.build();
For the implementation that uses the most-weighted field and its lowest Impacts#getDocIdUpTo to drive scoring, upTo and maxScore were high, hence there was no pruning.
1> Keeping - target: 152 upTo: 232464 minCompScore: 0.17629458 maxScore: 2.1560352
1> Keeping - target: 159 upTo: 232464 minCompScore: 0.18669213 maxScore: 2.1560352
1> Keeping - target: 160 upTo: 232464 minCompScore: 0.20146967 maxScore: 2.1560352
1> Keeping - target: 161 upTo: 232464 minCompScore: 0.21165837 maxScore: 2.1560352
1> Keeping - target: 167 upTo: 232464 minCompScore: 0.21696068 maxScore: 2.1560352
1> Keeping - target: 170 upTo: 232464 minCompScore: 0.23223151 maxScore: 2.1560352
1> Keeping - target: 171 upTo: 232464 minCompScore: 0.26476994 maxScore: 2.1560352
1> Keeping - target: 172 upTo: 232464 minCompScore: 0.27027848 maxScore: 2.1560352
1> Keeping - target: 174 upTo: 232464 minCompScore: 0.27216592 maxScore: 2.1560352
1> Keeping - target: 175 upTo: 232464 minCompScore: 0.27311972 maxScore: 2.1560352
1> Keeping - target: 176 upTo: 232464 minCompScore: 0.28871515 maxScore: 2.1560352
1> Keeping - target: 179 upTo: 232464 minCompScore: 0.2955102 maxScore: 2.1560352
1> Keeping - target: 185 upTo: 232464 minCompScore: 0.31796065 maxScore: 2.1560352
1> Keeping - target: 187 upTo: 232464 minCompScore: 0.32868698 maxScore: 2.1560352
1> Keeping - target: 189 upTo: 232464 minCompScore: 0.33091953 maxScore: 2.1560352
1> Keeping - target: 190 upTo: 232464 minCompScore: 0.353465 maxScore: 2.1560352
1> Keeping - target: 191 upTo: 232464 minCompScore: 0.36112097 maxScore: 2.1560352
1> Keeping - target: 194 upTo: 232464 minCompScore: 0.3743663 maxScore: 2.1560352
1> Keeping - target: 195 upTo: 232464 minCompScore: 0.38150516 maxScore: 2.1560352
1> Keeping - target: 197 upTo: 232464 minCompScore: 0.39127985 maxScore: 2.1560352
1> Keeping - target: 198 upTo: 232464 minCompScore: 0.39738885 maxScore: 2.1560352
1> Keeping - target: 208 upTo: 232464 minCompScore: 0.39789525 maxScore: 2.1560352
1> Keeping - target: 216 upTo: 232464 minCompScore: 0.41046718 maxScore: 2.1560352
1> Keeping - target: 219 upTo: 232464 minCompScore: 0.4148362 maxScore: 2.1560352
1> Keeping - target: 224 upTo: 232464 minCompScore: 0.4204301 maxScore: 2.1560352
1> Keeping - target: 227 upTo: 232464 minCompScore: 0.43892494 maxScore: 2.1560352
1> Keeping - target: 234 upTo: 232464 minCompScore: 0.4551008 maxScore: 2.1560352
1> Keeping - target: 235 upTo: 232464 minCompScore: 0.47162262 maxScore: 2.1560352
1> Keeping - target: 238 upTo: 232464 minCompScore: 0.47382453 maxScore: 2.1560352
1> Keeping - target: 239 upTo: 232464 minCompScore: 0.48034182 maxScore: 2.1560352
1> Keeping - target: 243 upTo: 232464 minCompScore: 0.48332027 maxScore: 2.1560352
1> Keeping - target: 245 upTo: 232464 minCompScore: 0.4903498 maxScore: 2.1560352
1> Keeping - target: 246 upTo: 232464 minCompScore: 0.5026699 maxScore: 2.1560352
...
For the implementation that uses the most-weighted field and lowest cost / doc freq to drive scoring, upTo turned out to be NO_MORE_DOCS (Integer.MAX_VALUE) for that query, and there was no pruning either.
1> Keeping - target: 151 upTo: 2147483647 minCompScore: 0.15862586 maxScore: 2.1560352
1> Keeping - target: 152 upTo: 2147483647 minCompScore: 0.17629458 maxScore: 2.1560352
1> Keeping - target: 159 upTo: 2147483647 minCompScore: 0.18669213 maxScore: 2.1560352
1> Keeping - target: 160 upTo: 2147483647 minCompScore: 0.20146967 maxScore: 2.1560352
1> Keeping - target: 161 upTo: 2147483647 minCompScore: 0.21165837 maxScore: 2.1560352
1> Keeping - target: 167 upTo: 2147483647 minCompScore: 0.21696068 maxScore: 2.1560352
1> Keeping - target: 170 upTo: 2147483647 minCompScore: 0.23223151 maxScore: 2.1560352
1> Keeping - target: 171 upTo: 2147483647 minCompScore: 0.26476994 maxScore: 2.1560352
1> Keeping - target: 172 upTo: 2147483647 minCompScore: 0.27027848 maxScore: 2.1560352
1> Keeping - target: 174 upTo: 2147483647 minCompScore: 0.27216592 maxScore: 2.1560352
1> Keeping - target: 175 upTo: 2147483647 minCompScore: 0.27311972 maxScore: 2.1560352
1> Keeping - target: 176 upTo: 2147483647 minCompScore: 0.28871515 maxScore: 2.1560352
1> Keeping - target: 179 upTo: 2147483647 minCompScore: 0.2955102 maxScore: 2.1560352
1> Keeping - target: 185 upTo: 2147483647 minCompScore: 0.31796065 maxScore: 2.1560352
1> Keeping - target: 187 upTo: 2147483647 minCompScore: 0.32868698 maxScore: 2.1560352
1> Keeping - target: 189 upTo: 2147483647 minCompScore: 0.33091953 maxScore: 2.1560352
1> Keeping - target: 190 upTo: 2147483647 minCompScore: 0.353465 maxScore: 2.1560352
1> Keeping - target: 191 upTo: 2147483647 minCompScore: 0.36112097 maxScore: 2.1560352
1> Keeping - target: 194 upTo: 2147483647 minCompScore: 0.3743663 maxScore: 2.1560352
1> Keeping - target: 195 upTo: 2147483647 minCompScore: 0.38150516 maxScore: 2.1560352
1> Keeping - target: 197 upTo: 2147483647 minCompScore: 0.39127985 maxScore: 2.1560352
1> Keeping - target: 198 upTo: 2147483647 minCompScore: 0.39738885 maxScore: 2.1560352
1> Keeping - target: 208 upTo: 2147483647 minCompScore: 0.39789525 maxScore: 2.1560352
1> Keeping - target: 216 upTo: 2147483647 minCompScore: 0.41046718 maxScore: 2.1560352
1> Keeping - target: 219 upTo: 2147483647 minCompScore: 0.4148362 maxScore: 2.1560352
1> Keeping - target: 224 upTo: 2147483647 minCompScore: 0.4204301 maxScore: 2.1560352
1> Keeping - target: 227 upTo: 2147483647 minCompScore: 0.43892494 maxScore: 2.1560352
1> Keeping - target: 234 upTo: 2147483647 minCompScore: 0.4551008 maxScore: 2.1560352
1> Keeping - target: 235 upTo: 2147483647 minCompScore: 0.47162262 maxScore: 2.1560352
1> Keeping - target: 238 upTo: 2147483647 minCompScore: 0.47382453 maxScore: 2.1560352
1> Keeping - target: 239 upTo: 2147483647 minCompScore: 0.48034182 maxScore: 2.1560352
1> Keeping - target: 243 upTo: 2147483647 minCompScore: 0.48332027 maxScore: 2.1560352
1> Keeping - target: 245 upTo: 2147483647 minCompScore: 0.4903498 maxScore: 2.1560352
1> Keeping - target: 246 upTo: 2147483647 minCompScore: 0.5026699 maxScore: 2.1560352
1> Keeping - target: 247 upTo: 2147483647 minCompScore: 0.50923806 maxScore: 2.1560352
1> Keeping - target: 249 upTo: 2147483647 minCompScore: 0.50958425 maxScore: 2.1560352
1> Keeping - target: 259 upTo: 2147483647 minCompScore: 0.51977855 maxScore: 2.1560352
1> Keeping - target: 265 upTo: 2147483647 minCompScore: 0.5250303 maxScore: 2.1560352
1> Keeping - target: 266 upTo: 2147483647 minCompScore: 0.53061455 maxScore: 2.1560352
1> Keeping - target: 267 upTo: 2147483647 minCompScore: 0.53076476 maxScore: 2.1560352
1> Keeping - target: 269 upTo: 2147483647 minCompScore: 0.53151745 maxScore: 2.1560352
1> Keeping - target: 275 upTo: 2147483647 minCompScore: 0.56218606 maxScore: 2.1560352
1> Keeping - target: 279 upTo: 2147483647 minCompScore: 0.57041436 maxScore: 2.1560352
1> Keeping - target: 280 upTo: 2147483647 minCompScore: 0.58023125 maxScore: 2.1560352
1> Keeping - target: 296 upTo: 2147483647 minCompScore: 0.592781 maxScore: 2.1560352
1> Keeping - target: 298 upTo: 2147483647 minCompScore: 0.5975618 maxScore: 2.1560352
1> Keeping - target: 314 upTo: 2147483647 minCompScore: 0.5996211 maxScore: 2.1560352
1> Keeping - target: 318 upTo: 2147483647 minCompScore: 0.6027052 maxScore: 2.1560352
1> Keeping - target: 319 upTo: 2147483647 minCompScore: 0.6080974 maxScore: 2.1560352
1> Keeping - target: 329 upTo: 2147483647 minCompScore: 0.6116286 maxScore: 2.1560352
1> Keeping - target: 335 upTo: 2147483647 minCompScore: 0.6127903 maxScore: 2.1560352
1> Keeping - target: 336 upTo: 2147483647 minCompScore: 0.6186264 maxScore: 2.1560352
1> Keeping - target: 339 upTo: 2147483647 minCompScore: 0.62189597 maxScore: 2.1560352
1> Keeping - target: 342 upTo: 2147483647 minCompScore: 0.63768846 maxScore: 2.1560352
1> Keeping - target: 346 upTo: 2147483647 minCompScore: 0.63877517 maxScore: 2.1560352
1> Keeping - target: 352 upTo: 2147483647 minCompScore: 0.66235477 maxScore: 2.1560352
1> Keeping - target: 354 upTo: 2147483647 minCompScore: 0.6642242 maxScore: 2.1560352
1> Keeping - target: 355 upTo: 2147483647 minCompScore: 0.6710866 maxScore: 2.1560352
1> Keeping - target: 358 upTo: 2147483647 minCompScore: 0.6718084 maxScore: 2.1560352
1> Keeping - target: 362 upTo: 2147483647 minCompScore: 0.67271274 maxScore: 2.1560352
1> Keeping - target: 365 upTo: 2147483647 minCompScore: 0.6831766 maxScore: 2.1560352
1> Keeping - target: 367 upTo: 2147483647 minCompScore: 0.683592 maxScore: 2.1560352
1> Keeping - target: 371 upTo: 2147483647 minCompScore: 0.6896402 maxScore: 2.1560352
...
For the implementation that uses the lowest Impacts#getDocIdUpTo across all fields, we get a lower upTo and max score in general, and hence some docs do get pruned.
1> Keeping - target: 152 upTo: 195 minCompScore: 0.17629458 maxScore: 1.3767262
1> Keeping - target: 159 upTo: 195 minCompScore: 0.18669213 maxScore: 1.3767262
...
1> Keeping - target: 219 upTo: 436 minCompScore: 0.4148362 maxScore: 1.7731527
1> Keeping - target: 224 upTo: 436 minCompScore: 0.4204301 maxScore: 1.7731527
...
1> Keeping - target: 764 upTo: 902 minCompScore: 0.9138265 maxScore: 1.65851
1> Keeping - target: 767 upTo: 902 minCompScore: 0.91438395 maxScore: 1.65851
1> Keeping - target: 779 upTo: 902 minCompScore: 0.91606015 maxScore: 1.65851
...
1> Keeping - target: 1278 upTo: 1384 minCompScore: 1.0459963 maxScore: 1.5856887
1> Keeping - target: 1280 upTo: 1384 minCompScore: 1.0520688 maxScore: 1.5856887
...
1> Keeping - target: 7144 upTo: 7227 minCompScore: 1.428069 maxScore: 1.7997875
1> Keeping - target: 7228 upTo: 7453 minCompScore: 1.4284092 maxScore: 1.4454873
1> Skipping - target: 7454 upTo: 7702 minCompScore: 1.4284092 maxScore: 1.3662372
1> Keeping - target: 7703 upTo: 7921 minCompScore: 1.4284092 maxScore: 1.6103871
1> Keeping - target: 7922 upTo: 8131 minCompScore: 1.4284092 maxScore: 1.5153265
...
1> Keeping - target: 17039 upTo: 17327 minCompScore: 1.5600302 maxScore: 1.6949782
1> Keeping - target: 17328 upTo: 17873 minCompScore: 1.5600302 maxScore: 1.8440268
1> Keeping - target: 17874 upTo: 18438 minCompScore: 1.5600302 maxScore: 1.8541759
1> Skipping - target: 18439 upTo: 18774 minCompScore: 1.5600302 maxScore: 1.5445734
1> Keeping - target: 18775 upTo: 19012 minCompScore: 1.5600302 maxScore: 2.1527436
1> Keeping - target: 19001 upTo: 19012 minCompScore: 1.5605642 maxScore: 2.1527436
...
1> Keeping - target: 19408 upTo: 19541 minCompScore: 1.5636915 maxScore: 1.927903
1> Skipping - target: 19542 upTo: 19834 minCompScore: 1.5636915 maxScore: 1.5145609
1> Keeping - target: 19835 upTo: 20077 minCompScore: 1.5636915 maxScore: 1.699799
1> Keeping - target: 20066 upTo: 20077 minCompScore: 1.5694537 maxScore: 1.699799
...
1> Keeping - target: 27159 upTo: 27396 minCompScore: 1.6439995 maxScore: 2.1519704
1> Keeping - target: 27182 upTo: 27396 minCompScore: 1.6449014 maxScore: 2.1519704
1> Keeping - target: 27397 upTo: 27627 minCompScore: 1.6449014 maxScore: 2.1505308
1> Keeping - target: 27500 upTo: 27627 minCompScore: 1.6467081 maxScore: 2.1505308
1> Skipping - target: 27628 upTo: 27859 minCompScore: 1.6467081 maxScore: 1.5844319
1> Keeping - target: 27860 upTo: 28118 minCompScore: 1.6467081 maxScore: 1.7705853
1> Keeping - target: 27976 upTo: 28118 minCompScore: 1.6487088 maxScore: 1.7705853
1> Skipping - target: 28119 upTo: 28370 minCompScore: 1.6487088 maxScore: 1.576628
1> Keeping - target: 28371 upTo: 28619 minCompScore: 1.6487088 maxScore: 2.0417855
1> Keeping - target: 28620 upTo: 28954 minCompScore: 1.6487088 maxScore: 2.1356478
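For context, the keep/skip decisions traced above follow block-max pruning: a block of documents can be skipped when the best score it could possibly contribute is below the current minimum competitive score. A minimal self-contained sketch of that decision (my own illustration, not the PR's actual code; values taken from the log lines above):

```java
public class BlockPruneSketch {

  /**
   * Returns true if the block whose best possible combined score is maxScore
   * cannot beat the current minimum competitive score and may be skipped.
   */
  static boolean canSkipBlock(float minCompetitiveScore, float maxScore) {
    return maxScore < minCompetitiveScore;
  }

  public static void main(String[] args) {
    // From the log: minCompScore 1.4284092, maxScore 1.4454873 -> Keeping
    System.out.println(canSkipBlock(1.4284092f, 1.4454873f)); // false
    // From the log: minCompScore 1.4284092, maxScore 1.3662372 -> Skipping
    System.out.println(canSkipBlock(1.4284092f, 1.3662372f)); // true
  }
}
```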
These numbers are super interesting. Let's go with the approach of your baseline for now since it performs better, sorry for putting you on a track that performs worse. :)
Regarding the new utility class, I was hoping that it could be a higher-level abstraction, e.g. a SynonymImpactsSource, rather than low-level utility methods?
> These numbers are super interesting. Let's go with the approach of your baseline for now since it performs better, sorry for putting you on a track that performs worse. :)
No problem! It's actually quite enjoyable for me to explore different approaches and compare their performance characteristics. I may also have missed certain aspects, so this discussion helps verify my understanding as well! I've re-implemented the other approach in https://github.com/apache/lucene/pull/418/commits/cbbd28bb2964228542e72b112f3b2f5f3164fae7.
> Regarding the new utility class, I was hoping that it could be a higher-level abstraction, e.g. a SynonymImpactsSource, rather than low-level utility methods?
Yeah, I did give that a try earlier, but I ended up with the utility-methods approach because I saw a potential issue. I implemented the abstraction again in https://github.com/apache/lucene/pull/418/commits/502d44e7ff0b6de6061a4899589e4523e9bf74e8.
If my understanding is correct, the intention of the higher-level abstraction is to reduce code duplication by looping over a Map<Field, SynonymImpactsSource> (or even a Map<Field, SynonymImpacts>) in the numLevels, getDocIdUpTo and, in particular, getImpacts implementations in CombinedFieldsQuery. However, I noticed that getImpacts doesn't seem to be easily decomposable into, or delegable to, the getImpacts of SynonymImpactsSource or SynonymImpacts. That method takes in a level and internally looks up the field's own docIdUpTo to get the list of impacts with that boundary, whereas for CombinedFieldsQuery we actually need a docIdUpTo that is valid across all fields, so it can't be abstracted away inside the loop: https://github.com/apache/lucene/pull/418/commits/502d44e7ff0b6de6061a4899589e4523e9bf74e8#diff-62e151fca32e7560ef90ee8eab1adaffac3073f23701aaf0c17a02e250727ed3R547-R550 . Given this restriction, I feel utility methods may work better for removing code duplication.
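To illustrate the cross-field boundary issue: for the same level, each field's impacts block may end at a different doc ID, so a combined view has to compute a boundary that is valid across all fields, e.g. the minimum of the per-field boundaries. A toy sketch (my own hypothetical illustration, not actual Lucene code):

```java
public class SharedBoundarySketch {

  /**
   * Given each field's getDocIdUpTo(level) for the same level, return a
   * boundary that is valid across all fields: the smallest per-field boundary.
   */
  static int sharedDocIdUpTo(int[] perFieldDocIdUpTo) {
    int min = Integer.MAX_VALUE;
    for (int upTo : perFieldDocIdUpTo) {
      min = Math.min(min, upTo);
    }
    return min;
  }

  public static void main(String[] args) {
    // e.g. the "title" block ends at doc 127 while the "body" block ends at
    // doc 255 for the same level; only docs up to 127 are covered by both.
    System.out.println(sharedDocIdUpTo(new int[] {127, 255})); // 127
  }
}
```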
Hi @jpountz , just want to circle back to this discussion and see if you have any further suggestions on the above analysis?
Hi @jpountz , just want to check again whether I have addressed your concern with the utility-methods approach?
To better illustrate the potential issue I see with the higher-level abstraction approach, I've copied over some key code snippets here.
If we were to extract a SynonymImpactsSource (from SynonymQuery), it might look something like this:
```java
public class SynonymImpactsSource implements ImpactsSource {

  private final ImpactsEnum[] impactsEnums;
  private final Impacts[] impacts;
  private final float[] boosts;
  private Impacts lead;

  public SynonymImpactsSource(ImpactsEnum[] impactsEnums, float[] boosts) {
    ...
  }

  @Override
  public Impacts getImpacts() throws IOException {
    // Use the impacts that have the lower next boundary as a lead.
    // It will decide on the number of levels and the block boundaries.
    if (lead == null) {
      Impacts tmpLead = null;
      for (int i = 0; i < impactsEnums.length; ++i) {
        impacts[i] = impactsEnums[i].getImpacts();
        if (tmpLead == null || impacts[i].getDocIdUpTo(0) < tmpLead.getDocIdUpTo(0)) {
          tmpLead = impacts[i];
        }
      }
      lead = tmpLead;
    }
    return new Impacts() {

      @Override
      public int numLevels() {
        // Delegate to the lead
        return lead.numLevels();
      }

      @Override
      public int getDocIdUpTo(int level) {
        // Delegate to the lead
        return lead.getDocIdUpTo(level);
      }

      @Override
      public List<Impact> getImpacts(int level) {
        final int docIdUpTo = getDocIdUpTo(level);
        return ImpactsMergingUtils.mergeImpactsPerField(
            impactsEnums, impacts, boosts, docIdUpTo, false);
      }
    };
  }

  @Override
  public void advanceShallow(int target) throws IOException {
    for (ImpactsEnum impactsEnum : impactsEnums) {
      if (impactsEnum.docID() < target) {
        impactsEnum.advanceShallow(target);
      }
    }
  }

  public Impacts[] impacts() {
    return impacts;
  }
}
```
However, the above SynonymImpactsSource#getImpacts(level) may not be directly usable for CombinedFieldsQuery, as illustrated below:
```java
// Pseudocode
Impacts CombinedFieldsQuery$ImpactsSource#getImpacts() {
  for (entry : Map<Field, SynonymImpactsSource>) {
    // Potentially problematic: this method only takes in level info and
    // internally looks up a field-specific docIdUpTo to get valid impacts
    // (see definition above). For CombinedFieldQuery, docIdUpTo may differ
    // among fields at the same level.
    combine SynonymImpactsSource.getImpacts(level)
  }
  return combined Impacts
}
```
Hence we may actually need to change some APIs if we want SynonymImpactsSource to work for the CombinedFieldQuery use case?
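As a purely hypothetical illustration of such an API change (my own assumption, not something in Lucene or in this PR): if the per-field source accepted the shared doc boundary directly instead of a level, the combined query could compute one boundary and pass it to every field:

```java
import java.util.List;

public class BoundaryApiSketch {

  /**
   * Hypothetical per-field interface: impacts are requested by an explicit
   * doc boundary rather than by a level, so the caller controls the boundary.
   */
  interface FieldImpacts {
    List<String> getImpactsUpTo(int docIdUpTo);
  }

  public static void main(String[] args) {
    // Toy implementation standing in for a real per-field impacts source.
    FieldImpacts toy = upTo -> List.of("impacts up to " + upTo);
    // The combined query would compute one boundary valid for all fields
    // (e.g. the minimum per-field boundary) and pass it to each source.
    System.out.println(toy.getImpactsUpTo(127));
  }
}
```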
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!