LUCENE-10061: Implements dynamic pruning support for CombinedFieldsQuery
Description
This PR is a work in progress implementing basic dynamic pruning support for CombinedFieldsQuery.
#11099
Tests
Added a randomized test verifying that the top 100 results match between top-score collection (with pruning) and complete scoring.
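The invariant that test enforces can be sketched outside Lucene (plain Java with hypothetical names, not the actual Lucene test harness): pruning is only allowed to skip documents that cannot beat the current minimum competitive score, so the pruned top-k must equal the exhaustive top-k.

```java
import java.util.*;

public class TopKEquivalence {

  // Exhaustive: score every doc, sort by score desc (ties: lower doc id first), take top k.
  static List<Integer> exhaustiveTopK(float[] scores, int k) {
    Integer[] docs = new Integer[scores.length];
    for (int i = 0; i < docs.length; i++) docs[i] = i;
    Arrays.sort(docs, (a, b) -> {
      int c = Float.compare(scores[b], scores[a]);
      return c != 0 ? c : Integer.compare(a, b);
    });
    return new ArrayList<>(Arrays.asList(docs).subList(0, Math.min(k, docs.length)));
  }

  // "Pruned": keep a size-k heap and skip any doc that cannot beat the current
  // minimum competitive score, which is all dynamic pruning is allowed to do.
  static List<Integer> prunedTopK(float[] scores, int k) {
    PriorityQueue<Integer> heap = new PriorityQueue<>((a, b) -> {
      int c = Float.compare(scores[a], scores[b]); // least competitive entry on top
      return c != 0 ? c : Integer.compare(b, a);
    });
    float minCompetitive = Float.NEGATIVE_INFINITY;
    for (int doc = 0; doc < scores.length; doc++) {
      if (scores[doc] <= minCompetitive) continue; // "pruned" away
      heap.add(doc);
      if (heap.size() > k) {
        heap.poll();
        minCompetitive = scores[heap.peek()];
      }
    }
    List<Integer> result = new ArrayList<>(heap);
    result.sort((a, b) -> {
      int c = Float.compare(scores[b], scores[a]);
      return c != 0 ? c : Integer.compare(a, b);
    });
    return result;
  }

  public static void main(String[] args) {
    Random random = new Random(42);
    for (int iter = 0; iter < 100; iter++) {
      float[] scores = new float[200];
      for (int i = 0; i < scores.length; i++) scores[i] = random.nextFloat();
      if (!exhaustiveTopK(scores, 100).equals(prunedTopK(scores, 100))) {
        throw new AssertionError("top-k mismatch at iter " + iter);
      }
    }
    System.out.println("top-k results match");
  }
}
```

The randomized test in the PR does the same comparison, but against real indexed documents and the actual scorers.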
Perf test results for commit 2ba435e5c83f870be9566
Run 1:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighHigh 3.53 (3.3%) 2.92 (5.3%) -17.2% ( -25% - -8%) 0.000
PKLookup 108.13 (7.7%) 119.85 (8.2%) 10.8% ( -4% - 28%) 0.000
CFQHighLow 14.88 (3.9%) 16.69 (12.5%) 12.2% ( -3% - 29%) 0.000
CFQHighMed 21.11 (4.1%) 25.87 (11.8%) 22.6% ( 6% - 40%) 0.000
Run 2:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighHigh 6.64 (3.1%) 5.63 (10.2%) -15.2% ( -27% - -1%) 0.000
CFQHighLow 8.35 (2.8%) 8.05 (15.0%) -3.6% ( -20% - 14%) 0.297
CFQHighMed 24.51 (5.3%) 24.90 (19.9%) 1.6% ( -22% - 28%) 0.733
PKLookup 110.06 (10.0%) 128.54 (7.9%) 16.8% ( -1% - 38%) 0.000
Run 3:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighMed 13.01 (2.9%) 9.82 (7.8%) -24.5% ( -34% - -14%) 0.000
PKLookup 107.85 (8.1%) 111.79 (11.2%) 3.7% ( -14% - 24%) 0.236
CFQHighHigh 4.83 (2.6%) 5.06 (8.6%) 4.7% ( -6% - 16%) 0.018
CFQHighLow 14.95 (3.0%) 18.31 (19.0%) 22.5% ( 0% - 45%) 0.000
Run 4:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighMed 11.11 (2.9%) 6.69 (4.1%) -39.7% ( -45% - -33%) 0.000
CFQHighLow 27.55 (3.8%) 25.46 (11.0%) -7.6% ( -21% - 7%) 0.003
CFQHighHigh 5.25 (3.2%) 4.96 (6.1%) -5.7% ( -14% - 3%) 0.000
PKLookup 107.61 (6.7%) 121.19 (4.6%) 12.6% ( 1% - 25%) 0.000
Perf test results from commit 1a71469bf7afa3
Run 1
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighHighHigh 4.64 (6.5%) 3.30 (4.7%) -29.0% ( -37% - -19%) 0.000
CFQHighHigh 11.09 (6.0%) 9.61 (6.0%) -13.3% ( -23% - -1%) 0.000
PKLookup 103.38 (4.4%) 108.04 (4.3%) 4.5% ( -4% - 13%) 0.001
CFQHighMedLow 10.58 (6.1%) 12.30 (8.7%) 16.2% ( 1% - 33%) 0.000
CFQHighMed 10.70 (7.4%) 15.51 (11.2%) 44.9% ( 24% - 68%) 0.000
CFQHighLowLow 8.18 (8.2%) 12.87 (11.6%) 57.3% ( 34% - 84%) 0.000
CFQHighLow 14.57 (7.5%) 30.81 (15.1%) 111.4% ( 82% - 144%) 0.000
Run 2
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighHighHigh 5.33 (5.7%) 4.02 (7.7%) -24.4% ( -35% - -11%) 0.000
CFQHighLowLow 17.14 (6.2%) 13.06 (5.4%) -23.8% ( -33% - -13%) 0.000
CFQHighMed 17.37 (5.8%) 14.38 (7.7%) -17.2% ( -29% - -3%) 0.000
PKLookup 103.57 (5.5%) 108.84 (5.9%) 5.1% ( -6% - 17%) 0.005
CFQHighMedLow 11.25 (7.2%) 12.70 (9.0%) 12.9% ( -3% - 31%) 0.000
CFQHighHigh 5.00 (6.2%) 7.54 (12.1%) 51.0% ( 30% - 73%) 0.000
CFQHighLow 21.60 (5.2%) 34.57 (14.1%) 60.0% ( 38% - 83%) 0.000
Run 3
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighHighHigh 5.40 (6.9%) 4.06 (5.1%) -24.8% ( -34% - -13%) 0.000
CFQHighMedLow 7.64 (7.4%) 5.79 (6.3%) -24.2% ( -35% - -11%) 0.000
CFQHighHigh 11.11 (7.0%) 9.60 (5.9%) -13.6% ( -24% - 0%) 0.000
CFQHighLowLow 21.21 (7.6%) 21.22 (6.6%) 0.0% ( -13% - 15%) 0.993
PKLookup 103.15 (5.9%) 107.60 (6.9%) 4.3% ( -8% - 18%) 0.034
CFQHighLow 21.85 (8.1%) 34.18 (13.5%) 56.4% ( 32% - 84%) 0.000
CFQHighMed 12.07 (8.4%) 19.98 (16.7%) 65.5% ( 37% - 98%) 0.000
Run 4
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighHigh 8.50 (5.8%) 6.85 (5.2%) -19.5% ( -28% - -8%) 0.000
CFQHighMedLow 10.89 (5.7%) 8.96 (5.4%) -17.8% ( -27% - -7%) 0.000
CFQHighMed 8.41 (5.8%) 7.74 (5.6%) -7.9% ( -18% - 3%) 0.000
CFQHighHighHigh 3.45 (6.7%) 3.38 (5.3%) -2.0% ( -13% - 10%) 0.287
CFQHighLowLow 7.82 (6.4%) 8.20 (7.5%) 4.8% ( -8% - 20%) 0.030
PKLookup 103.50 (5.0%) 110.69 (5.4%) 6.9% ( -3% - 18%) 0.000
CFQHighLow 11.46 (6.0%) 13.16 (6.7%) 14.8% ( 1% - 29%) 0.000
I re-ran the perf tests after https://github.com/mikemccand/luceneutil/commit/0550148b67f82d446e07bd0b4fdbde24f1d6228d was merged.
Results from python3 src/python/localrun.py -source combinedFieldsBig:
Run 1:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighHigh 3.69 (1.8%) 2.49 (6.2%) -32.5% ( -39% - -24%) 0.000
CFQHighMed 4.95 (2.1%) 4.19 (5.9%) -15.5% ( -22% - -7%) 0.000
PKLookup 125.72 (4.5%) 126.86 (10.3%) 0.9% ( -13% - 16%) 0.719
CFQHighLow 19.92 (2.2%) 20.80 (9.5%) 4.4% ( -7% - 16%) 0.043
Run 2:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighHigh 3.61 (2.8%) 2.48 (2.9%) -31.4% ( -36% - -26%) 0.000
PKLookup 116.67 (7.1%) 123.97 (5.5%) 6.3% ( -5% - 20%) 0.002
CFQHighMed 4.97 (3.6%) 5.29 (5.5%) 6.6% ( -2% - 16%) 0.000
CFQHighLow 11.96 (4.5%) 13.99 (6.5%) 17.0% ( 5% - 29%) 0.000
Run 3:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighHigh 3.51 (4.2%) 2.44 (6.5%) -30.5% ( -39% - -20%) 0.000
PKLookup 105.72 (11.9%) 108.81 (11.2%) 2.9% ( -18% - 29%) 0.424
CFQHighMed 10.85 (4.2%) 11.60 (11.4%) 6.9% ( -8% - 23%) 0.011
CFQHighLow 15.11 (5.6%) 16.16 (9.8%) 7.0% ( -7% - 23%) 0.006
Results from python3 src/python/localrun.py -source combinedFieldsUnevenlyWeightedBig:
Run 1:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
PKLookup 93.42 (13.7%) 88.23 (11.7%) -5.6% ( -27% - 23%) 0.168
CFQHighMed 4.69 (10.7%) 5.00 (18.0%) 6.6% ( -20% - 39%) 0.160
CFQHighHigh 4.51 (10.6%) 5.17 (17.7%) 14.6% ( -12% - 48%) 0.002
CFQHighLow 14.13 (8.5%) 23.11 (32.3%) 63.5% ( 20% - 114%) 0.000
Run 2:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighMed 4.77 (4.5%) 4.10 (8.3%) -14.2% ( -25% - -1%) 0.000
PKLookup 98.99 (12.3%) 101.47 (12.5%) 2.5% ( -19% - 31%) 0.522
CFQHighHigh 4.88 (5.3%) 5.98 (11.5%) 22.6% ( 5% - 41%) 0.000
CFQHighLow 11.57 (5.6%) 18.86 (18.8%) 62.9% ( 36% - 92%) 0.000
Run 3:
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
CFQHighHigh 3.55 (5.1%) 2.38 (9.0%) -32.9% ( -44% - -19%) 0.000
PKLookup 101.29 (7.0%) 94.22 (15.4%) -7.0% ( -27% - 16%) 0.065
CFQHighLow 15.43 (5.8%) 16.60 (11.2%) 7.6% ( -8% - 26%) 0.007
CFQHighMed 3.12 (5.1%) 3.83 (15.0%) 22.7% ( 2% - 45%) 0.000
For one of the most negatively impacted queries (-42.0%), CFQHighHigh: at united +combinedFields=titleTokenized^4.0,body^2.0 # freq=2834104 freq=1185528, the JFR CPU profiling result looked like the following:
PERCENT CPU SAMPLES STACK
15.82% 13099 org.apache.lucene.sandbox.search.CombinedFieldQuery$1$1#doMergeImpactsPerField()
11.46% 9487 org.apache.lucene.sandbox.search.MultiNormsLeafSimScorer$MultiFieldNormValues#advanceExact()
4.69% 3883 org.apache.lucene.search.DisiPriorityQueue#downHeap()
3.66% 3027 org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer#score()
1.93% 1598 org.apache.lucene.search.DisjunctionDISIApproximation#advance()
1.92% 1590 org.apache.lucene.sandbox.search.CombinedFieldQuery$CombinedFieldScorer#freq()
1.92% 1588 org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1#collect()
1.77% 1467 org.apache.lucene.sandbox.search.CombinedFieldQuery$WeightedDisiWrapper#freq()
1.68% 1392 org.apache.lucene.search.DisiPriorityQueue#top()
1.60% 1326 org.apache.lucene.search.DisiPriorityQueue#topList()
1.54% 1276 org.apache.lucene.util.PriorityQueue#downHeap()
1.50% 1243 java.lang.Math#round()
1.45% 1201 org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$3#longValue()
1.38% 1145 java.util.HashMap#resize()
1.35% 1115 org.apache.lucene.store.ByteBufferGuard#ensureValid()
1.33% 1100 org.apache.lucene.sandbox.search.CombinedFieldQuery$1$1#mergeImpactsPerField()
1.21% 1001 java.util.HashMap$HashIterator#nextNode()
1.17% 972 java.util.LinkedList#linkLast()
1.13% 934 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader#findFirstGreater()
1.07% 883 org.apache.lucene.sandbox.search.CombinedFieldQuery$CombinedFieldScorer#score()
1.06% 878 org.apache.lucene.store.ByteBufferGuard#getByte()
1.02% 841 org.apache.lucene.sandbox.search.MultiNormsLeafSimScorer#getNormValue()
suggesting that a fair amount of CPU time is spent merging impacts within the same field. I suspect this may happen because the max score is computed too frequently: a frequent term's skip list is "dense", and it is also used to determine upTo for the max score computation:
https://github.com/apache/lucene/blob/2a9adb81df314ffeb92951bbf2d99fecc94fa581/lucene/core/src/java/org/apache/lucene/search/ImpactsDISI.java#L78-L82
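As a rough, self-contained model of that ImpactsDISI behavior (hypothetical Block class, not the Lucene API): each impact block exposes a doc bound (upTo) and a score upper bound, and a target falling in a block whose max score is below the minimum competitive score gets skipped past the block boundary. With dense skip data, the blocks are small, so maxScore (and hence the impact merging that produces it) is recomputed at nearly every advance.

```java
import java.util.*;

public class BlockMaxSkipper {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  // One impact block: docs up to and including upTo, with a score upper bound.
  static class Block {
    final int upTo; final float maxScore;
    Block(int upTo, float maxScore) { this.upTo = upTo; this.maxScore = maxScore; }
  }

  // Return the first candidate >= target not ruled out by block-level max scores;
  // a block whose maxScore is below the minimum competitive score is skipped whole.
  static int advanceTarget(int target, List<Block> blocks, float minCompetitiveScore) {
    while (true) {
      Block current = null;
      for (Block b : blocks) {
        if (target <= b.upTo) { current = b; break; } // blocks sorted by upTo
      }
      if (current == null) return NO_MORE_DOCS;       // ran out of impact data
      if (current.maxScore >= minCompetitiveScore) return target;
      target = current.upTo + 1;                      // jump past the non-competitive block
    }
  }

  public static void main(String[] args) {
    List<Block> blocks = Arrays.asList(
        new Block(99, 1.0f), new Block(199, 3.0f), new Block(299, 0.5f));
    System.out.println(advanceTarget(10, blocks, 2.0f));  // first block not competitive
    System.out.println(advanceTarget(250, blocks, 2.0f)); // nothing competitive remains
  }
}
```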
Hi @jpountz @jimczi, while doing a deeper dive to see how this PR can be improved further, I noticed a difference in the field-weight parameter passed to createMultiNormsLeafSimScorer in CombinedFieldQuery:
CombinedFieldQuery#scorer:
https://github.com/apache/lucene/blob/3b914a4d73eea8923f823cbdb869de39213411dd/lucene/sandbox/src/java/org/apache/lucene/sandbox/search/CombinedFieldQuery.java#L420-L421
CombinedFieldQuery#explain:
https://github.com/apache/lucene/blob/3b914a4d73eea8923f823cbdb869de39213411dd/lucene/sandbox/src/java/org/apache/lucene/sandbox/search/CombinedFieldQuery.java#L387-L389
For the CombinedFieldQuery#scorer one, fields may contain duplicate entries:
https://github.com/apache/lucene/blob/3b914a4d73eea8923f823cbdb869de39213411dd/lucene/sandbox/src/java/org/apache/lucene/sandbox/search/CombinedFieldQuery.java#L404-L414
whereas fieldAndWeights.values() should not contain duplicates.
I feel CombinedFieldQuery#scorer should be updated to use fieldAndWeights.values() as well. What do you think?
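To illustrate the duplication (a hypothetical mini-model, not the actual Lucene code): collecting one entry per (term, field) match repeats a field once per query term that hits it, while a field-keyed map like fieldAndWeights naturally holds each field once.

```java
import java.util.*;

public class FieldDedup {

  // The scorer path collects one entry per (term, field) pair, so a field
  // queried with several terms shows up several times in the list.
  static List<String> perTermFieldList(List<String> terms, Map<String, Float> fieldAndWeights) {
    List<String> fields = new ArrayList<>();
    for (String term : terms) {
      for (String field : fieldAndWeights.keySet()) {
        fields.add(field); // same field added once per term
      }
    }
    return fields;
  }

  public static void main(String[] args) {
    Map<String, Float> fieldAndWeights = new LinkedHashMap<>();
    fieldAndWeights.put("titleTokenized", 4.0f);
    fieldAndWeights.put("body", 2.0f);
    List<String> terms = Arrays.asList("three", "geronimo");

    // Duplicated: [titleTokenized, body, titleTokenized, body]
    System.out.println(perTermFieldList(terms, fieldAndWeights));
    // Deduplicated, the way a field-keyed map exposes it: [titleTokenized, body]
    System.out.println(fieldAndWeights.keySet());
  }
}
```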
For now I have created a spin-off issue https://issues.apache.org/jira/browse/LUCENE-10236 and a separate PR https://github.com/apache/lucene/pull/444 for this. Please let me know if they look good to you.
Thanks for making progress on this and sorry for the slow review cycles.
No worries, and thanks for the review and feedback!
I just had a quick look, and it seems that some computations can be moved up-front, such as deciding which field we use as a lead, since field weights do not change during query execution?
Yup, I can move it out to avoid repeated computation there. Just a quick check though: from the perf results so far (https://github.com/apache/lucene/pull/418#discussion_r754091056 & https://github.com/apache/lucene/pull/418#discussion_r759006970), it seems using the most-weighted field to decide on the leading impacts may not yield the best result, as matches in the most-weighted field may also give a high docId upTo bound. If we were to drop that approach and revert the commit https://github.com/apache/lucene/pull/418/commits/db2446f0337e5860bb8c731df1f6c16281efb6bb that introduced it, then the most-weighted field wouldn't be computed at all. Are you trying to see if the perf tests can be further improved if the most-weighted field only gets computed once?
I wonder if there are things we can do to make the code simpler, such as sharing the logic that combines multiple terms within the same field with SynonymQuery.
Yeah, I had the same thought. I wasn't sure earlier, as SynonymQuery and CombinedFieldsQuery sit in different modules / packages, so extracting the common code out may require a new public specialized API / class (in the core module?). Let me give that a try and see how it looks!
from the perf results so far, it seems using the most weighted field to decide on the leading impacts may not yield the best result Are you trying to see if perf tests can be further improved if most weighted field only gets computed once?
Yes. I liked this approach because it felt like it should work relatively well, since the field with the highest weight should drive scores anyway, and deciding up-front which clause leads impacts felt like it could simplify the logic a bit. But if it doesn't yield better results, let's fall back to the previous approach. Let's maybe just double check with a profiler that the reason this approach performs worse is actually that we get worse score boundaries, and not some avoidable slow code like iterating over or looking up the various hash maps?
wasn't sure earlier as SynonymQuery and CombinedFieldsQuery sit in different modules / packages, so extracting common code out may potentially require a new public specialized API / class (in the core module ?). Let me give that a try and see how it looks!
Maybe we could have a SynonymImpactsSource or something like that as a public @lucene.internal class in core that would provide this logic.
Maybe we could have a SynonymImpactsSource or something like that as a public @lucene.internal class in core that would provide this logic.
Ah yes, using @lucene.internal makes sense! I made a new class to extract out the within-field impacts merging logic in https://github.com/apache/lucene/pull/418/commits/0a9bdccf01d568b0239cf894ff626e47d8333df3, and used it in CombinedFieldsQuery in https://github.com/apache/lucene/pull/418/commits/808fec28ab6d7179974665e7fa069270845811fa. I've also moved up the repeated max-weight field computation in https://github.com/apache/lucene/pull/418/commits/8a7ea9998b189649ccdc75ecdd4b096460abb0af .
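For reference, a heavily simplified sketch of the within-field merging idea (hypothetical class and merge rule, not the actual Lucene implementation): for each norm value seen across the terms, a pessimistic merged frequency is the sum over terms of the best frequency that term can reach at that norm or lower. That gives a valid upper bound on the combined frequency, which is all score pruning needs.

```java
import java.util.*;

public class SynonymImpactsSketch {

  // A (freq, norm) pair, mirroring the shape of per-block impact data.
  static class Impact {
    final int freq; final long norm;
    Impact(int freq, long norm) { this.freq = freq; this.norm = norm; }
  }

  // Merge per-term impacts on the same field into pessimistic (freq, norm) pairs.
  static List<Impact> merge(List<List<Impact>> perTermImpacts) {
    TreeSet<Long> norms = new TreeSet<>();
    for (List<Impact> impacts : perTermImpacts) {
      for (Impact i : impacts) norms.add(i.norm);
    }
    List<Impact> merged = new ArrayList<>();
    for (long norm : norms) {
      int freqSum = 0;
      for (List<Impact> impacts : perTermImpacts) {
        int best = 0; // best freq this term can reach at norm or lower
        for (Impact i : impacts) {
          if (i.norm <= norm) best = Math.max(best, i.freq);
        }
        freqSum += best; // upper bound: each term contributes at most its best freq
      }
      merged.add(new Impact(freqSum, norm));
    }
    return merged;
  }

  public static void main(String[] args) {
    List<List<Impact>> perTerm = Arrays.asList(
        Arrays.asList(new Impact(3, 4)),
        Arrays.asList(new Impact(2, 4), new Impact(5, 8)));
    for (Impact i : merge(perTerm)) {
      System.out.println("freq=" + i.freq + " norm=" + i.norm);
    }
  }
}
```

The real merging in the PR is more involved (it works per level and respects the impact-format invariants), but the upper-bound character of the merge is the same.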
Yes. I liked this approach because it felt like it should work relatively well, since the field with the highest weight should drive scores anyway, and deciding up-front which clause leads impacts felt like it could simplify the logic a bit. But if it doesn't yield better results, let's fall back to the previous approach. Let's maybe just double check with a profiler that the reason this approach performs worse is actually that we get worse score boundaries, and not some avoidable slow code like iterating over or looking up the various hash maps?
I see. The latest two commits https://github.com/apache/lucene/pull/418/commits/808fec28ab6d7179974665e7fa069270845811fa & https://github.com/apache/lucene/pull/418/commits/75c5b046f8af4e73341a26a1828e97e006115a6b used the max-weight field to drive scoring, and overall they showed around a -20% impact on tasks from combinedFieldsUnevenlyWeightedBig so far. Sample CPU JFR results for the candidate and the baseline look like the following:
Candidate CPU JFR:
PERCENT CPU SAMPLES STACK
15.13% 13191 org.apache.lucene.sandbox.search.MultiNormsLeafSimScorer$MultiFieldNormValues#advanceExact()
7.42% 6466 org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer#score()
7.29% 6354 org.apache.lucene.search.DisjunctionDISIApproximation#advance()
6.79% 5921 org.apache.lucene.search.DisiPriorityQueue#downHeap()
4.34% 3785 org.apache.lucene.search.DisiPriorityQueue#topList()
3.97% 3465 org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1#collect()
3.85% 3360 org.apache.lucene.sandbox.search.CombinedFieldQuery$WeightedDisiWrapper#freq()
3.18% 2777 java.lang.Math#round()
3.01% 2624 org.apache.lucene.sandbox.search.CombinedFieldQuery$CombinedFieldScorer#freq()
3.01% 2624 org.apache.lucene.search.DisiPriorityQueue#top()
2.33% 2036 org.apache.lucene.sandbox.search.CombinedFieldQuery$CombinedFieldScorer#score()
2.20% 1917 org.apache.lucene.sandbox.search.MultiNormsLeafSimScorer#getNormValue()
2.07% 1809 org.apache.lucene.util.SmallFloat#longToInt4()
1.81% 1580 org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$3#longValue()
1.78% 1555 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsDocsEnum#advance()
1.77% 1545 org.apache.lucene.store.ByteBufferGuard#ensureValid()
1.51% 1316 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader#findFirstGreater()
1.39% 1210 org.apache.lucene.search.DisiPriorityQueue#updateTop()
1.29% 1124 org.apache.lucene.search.ImpactsDISI#docID()
1.22% 1063 jdk.internal.misc.Unsafe#getByte()
1.19% 1037 org.apache.lucene.store.ByteBufferGuard#getByte()
1.08% 938 org.apache.lucene.search.DisiPriorityQueue#prepend()
0.97% 849 org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
0.86% 750 org.apache.lucene.codecs.MultiLevelSkipListReader#skipTo()
0.82% 717 org.apache.lucene.util.SmallFloat#intToByte4()
0.78% 683 java.lang.Math#toIntExact()
0.62% 540 org.apache.lucene.codecs.lucene90.PForUtil#decode()
0.61% 529 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsDocsEnum#freq()
0.51% 441 org.apache.lucene.search.ImpactsDISI#nextDoc()
0.50% 435 org.apache.lucene.search.ImpactsDISI#advanceTarget()
Baseline CPU JFR
PERCENT CPU SAMPLES STACK
16.38% 13363 org.apache.lucene.sandbox.search.MultiNormsLeafSimScorer$MultiFieldNormValues#advanceExact()
8.67% 7073 org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer#score()
8.05% 6569 org.apache.lucene.search.DisjunctionDISIApproximation#nextDoc()
6.50% 5305 org.apache.lucene.search.DisiPriorityQueue#downHeap()
5.17% 4220 org.apache.lucene.sandbox.search.CombinedFieldQuery$CombinedFieldScorer#freq()
4.40% 3587 org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1#collect()
4.30% 3505 org.apache.lucene.search.DisiPriorityQueue#topList()
3.37% 2751 java.lang.Math#round()
2.77% 2263 org.apache.lucene.search.DisiPriorityQueue#top()
2.64% 2156 org.apache.lucene.sandbox.search.CombinedFieldQuery$WeightedDisiWrapper#freq()
2.62% 2135 org.apache.lucene.sandbox.search.CombinedFieldQuery$CombinedFieldScorer#score()
2.12% 1729 org.apache.lucene.sandbox.search.MultiNormsLeafSimScorer#getNormValue()
2.11% 1723 org.apache.lucene.util.SmallFloat#longToInt4()
2.06% 1678 org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
2.01% 1640 org.apache.lucene.store.ByteBufferGuard#ensureValid()
1.77% 1442 org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$3#longValue()
1.62% 1325 org.apache.lucene.search.DisiPriorityQueue#updateTop()
1.26% 1028 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockDocsEnum#nextDoc()
1.10% 899 java.lang.Math#toIntExact()
1.10% 898 org.apache.lucene.util.SmallFloat#intToByte4()
1.09% 888 org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockDocsEnum#freq()
1.06% 864 org.apache.lucene.store.ByteBufferGuard#getByte()
0.80% 652 jdk.internal.misc.Unsafe#getByte()
0.74% 605 org.apache.lucene.codecs.lucene90.PForUtil#decode()
0.68% 552 org.apache.lucene.search.DisiPriorityQueue#prepend()
0.60% 492 org.apache.lucene.codecs.lucene90.PForUtil#decodeAndPrefixSum()
0.55% 451 java.io.RandomAccessFile#readBytes()
0.38% 314 org.apache.lucene.codecs.lucene90.PForUtil#innerPrefixSum32()
0.38% 307 org.apache.lucene.codecs.lucene90.ForUtil#expand8()
0.32% 260 org.apache.lucene.codecs.lucene90.PForUtil#expand32()
As they look very similar, and based on my previous debugging, this may suggest that the computed max score is high and most docs are not being skipped, so the candidate implementation ended up examining the same number of docs as the baseline.
I just added two logging statements in ImpactsDISI to print out the upTo and max score comparison and examine the pruning results:
diff --git a/lucene/core/src/java/org/apache/lucene/search/ImpactsDISI.java b/lucene/core/src/java/org/apache/lucene/search/ImpactsDISI.java
index 843bb4ad57b..3930ae2f50d 100644
--- a/lucene/core/src/java/org/apache/lucene/search/ImpactsDISI.java
+++ b/lucene/core/src/java/org/apache/lucene/search/ImpactsDISI.java
@@ -110,8 +110,10 @@ public final class ImpactsDISI extends DocIdSetIterator {
     assert upTo >= target;
     if (maxScore >= minCompetitiveScore) {
+      System.out.println("Keeping - target: " + target + " upTo: " + upTo + " minCompScore: " + minCompetitiveScore + " maxScore: " + maxScore);
       return target;
     }
+    System.out.println("Skipping - target: " + target + " upTo: " + upTo + " minCompScore: " + minCompetitiveScore + " maxScore: " + maxScore);
     if (upTo == NO_MORE_DOCS) {
       return NO_MORE_DOCS;
and the following query was used:
CombinedFieldQuery query =
new CombinedFieldQuery.Builder()
.addField("titleTokenized", (float) 20.0)
.addField("body", (float) 1.0)
.addTerm(new BytesRef("three"))
.addTerm(new BytesRef("geronimo"))
.build();
For the implementation that uses the most-weighted field and its lowest Impacts#getDocIdUpTo to drive scoring, upTo and maxScore were high, hence there was no pruning.
1> Keeping - target: 152 upTo: 232464 minCompScore: 0.17629458 maxScore: 2.1560352
1> Keeping - target: 159 upTo: 232464 minCompScore: 0.18669213 maxScore: 2.1560352
1> Keeping - target: 160 upTo: 232464 minCompScore: 0.20146967 maxScore: 2.1560352
1> Keeping - target: 161 upTo: 232464 minCompScore: 0.21165837 maxScore: 2.1560352
1> Keeping - target: 167 upTo: 232464 minCompScore: 0.21696068 maxScore: 2.1560352
1> Keeping - target: 170 upTo: 232464 minCompScore: 0.23223151 maxScore: 2.1560352
1> Keeping - target: 171 upTo: 232464 minCompScore: 0.26476994 maxScore: 2.1560352
1> Keeping - target: 172 upTo: 232464 minCompScore: 0.27027848 maxScore: 2.1560352
1> Keeping - target: 174 upTo: 232464 minCompScore: 0.27216592 maxScore: 2.1560352
1> Keeping - target: 175 upTo: 232464 minCompScore: 0.27311972 maxScore: 2.1560352
1> Keeping - target: 176 upTo: 232464 minCompScore: 0.28871515 maxScore: 2.1560352
1> Keeping - target: 179 upTo: 232464 minCompScore: 0.2955102 maxScore: 2.1560352
1> Keeping - target: 185 upTo: 232464 minCompScore: 0.31796065 maxScore: 2.1560352
1> Keeping - target: 187 upTo: 232464 minCompScore: 0.32868698 maxScore: 2.1560352
1> Keeping - target: 189 upTo: 232464 minCompScore: 0.33091953 maxScore: 2.1560352
1> Keeping - target: 190 upTo: 232464 minCompScore: 0.353465 maxScore: 2.1560352
1> Keeping - target: 191 upTo: 232464 minCompScore: 0.36112097 maxScore: 2.1560352
1> Keeping - target: 194 upTo: 232464 minCompScore: 0.3743663 maxScore: 2.1560352
1> Keeping - target: 195 upTo: 232464 minCompScore: 0.38150516 maxScore: 2.1560352
1> Keeping - target: 197 upTo: 232464 minCompScore: 0.39127985 maxScore: 2.1560352
1> Keeping - target: 198 upTo: 232464 minCompScore: 0.39738885 maxScore: 2.1560352
1> Keeping - target: 208 upTo: 232464 minCompScore: 0.39789525 maxScore: 2.1560352
1> Keeping - target: 216 upTo: 232464 minCompScore: 0.41046718 maxScore: 2.1560352
1> Keeping - target: 219 upTo: 232464 minCompScore: 0.4148362 maxScore: 2.1560352
1> Keeping - target: 224 upTo: 232464 minCompScore: 0.4204301 maxScore: 2.1560352
1> Keeping - target: 227 upTo: 232464 minCompScore: 0.43892494 maxScore: 2.1560352
1> Keeping - target: 234 upTo: 232464 minCompScore: 0.4551008 maxScore: 2.1560352
1> Keeping - target: 235 upTo: 232464 minCompScore: 0.47162262 maxScore: 2.1560352
1> Keeping - target: 238 upTo: 232464 minCompScore: 0.47382453 maxScore: 2.1560352
1> Keeping - target: 239 upTo: 232464 minCompScore: 0.48034182 maxScore: 2.1560352
1> Keeping - target: 243 upTo: 232464 minCompScore: 0.48332027 maxScore: 2.1560352
1> Keeping - target: 245 upTo: 232464 minCompScore: 0.4903498 maxScore: 2.1560352
1> Keeping - target: 246 upTo: 232464 minCompScore: 0.5026699 maxScore: 2.1560352
...
For the implementation that uses the most-weighted field and lowest cost / doc freq to drive scoring, upTo turned out to be NO_MORE_DOCS (Integer.MAX_VALUE) for that query, and there was no pruning either.
1> Keeping - target: 151 upTo: 2147483647 minCompScore: 0.15862586 maxScore: 2.1560352
1> Keeping - target: 152 upTo: 2147483647 minCompScore: 0.17629458 maxScore: 2.1560352
1> Keeping - target: 159 upTo: 2147483647 minCompScore: 0.18669213 maxScore: 2.1560352
1> Keeping - target: 160 upTo: 2147483647 minCompScore: 0.20146967 maxScore: 2.1560352
1> Keeping - target: 161 upTo: 2147483647 minCompScore: 0.21165837 maxScore: 2.1560352
1> Keeping - target: 167 upTo: 2147483647 minCompScore: 0.21696068 maxScore: 2.1560352
1> Keeping - target: 170 upTo: 2147483647 minCompScore: 0.23223151 maxScore: 2.1560352
1> Keeping - target: 171 upTo: 2147483647 minCompScore: 0.26476994 maxScore: 2.1560352
1> Keeping - target: 172 upTo: 2147483647 minCompScore: 0.27027848 maxScore: 2.1560352
1> Keeping - target: 174 upTo: 2147483647 minCompScore: 0.27216592 maxScore: 2.1560352
1> Keeping - target: 175 upTo: 2147483647 minCompScore: 0.27311972 maxScore: 2.1560352
1> Keeping - target: 176 upTo: 2147483647 minCompScore: 0.28871515 maxScore: 2.1560352
1> Keeping - target: 179 upTo: 2147483647 minCompScore: 0.2955102 maxScore: 2.1560352
1> Keeping - target: 185 upTo: 2147483647 minCompScore: 0.31796065 maxScore: 2.1560352
1> Keeping - target: 187 upTo: 2147483647 minCompScore: 0.32868698 maxScore: 2.1560352
1> Keeping - target: 189 upTo: 2147483647 minCompScore: 0.33091953 maxScore: 2.1560352
1> Keeping - target: 190 upTo: 2147483647 minCompScore: 0.353465 maxScore: 2.1560352
1> Keeping - target: 191 upTo: 2147483647 minCompScore: 0.36112097 maxScore: 2.1560352
1> Keeping - target: 194 upTo: 2147483647 minCompScore: 0.3743663 maxScore: 2.1560352
1> Keeping - target: 195 upTo: 2147483647 minCompScore: 0.38150516 maxScore: 2.1560352
1> Keeping - target: 197 upTo: 2147483647 minCompScore: 0.39127985 maxScore: 2.1560352
1> Keeping - target: 198 upTo: 2147483647 minCompScore: 0.39738885 maxScore: 2.1560352
1> Keeping - target: 208 upTo: 2147483647 minCompScore: 0.39789525 maxScore: 2.1560352
1> Keeping - target: 216 upTo: 2147483647 minCompScore: 0.41046718 maxScore: 2.1560352
1> Keeping - target: 219 upTo: 2147483647 minCompScore: 0.4148362 maxScore: 2.1560352
1> Keeping - target: 224 upTo: 2147483647 minCompScore: 0.4204301 maxScore: 2.1560352
1> Keeping - target: 227 upTo: 2147483647 minCompScore: 0.43892494 maxScore: 2.1560352
1> Keeping - target: 234 upTo: 2147483647 minCompScore: 0.4551008 maxScore: 2.1560352
1> Keeping - target: 235 upTo: 2147483647 minCompScore: 0.47162262 maxScore: 2.1560352
1> Keeping - target: 238 upTo: 2147483647 minCompScore: 0.47382453 maxScore: 2.1560352
1> Keeping - target: 239 upTo: 2147483647 minCompScore: 0.48034182 maxScore: 2.1560352
1> Keeping - target: 243 upTo: 2147483647 minCompScore: 0.48332027 maxScore: 2.1560352
1> Keeping - target: 245 upTo: 2147483647 minCompScore: 0.4903498 maxScore: 2.1560352
1> Keeping - target: 246 upTo: 2147483647 minCompScore: 0.5026699 maxScore: 2.1560352
1> Keeping - target: 247 upTo: 2147483647 minCompScore: 0.50923806 maxScore: 2.1560352
1> Keeping - target: 249 upTo: 2147483647 minCompScore: 0.50958425 maxScore: 2.1560352
1> Keeping - target: 259 upTo: 2147483647 minCompScore: 0.51977855 maxScore: 2.1560352
1> Keeping - target: 265 upTo: 2147483647 minCompScore: 0.5250303 maxScore: 2.1560352
1> Keeping - target: 266 upTo: 2147483647 minCompScore: 0.53061455 maxScore: 2.1560352
1> Keeping - target: 267 upTo: 2147483647 minCompScore: 0.53076476 maxScore: 2.1560352
1> Keeping - target: 269 upTo: 2147483647 minCompScore: 0.53151745 maxScore: 2.1560352
1> Keeping - target: 275 upTo: 2147483647 minCompScore: 0.56218606 maxScore: 2.1560352
1> Keeping - target: 279 upTo: 2147483647 minCompScore: 0.57041436 maxScore: 2.1560352
1> Keeping - target: 280 upTo: 2147483647 minCompScore: 0.58023125 maxScore: 2.1560352
1> Keeping - target: 296 upTo: 2147483647 minCompScore: 0.592781 maxScore: 2.1560352
1> Keeping - target: 298 upTo: 2147483647 minCompScore: 0.5975618 maxScore: 2.1560352
1> Keeping - target: 314 upTo: 2147483647 minCompScore: 0.5996211 maxScore: 2.1560352
1> Keeping - target: 318 upTo: 2147483647 minCompScore: 0.6027052 maxScore: 2.1560352
1> Keeping - target: 319 upTo: 2147483647 minCompScore: 0.6080974 maxScore: 2.1560352
1> Keeping - target: 329 upTo: 2147483647 minCompScore: 0.6116286 maxScore: 2.1560352
1> Keeping - target: 335 upTo: 2147483647 minCompScore: 0.6127903 maxScore: 2.1560352
1> Keeping - target: 336 upTo: 2147483647 minCompScore: 0.6186264 maxScore: 2.1560352
1> Keeping - target: 339 upTo: 2147483647 minCompScore: 0.62189597 maxScore: 2.1560352
1> Keeping - target: 342 upTo: 2147483647 minCompScore: 0.63768846 maxScore: 2.1560352
1> Keeping - target: 346 upTo: 2147483647 minCompScore: 0.63877517 maxScore: 2.1560352
1> Keeping - target: 352 upTo: 2147483647 minCompScore: 0.66235477 maxScore: 2.1560352
1> Keeping - target: 354 upTo: 2147483647 minCompScore: 0.6642242 maxScore: 2.1560352
1> Keeping - target: 355 upTo: 2147483647 minCompScore: 0.6710866 maxScore: 2.1560352
1> Keeping - target: 358 upTo: 2147483647 minCompScore: 0.6718084 maxScore: 2.1560352
1> Keeping - target: 362 upTo: 2147483647 minCompScore: 0.67271274 maxScore: 2.1560352
1> Keeping - target: 365 upTo: 2147483647 minCompScore: 0.6831766 maxScore: 2.1560352
1> Keeping - target: 367 upTo: 2147483647 minCompScore: 0.683592 maxScore: 2.1560352
1> Keeping - target: 371 upTo: 2147483647 minCompScore: 0.6896402 maxScore: 2.1560352
...
For the implementation that uses the lowest Impacts#getDocIdUpTo across all fields, we get a lower upTo and max score in general, and hence some docs do get pruned.
1> Keeping - target: 152 upTo: 195 minCompScore: 0.17629458 maxScore: 1.3767262
1> Keeping - target: 159 upTo: 195 minCompScore: 0.18669213 maxScore: 1.3767262
...
1> Keeping - target: 219 upTo: 436 minCompScore: 0.4148362 maxScore: 1.7731527
1> Keeping - target: 224 upTo: 436 minCompScore: 0.4204301 maxScore: 1.7731527
...
1> Keeping - target: 764 upTo: 902 minCompScore: 0.9138265 maxScore: 1.65851
1> Keeping - target: 767 upTo: 902 minCompScore: 0.91438395 maxScore: 1.65851
1> Keeping - target: 779 upTo: 902 minCompScore: 0.91606015 maxScore: 1.65851
...
1> Keeping - target: 1278 upTo: 1384 minCompScore: 1.0459963 maxScore: 1.5856887
1> Keeping - target: 1280 upTo: 1384 minCompScore: 1.0520688 maxScore: 1.5856887
...
1> Keeping - target: 7144 upTo: 7227 minCompScore: 1.428069 maxScore: 1.7997875
1> Keeping - target: 7228 upTo: 7453 minCompScore: 1.4284092 maxScore: 1.4454873
1> Skipping - target: 7454 upTo: 7702 minCompScore: 1.4284092 maxScore: 1.3662372
1> Keeping - target: 7703 upTo: 7921 minCompScore: 1.4284092 maxScore: 1.6103871
1> Keeping - target: 7922 upTo: 8131 minCompScore: 1.4284092 maxScore: 1.5153265
...
1> Keeping - target: 17039 upTo: 17327 minCompScore: 1.5600302 maxScore: 1.6949782
1> Keeping - target: 17328 upTo: 17873 minCompScore: 1.5600302 maxScore: 1.8440268
1> Keeping - target: 17874 upTo: 18438 minCompScore: 1.5600302 maxScore: 1.8541759
1> Skipping - target: 18439 upTo: 18774 minCompScore: 1.5600302 maxScore: 1.5445734
1> Keeping - target: 18775 upTo: 19012 minCompScore: 1.5600302 maxScore: 2.1527436
1> Keeping - target: 19001 upTo: 19012 minCompScore: 1.5605642 maxScore: 2.1527436
...
1> Keeping - target: 19408 upTo: 19541 minCompScore: 1.5636915 maxScore: 1.927903
1> Skipping - target: 19542 upTo: 19834 minCompScore: 1.5636915 maxScore: 1.5145609
1> Keeping - target: 19835 upTo: 20077 minCompScore: 1.5636915 maxScore: 1.699799
1> Keeping - target: 20066 upTo: 20077 minCompScore: 1.5694537 maxScore: 1.699799
...
1> Keeping - target: 27159 upTo: 27396 minCompScore: 1.6439995 maxScore: 2.1519704
1> Keeping - target: 27182 upTo: 27396 minCompScore: 1.6449014 maxScore: 2.1519704
1> Keeping - target: 27397 upTo: 27627 minCompScore: 1.6449014 maxScore: 2.1505308
1> Keeping - target: 27500 upTo: 27627 minCompScore: 1.6467081 maxScore: 2.1505308
1> Skipping - target: 27628 upTo: 27859 minCompScore: 1.6467081 maxScore: 1.5844319
1> Keeping - target: 27860 upTo: 28118 minCompScore: 1.6467081 maxScore: 1.7705853
1> Keeping - target: 27976 upTo: 28118 minCompScore: 1.6487088 maxScore: 1.7705853
1> Skipping - target: 28119 upTo: 28370 minCompScore: 1.6487088 maxScore: 1.576628
1> Keeping - target: 28371 upTo: 28619 minCompScore: 1.6487088 maxScore: 2.0417855
1> Keeping - target: 28620 upTo: 28954 minCompScore: 1.6487088 maxScore: 2.1356478
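For context, the keep/skip decisions traced above follow block-max pruning: a block of documents can be skipped when the best score it could possibly contribute is below the current minimum competitive score. A minimal self-contained sketch of that decision (my own illustration, not the PR's actual code; values taken from the log lines above):

```java
public class BlockPruneSketch {

  /**
   * Returns true if the block whose best possible combined score is maxScore
   * cannot beat the current minimum competitive score and may be skipped.
   */
  static boolean canSkipBlock(float minCompetitiveScore, float maxScore) {
    return maxScore < minCompetitiveScore;
  }

  public static void main(String[] args) {
    // From the log: minCompScore 1.4284092, maxScore 1.4454873 -> Keeping
    System.out.println(canSkipBlock(1.4284092f, 1.4454873f)); // false
    // From the log: minCompScore 1.4284092, maxScore 1.3662372 -> Skipping
    System.out.println(canSkipBlock(1.4284092f, 1.3662372f)); // true
  }
}
```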
These numbers are super interesting. Let's go with the approach of your baseline for now since it performs better, sorry for putting you on a track that performs worse. :)
Regarding the new utility class, I was hoping that it could be a higher-level abstraction, e.g. a SynonymImpactsSource, rather than low-level utility methods?
> These numbers are super interesting. Let's go with the approach of your baseline for now since it performs better, sorry for putting you on a track that performs worse. :)
No problem! It's actually quite enjoyable for me to explore different approaches and compare their performance characteristics. I may also have missed certain aspects, so this discussion helps verify my understanding as well! I've re-implemented the other approach in https://github.com/apache/lucene/pull/418/commits/cbbd28bb2964228542e72b112f3b2f5f3164fae7.
> Regarding the new utility class, I was hoping that it could be a higher-level abstraction, e.g. a SynonymImpactsSource, rather than low-level utility methods?
Yeah, I did give that a try earlier, but I ended up with the utility-methods approach because I saw a potential issue. I implemented the abstraction again in https://github.com/apache/lucene/pull/418/commits/502d44e7ff0b6de6061a4899589e4523e9bf74e8.
If my understanding is correct, the intention of the higher-level abstraction is to reduce code duplication by looping over a Map<Field, SynonymImpactsSource> (or even a Map<Field, SynonymImpacts>) in the numLevels, getDocIdUpTo and, in particular, getImpacts implementations in CombinedFieldsQuery. However, I noticed that getImpacts doesn't seem to be easily decomposable into, or delegable to, the getImpacts of SynonymImpactsSource or SynonymImpacts. That method takes in a level and internally looks up the field's own docIdUpTo to get the list of impacts with that boundary, whereas for CombinedFieldsQuery we actually need a docIdUpTo that is valid across all fields, so it can't be abstracted away inside the loop: https://github.com/apache/lucene/pull/418/commits/502d44e7ff0b6de6061a4899589e4523e9bf74e8#diff-62e151fca32e7560ef90ee8eab1adaffac3073f23701aaf0c17a02e250727ed3R547-R550 . Given this restriction, I feel utility methods may work better for removing code duplication.
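To illustrate the cross-field boundary issue: for the same level, each field's impacts block may end at a different doc ID, so a combined view has to compute a boundary that is valid across all fields, e.g. the minimum of the per-field boundaries. A toy sketch (my own hypothetical illustration, not actual Lucene code):

```java
public class SharedBoundarySketch {

  /**
   * Given each field's getDocIdUpTo(level) for the same level, return a
   * boundary that is valid across all fields: the smallest per-field boundary.
   */
  static int sharedDocIdUpTo(int[] perFieldDocIdUpTo) {
    int min = Integer.MAX_VALUE;
    for (int upTo : perFieldDocIdUpTo) {
      min = Math.min(min, upTo);
    }
    return min;
  }

  public static void main(String[] args) {
    // e.g. the "title" block ends at doc 127 while the "body" block ends at
    // doc 255 for the same level; only docs up to 127 are covered by both.
    System.out.println(sharedDocIdUpTo(new int[] {127, 255})); // 127
  }
}
```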
Hi @jpountz , just want to circle back to this discussion and see if you have any further suggestions on the above analysis?
Hi @jpountz , just want to check again whether I have addressed your concern with the utility-methods approach?
To better illustrate the potential issue I see with the higher-level abstraction approach, I've copied over some key code snippets here.
If we were to extract a SynonymImpactsSource (from SynonymQuery), it might look something like this:
```java
public class SynonymImpactsSource implements ImpactsSource {

  private final ImpactsEnum[] impactsEnums;
  private final Impacts[] impacts;
  private final float[] boosts;
  private Impacts lead;

  public SynonymImpactsSource(ImpactsEnum[] impactsEnums, float[] boosts) {
    ...
  }

  @Override
  public Impacts getImpacts() throws IOException {
    // Use the impacts that have the lower next boundary as a lead.
    // It will decide on the number of levels and the block boundaries.
    if (lead == null) {
      Impacts tmpLead = null;
      for (int i = 0; i < impactsEnums.length; ++i) {
        impacts[i] = impactsEnums[i].getImpacts();
        if (tmpLead == null || impacts[i].getDocIdUpTo(0) < tmpLead.getDocIdUpTo(0)) {
          tmpLead = impacts[i];
        }
      }
      lead = tmpLead;
    }
    return new Impacts() {

      @Override
      public int numLevels() {
        // Delegate to the lead
        return lead.numLevels();
      }

      @Override
      public int getDocIdUpTo(int level) {
        // Delegate to the lead
        return lead.getDocIdUpTo(level);
      }

      @Override
      public List<Impact> getImpacts(int level) {
        final int docIdUpTo = getDocIdUpTo(level);
        return ImpactsMergingUtils.mergeImpactsPerField(
            impactsEnums, impacts, boosts, docIdUpTo, false);
      }
    };
  }

  @Override
  public void advanceShallow(int target) throws IOException {
    for (ImpactsEnum impactsEnum : impactsEnums) {
      if (impactsEnum.docID() < target) {
        impactsEnum.advanceShallow(target);
      }
    }
  }

  public Impacts[] impacts() {
    return impacts;
  }
}
```
However, the above SynonymImpactsSource#getImpacts(level) may not be directly usable for CombinedFieldsQuery, as illustrated below:
```java
// Pseudocode
Impacts CombinedFieldsQuery$ImpactsSource#getImpacts() {
  for (entry : Map<Field, SynonymImpactsSource>) {
    // Potentially problematic: this method only takes in level info and
    // internally looks up a field-specific docIdUpTo to get valid impacts
    // (see definition above). For CombinedFieldQuery, docIdUpTo may differ
    // among fields at the same level.
    combine SynonymImpactsSource.getImpacts(level)
  }
  return combined Impacts
}
```
Hence we may actually need to change some APIs if we want SynonymImpactsSource to work for the CombinedFieldQuery use case?
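As a purely hypothetical illustration of such an API change (my own assumption, not something in Lucene or in this PR): if the per-field source accepted the shared doc boundary directly instead of a level, the combined query could compute one boundary and pass it to every field:

```java
import java.util.List;

public class BoundaryApiSketch {

  /**
   * Hypothetical per-field interface: impacts are requested by an explicit
   * doc boundary rather than by a level, so the caller controls the boundary.
   */
  interface FieldImpacts {
    List<String> getImpactsUpTo(int docIdUpTo);
  }

  public static void main(String[] args) {
    // Toy implementation standing in for a real per-field impacts source.
    FieldImpacts toy = upTo -> List.of("impacts up to " + upTo);
    // The combined query would compute one boundary valid for all fields
    // (e.g. the minimum per-field boundary) and pass it to each source.
    System.out.println(toy.getImpactsUpTo(127));
  }
}
```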
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!