SOLR-17131: Optimize rows=0 since score/sort isn't necessary
https://issues.apache.org/jira/browse/SOLR-17131
Description
If a query has rows=0 Solr does not need to sort the results or even compute the score. This means the filterCache can be more efficient to compute the result and Solr doesn't spend time on unnecessarily sorting and scoring large result sets for it to be thrown away.
There is one subtle difference from this change - maxScore is no longer returned if rows=0 which seems like a reasonable tradeoff. If maxScore is needed then set rows=1. maxScore can't reliably be compared across searches anyway so no reason to have it when rows=0.
This was found while looking at https://issues.apache.org/jira/browse/SOLR-14765 and trying to reason through the case of rows=0 that are sometimes used to just determine count of matching results. This change adds the check for rows=0 before other checks to ensure that we can short circuit scoring even if there is a scoring query. We also don't add the query to the filter cache since that could result in filter cache thrashing.
In some tests, we found that queries with rows=0 p99 improved by 50% and CPU dropped by 5%.
Tests
Updated tests in TestMainQueryCaching to account for this optimization.
Checklist
Please review the following and check all that apply:
- [x] I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
- [x] I have created a Jira issue and added the issue ID to my pull request title.
- [x] I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
- [x] I have developed this patch against the
mainbranch. - [x] I have run
./gradlew check. - [x] I have added tests for my changes.
- [ ] I have added documentation for the Reference Guide
does not need to sort the results or even compute the score
Is there a unit test you can recommend that I might run in a debugger looking at key spots, that would show that a score is being computed wastefully? I can't imagine that there is time wasted sorting (as you claim) because there is nothing to sort.
This means the filterCache can be more efficient to compute the result and Solr doesn't spend time on unnecessarily sorting and scoring large result sets for it to be thrown away.
This seems contradictory. If the filterCache can compute the result, then wouldn't disabling the filterCache for the query (as I see you do in the PR) not use the filter cache then? I'm referring to the query, not the filters (which are processed the same either way).
Is there a unit test you can recommend that I might run in a debugger looking at key spots, that would show that a score is being computed wastefully?
TestMainQueryCaching#testConstantScoreFlScore and put a breakpoint at SolrIndexSearcher line 1620
If there is rows=0 and one of three cases, the scores are computed and sorted even though its not needed:
flincludesscoresortincludesscoreqis a non constant query
If you walk through the existing if checks in SolrIndexSearcher line 1620, the first case isn't chosen since scores are requested. The second case doesn't match since its not a *:* or MatchAllDocsQuery - even if its a constant score query useFilterForSortedQuery is false and we don't really want to enable (and its false by default) that case (puts the q in the filter cache - related to your other question I'll answer below). The final case is useFilterCacheForDynamicScoreQuery which if useFilterForSortedQuery is false (which it is by default) we skip optimizations.
The changes to TestMainQueryCaching also show the other cases where the rows=0 check improves things. Avoids a full sort.
This seems contradictory. If the filterCache can compute the result, then wouldn't disabling the filterCache for the query (as I see you do in the PR) not use the filter cache then? I'm referring to the query, not the filters (which are processed the same either way).
So the check I added makes sure we don't add the q query to the filterCache. We want to use the filterCache to compute the result, but don't want to store the q in the filterCache after its computed. This is similar to useFilterForSortedQuery but this is specifically the case of a rows=0 query.
I see what's happening. The queryResultCache's queryResultWindowSize is forcing the actual number of documents to be raised from 0 to that window (which is 1 in this test config), thus we have to score docs to find the one best doc.
I don't agree with your solution because it confuses what useFilterCache fundamentally means (use the filter cache for q). The other code path is actually already optimized for rows==0 and no scores; it should be quite fast!
I think the solution is to not save the results into queryResultCache when rows==0; thus the window won't be applied either. But I don't think SolrIndexSearcher should set this, I think it should be in QueryComponent by setting NO_SET_QCACHE so that some plugin/customization can easily make this decision for itself. Additionally, QueryComponent is setting GET_SCORES if fl contains the score but I think this should be further guarded by rows != 0.
I also think a queryResultWindowSize of 1 (which is the default if it isn't specified in solrconfig and isn't documented AFAICT) should not bump up a request for no rows to a window of a paltry 1 doc. I don't think that was intended; I think what was intended was to effectively have no window if it isn't specified. I can see using a default of 1 seems to convey that (for typical queries that have results) but not for rows=0 case. Likely an oversight long ago.
The other code path is actually already optimized for rows==0 and no scores
Which code path are you talking about?
I think the solution is to not save the results into queryResultCache when rows==0; thus the window won't be applied either.
Not sure I agree with that. We should be able to store the result in the queryResultCache so that we don't have to recompute for the same exact query a second time?
I also think a queryResultWindowSize of 1 (which is the default if it isn't specified in solrconfig and isn't documented AFAICT)
Its documented here https://solr.apache.org/guide/solr/latest/configuration-guide/caches-warming.html#queryresultwindowsize-element
I know in our case we do set queryResultWindowSize in solrconfig.xml.
Either way I'll look into QueryComponent some. @dsmiley as far as you are aware this change even though maybe not the best place to put it - doesn't have any negative side effects right?
(Me) The other code path is actually already optimized for rows==0 and no scores
The method you are attempting to modify calls getDocListAndSetNC when useFilterCache=false, which is well optimized for this.
Not sure I agree with that. We should be able to store the result in the queryResultCache so that we don't have to recompute for the same exact query a second time?
Okay; I sympathize. Another alteration to my suggestion, is that queryResultWindowSize should be ignored when rows=0 to allow us to run this query as fast as possible, even if this means not caching queryResultWindowSize docs -- because we'd rather not compute their scores merely to cache something in hopes it might be used.
I made a slight change to improve the variable name I added - c6618d6
The method you are attempting to modify calls getDocListAndSetNC when useFilterCache=false, which is well optimized for this.
Right but in that case - assuming you are referring to prior to my PR SolrIndexSearcher line 1703 / 1708, we don't get the optimizations from SOLR-14765. This means in the case of rows=0, we don't reuse the filter cache dotset and have to recompute it.
With my change, we never get down to getDocListAndSetNC for the case of rows=0 and can build the results right from the filterCache docset.
is that queryResultWindowSize should be ignored when rows=0 to allow us to run this query as fast as possible, even if this means not caching queryResultWindowSize docs -- because we'd rather not compute their scores merely to cache something in hopes it might be used.
100% agree with this idea. I'm not sure what is happening with queryResultWindowSize docs with my change. In theory we definitely shouldn't be computing them.
I addressed the queryResultWindowSize with 326afdd when rows=0.
You claim that your approach to this uses the filter cache (for q). Yet it does not because it calls getDocSetNC. I feel compelled to -1 this because that's what useFilterCache means. But we agree on the goals -- don't compute the score!
You claim that your approach to this uses the filter cache (for q)
With my change, for the case of rows=0, the filtercache docset will be used to compute the result using q.
I don't think I understand your concerns. getDocSetNC uses the filterCache docset and does not insert the query result into the filter cache. If getDocSetC is used, the query result will be inserted into the filter cache - which we don't want.
getDocSetNC uses the filterCache docset
Can you elaborate where this is so?
getDocSetNC uses the filterCache docset
Can you elaborate where this is so?
This existed prior to my change due to SOLR-14765 - SolrIndexSearcher before my PR lines 1658-1660 and with my PR lines 1673-1675.
Let's use GH links to ensure we understand each other. You are referring to this?
List<Query> filterList = cmd.getFilterList();
if (filterList != null && !filterList.isEmpty()) {
out.docSet = DocSetUtil.getDocSet(out.docSet.intersection(getDocSet(filterList)), this);
}
This shows that the list of filter queries goes through the filter cache. But that's true no matter what (other code paths ultimately lead to the same but in different methods deeper). The question at hand is how is the q param processed, which in this code gets turned into a DocSet saved to out.docSet. Immediately before these lines, it's populated via getDocSet. This is what useFilterCache means. In your PR, you added a condition to not do this, thus not use the filter cache for q.
Hi, I already responded on Slack, but I now found this PR and wanted to say some additional things:
There is also a separate collector and on top of that a separate method in IndexSearcher, which automatically does a lot of the top-docs related stuff. The problem with a normal TopDocs or TopFieldDocs collector is that the empty PQ has some additional overhead (although the score is not calculated and returned as 0).
First of all: SolrIndexSearcher should also implement support for using the optimized methods and collectors, unless a DocSet is needed (for facets/aggs). Using the TotalHitCountCollector is one thing to do (this needs to be done more down the line in the search code of Solr, but it's worth to do. But as there is now support to short-circuit count queries on the Query level, you should also think of using IndexSearcher#count() directly: E.g., this one returns IndexReader#numDocs() for MatchAllDocsQuery (thats a stupid simple case), but it can also return quick counts when there are no deleted docs in a segment and the statistics in index can answer the query.
The changes here look fine (although I can't follow the discussion about those crazy flags true/false/whatever), but in addition to the code here, in case of no TopDocs and no facets, the code should call IndexSearcher#count(). In the facet case, it should at least use the TotalHitCountCollector as the final sink.
@uschindler See getDocListNC which already seems to be pretty well optimized when lastDocRequested <= 0 (i.e. rows==0), and in conjunction also when !needScores. The reason some of the Lucene convenience collectors & IndexSearcher.count aren't used is because this code is juggling a multitude of scenarios, such as when there might be a so-called PostFilter, thus leading to another Collector to add to the head of the collectors. Care must be taken before adding yet another optimization -- does it actually add value or it just to feel good to use another Lucene class. getDocListAndSetNC follows this method and is similar.
Updated this based on talking w/ @dsmiley. the useFilterCache part I was confused on that it always applies to q so my change didn't make sense. I pushed a change that made it clear what useFilterCache is always used for q.
I also made the change to QueryComponent to make sure scores aren't calculated if rows=0. I think there is more here potentially.
TODO:
- Look at adding some more tests
- Try adding a benchmark for this
So you've coupled the SortSpec into ReturnFields just so that it can implement
wantsScorein a slightly more holistic way? Hmmm; that feels like bad coupling. Maybe a methodboolean needScores(int rows, ReturnFields, SortSpec)is needed to decide to apply GET_SCORES to flags and maybe for other cases?
yea I'm really not a fan of the approach. Still looking at better ways to do it.
I think I found a better way in 8f7cb64 - I didn't originally like the idea of parsing rows there, but since we are explicitly checking for a single value it isn't that bad. This makes it SolrReturnFields a lot cleaner and QueryComponent doesn't even need to change at all.
I pushed 80f11d5 which has a benchmark for checking a bunch of this. I haven't compared against main yet, but on my machine:
Benchmark (flScore) (matchAllDocs) (noRowsQuery) (singleShard) (sortScore) Mode Cnt Score Error Units
SearchOptimizations.query false false false false false thrpt 9 14117.797 ± 667.524 ops/s
SearchOptimizations.query false false false false true thrpt 9 12798.032 ± 664.581 ops/s
SearchOptimizations.query false false false true false thrpt 9 41903.764 ± 3499.993 ops/s
SearchOptimizations.query false false false true true thrpt 9 41875.468 ± 1274.265 ops/s
SearchOptimizations.query false false true false false thrpt 9 14165.919 ± 1275.410 ops/s
SearchOptimizations.query false false true false true thrpt 9 14658.515 ± 711.654 ops/s
SearchOptimizations.query false false true true false thrpt 9 46095.168 ± 2424.954 ops/s
SearchOptimizations.query false false true true true thrpt 9 45676.193 ± 2169.222 ops/s
SearchOptimizations.query false true false false false thrpt 9 12980.789 ± 379.131 ops/s
SearchOptimizations.query false true false false true thrpt 9 10658.171 ± 1223.202 ops/s
SearchOptimizations.query false true false true false thrpt 9 36985.569 ± 1949.741 ops/s
SearchOptimizations.query false true false true true thrpt 9 32990.086 ± 1658.271 ops/s
SearchOptimizations.query false true true false false thrpt 9 15341.764 ± 345.964 ops/s
SearchOptimizations.query false true true false true thrpt 9 14952.378 ± 812.817 ops/s
SearchOptimizations.query false true true true false thrpt 9 51607.454 ± 2001.172 ops/s
SearchOptimizations.query false true true true true thrpt 9 51498.706 ± 1305.913 ops/s
SearchOptimizations.query true false false false false thrpt 9 12525.537 ± 1012.328 ops/s
SearchOptimizations.query true false false false true thrpt 9 12463.681 ± 670.962 ops/s
SearchOptimizations.query true false false true false thrpt 9 40158.078 ± 2073.773 ops/s
SearchOptimizations.query true false false true true thrpt 9 38621.333 ± 2662.662 ops/s
SearchOptimizations.query true false true false false thrpt 9 14370.718 ± 1075.079 ops/s
SearchOptimizations.query true false true false true thrpt 9 14066.580 ± 376.911 ops/s
SearchOptimizations.query true false true true false thrpt 9 45014.503 ± 2190.095 ops/s
SearchOptimizations.query true false true true true thrpt 9 45708.595 ± 978.899 ops/s
SearchOptimizations.query true true false false false thrpt 9 12078.353 ± 221.499 ops/s
SearchOptimizations.query true true false false true thrpt 9 11039.752 ± 132.751 ops/s
SearchOptimizations.query true true false true false thrpt 9 33369.947 ± 448.742 ops/s
SearchOptimizations.query true true false true true thrpt 9 26665.048 ± 740.376 ops/s
SearchOptimizations.query true true true false false thrpt 9 15575.767 ± 398.453 ops/s
SearchOptimizations.query true true true false true thrpt 9 15068.308 ± 1095.587 ops/s
SearchOptimizations.query true true true true false thrpt 9 52237.705 ± 2442.391 ops/s
SearchOptimizations.query true true true true true thrpt 9 52827.441 ± 1061.696 ops/s
There is great work here; looking forward to it making 9.6 :-)
@risdenk are there any loose ends here or it mergeable as-is, notwithstanding a CHANGES.txt entry? Proposed CHANGES.txt under 9.6 optimizations:
SOLR-17131: Queries: rows=0, can skip sorting and doc collection. (Kevin Risden, David Smiley)
Shall I just merge this for 9.6? I think it's embarrassing we've had this perf issue for so long without it being noticed. match-all-docs is so common, like in ping queries for one.
One test failed, TestPrepRecovery.testLeaderNotResponding but it's a common flapper
CC @gus-asf (9.6 RM)
I guess the question is more a matter of how safe you think this change is... We all know that there are some flaky tests. The test doesn't sound related (being admin/recovery related not query related) doesn't matter much assuming it doesn't reproduce with the seed, or seem to fail more frequently.
I haven't published the RC yet, though I have already run tests locally and should re-do that if we are going to add new commits.
-Gus
On Mon, Apr 22, 2024 at 3:44 PM David Smiley @.***> wrote:
Shall I just merge this for 9.6? I think it's embarrassing we've had this perf issue for so long without it being noticed. match-all-docs is so common, like in ping queries for one.
One test failed, TestPrepRecovery.testLeaderNotResponding but it's a common flapper https://ge.apache.org/scans/tests?search.relativeStartTime=P28D&search.rootProjectNames=solr-root&search.timeZoneId=America%2FNew_York&tests.container=org.apache.solr.cloud.TestPrepRecovery
CC @gus-asf https://github.com/gus-asf (9.6 RM)
— Reply to this email directly, view it on GitHub https://github.com/apache/solr/pull/2221#issuecomment-2070806945, or unsubscribe https://github.com/notifications/unsubscribe-auth/ALF2L3IFCJ6F3JJSIZGGKKLY6VR73AVCNFSM6AAAAABCJOYDZKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANZQHAYDMOJUGU . You are receiving this because you were mentioned.Message ID: @.***>
-- http://www.needhamsoftware.com (work) https://a.co/d/b2sZLD9 (my fantasy fiction book)
I think the change is "safe" from a test stability standpoint.
There is one change to be mentioned in CHANGES.txt and probably in release notes. If rows=0 then maxScore will no longer be returned. The user must ask for at least one row if the score is desired.
So code that does something with that maxScore value could hit null/undefined? That's sort of iffy for back compatibility...
On Mon, Apr 22, 2024 at 4:51 PM David Smiley @.***> wrote:
I think the change is "safe" from a test stability standpoint.
There is one change to be mentioned in CHANGES.txt and probably in release notes. If rows=0 then maxScore will no longer be returned. The user must ask for at least one row if the score is desired.
— Reply to this email directly, view it on GitHub https://github.com/apache/solr/pull/2221#issuecomment-2070929291, or unsubscribe https://github.com/notifications/unsubscribe-auth/ALF2L3KHFCKC2CFZ6OHG3FTY6VZZXAVCNFSM6AAAAABCJOYDZKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANZQHEZDSMRZGE . You are receiving this because you were mentioned.Message ID: @.***>
-- http://www.needhamsoftware.com (work) https://a.co/d/b2sZLD9 (my fantasy fiction book)
I'll not push for this in 9.6. I confused it in part for the important work that made it to 9.1 SOLR-17131.
Kevin, I suppose an easy work-around for users is to explicitly set "fl" to something that does not include score. If fl doesn't include scores, then maxScore won't be included either.
This PR had no visible activity in the past 60 days, labeling it as stale. Any new activity will remove the stale label. To attract more reviewers, please tag someone or notify the [email protected] mailing list. Thank you for your contribution!
This PR is now closed due to 60 days of inactivity after being marked as stale. Re-opening this PR is still possible, in which case it will be marked as active again.
This PR has had no activity for 60 days and is now labeled as stale. Any new activity will remove the stale label. To attract more reviewers, please tag people who might be familiar with the code area and/or notify the [email protected] mailing list. To exempt this PR from being marked as stale, make it a draft PR or add the label "exempt-stale". If left unattended, this PR will be closed after another 60 days of inactivity. Thank you for your contribution!