solr icon indicating copy to clipboard operation
solr copied to clipboard

SOLR-17131: Optimize rows=0 since score/sort isn't necessary

Open risdenk opened this issue 1 year ago • 29 comments

https://issues.apache.org/jira/browse/SOLR-17131

Description

If a query has rows=0 Solr does not need to sort the results or even compute the score. This means the filterCache can be more efficient to compute the result and Solr doesn't spend time on unnecessarily sorting and scoring large result sets for it to be thrown away.

There is one subtle difference from this change - maxScore is no longer returned if rows=0 which seems like a reasonable tradeoff. If maxScore is needed then set rows=1. maxScore can't reliably be compared across searches anyway so no reason to have it when rows=0.

This was found while looking at https://issues.apache.org/jira/browse/SOLR-14765 and trying to reason through the case of rows=0 that are sometimes used to just determine count of matching results. This change adds the check for rows=0 before other checks to ensure that we can short circuit scoring even if there is a scoring query. We also don't add the query to the filter cache since that could result in filter cache thrashing.

In some tests, we found that queries with rows=0 p99 improved by 50% and CPU dropped by 5%.

Tests

Updated tests in TestMainQueryCaching to account for this optimization.

Checklist

Please review the following and check all that apply:

  • [x] I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • [x] I have created a Jira issue and added the issue ID to my pull request title.
  • [x] I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
  • [x] I have developed this patch against the main branch.
  • [x] I have run ./gradlew check.
  • [x] I have added tests for my changes.
  • [ ] I have added documentation for the Reference Guide

risdenk avatar Jan 24 '24 21:01 risdenk

does not need to sort the results or even compute the score

Is there a unit test you can recommend that I might run in a debugger looking at key spots, that would show that a score is being computed wastefully? I can't imagine that there is time wasted sorting (as you claim) because there is nothing to sort.

This means the filterCache can be more efficient to compute the result and Solr doesn't spend time on unnecessarily sorting and scoring large result sets for it to be thrown away.

This seems contradictory. If the filterCache can compute the result, then wouldn't disabling the filterCache for the query (as I see you do in the PR) not use the filter cache then? I'm referring to the query, not the filters (which are processed the same either way).

dsmiley avatar Jan 27 '24 03:01 dsmiley

Is there a unit test you can recommend that I might run in a debugger looking at key spots, that would show that a score is being computed wastefully?

TestMainQueryCaching#testConstantScoreFlScore and put a breakpoint at SolrIndexSearcher line 1620

If there is rows=0 and one of three cases, the scores are computed and sorted even though its not needed:

  • fl includes score
  • sort includes score
  • q is a non constant query

If you walk through the existing if checks in SolrIndexSearcher line 1620, the first case isn't chosen since scores are requested. The second case doesn't match since its not a *:* or MatchAllDocsQuery - even if its a constant score query useFilterForSortedQuery is false and we don't really want to enable (and its false by default) that case (puts the q in the filter cache - related to your other question I'll answer below). The final case is useFilterCacheForDynamicScoreQuery which if useFilterForSortedQuery is false (which it is by default) we skip optimizations.

The changes to TestMainQueryCaching also show the other cases where the rows=0 check improves things. Avoids a full sort.

This seems contradictory. If the filterCache can compute the result, then wouldn't disabling the filterCache for the query (as I see you do in the PR) not use the filter cache then? I'm referring to the query, not the filters (which are processed the same either way).

So the check I added makes sure we don't add the q query to the filterCache. We want to use the filterCache to compute the result, but don't want to store the q in the filterCache after its computed. This is similar to useFilterForSortedQuery but this is specifically the case of a rows=0 query.

risdenk avatar Jan 27 '24 19:01 risdenk

I see what's happening. The queryResultCache's queryResultWindowSize is forcing the actual number of documents to be raised from 0 to that window (which is 1 in this test config), thus we have to score docs to find the one best doc.

I don't agree with your solution because it confuses what useFilterCache fundamentally means (use the filter cache for q). The other code path is actually already optimized for rows==0 and no scores; it should be quite fast!

I think the solution is to not save the results into queryResultCache when rows==0; thus the window won't be applied either. But I don't think SolrIndexSearcher should set this, I think it should be in QueryComponent by setting NO_SET_QCACHE so that some plugin/customization can easily make this decision for itself. Additionally, QueryComponent is setting GET_SCORES if fl contains the score but I think this should be further guarded by rows != 0.

dsmiley avatar Jan 28 '24 07:01 dsmiley

I also think a queryResultWindowSize of 1 (which is the default if it isn't specified in solrconfig and isn't documented AFAICT) should not bump up a request for no rows to a window of a paltry 1 doc. I don't think that was intended; I think what was intended was to effectively have no window if it isn't specified. I can see using a default of 1 seems to convey that (for typical queries that have results) but not for rows=0 case. Likely an oversight long ago.

dsmiley avatar Jan 28 '24 19:01 dsmiley

The other code path is actually already optimized for rows==0 and no scores

Which code path are you talking about?

I think the solution is to not save the results into queryResultCache when rows==0; thus the window won't be applied either.

Not sure I agree with that. We should be able to store the result in the queryResultCache so that we don't have to recompute for the same exact query a second time?

I also think a queryResultWindowSize of 1 (which is the default if it isn't specified in solrconfig and isn't documented AFAICT)

Its documented here https://solr.apache.org/guide/solr/latest/configuration-guide/caches-warming.html#queryresultwindowsize-element

I know in our case we do set queryResultWindowSize in solrconfig.xml.

Either way I'll look into QueryComponent some. @dsmiley as far as you are aware this change even though maybe not the best place to put it - doesn't have any negative side effects right?

risdenk avatar Jan 29 '24 13:01 risdenk

(Me) The other code path is actually already optimized for rows==0 and no scores

The method you are attempting to modify calls getDocListAndSetNC when useFilterCache=false, which is well optimized for this.

Not sure I agree with that. We should be able to store the result in the queryResultCache so that we don't have to recompute for the same exact query a second time?

Okay; I sympathize. Another alteration to my suggestion, is that queryResultWindowSize should be ignored when rows=0 to allow us to run this query as fast as possible, even if this means not caching queryResultWindowSize docs -- because we'd rather not compute their scores merely to cache something in hopes it might be used.

dsmiley avatar Jan 29 '24 14:01 dsmiley

I made a slight change to improve the variable name I added - c6618d6

The method you are attempting to modify calls getDocListAndSetNC when useFilterCache=false, which is well optimized for this.

Right but in that case - assuming you are referring to prior to my PR SolrIndexSearcher line 1703 / 1708, we don't get the optimizations from SOLR-14765. This means in the case of rows=0, we don't reuse the filter cache dotset and have to recompute it.

With my change, we never get down to getDocListAndSetNC for the case of rows=0 and can build the results right from the filterCache docset.

risdenk avatar Jan 29 '24 15:01 risdenk

is that queryResultWindowSize should be ignored when rows=0 to allow us to run this query as fast as possible, even if this means not caching queryResultWindowSize docs -- because we'd rather not compute their scores merely to cache something in hopes it might be used.

100% agree with this idea. I'm not sure what is happening with queryResultWindowSize docs with my change. In theory we definitely shouldn't be computing them.

risdenk avatar Jan 29 '24 15:01 risdenk

I addressed the queryResultWindowSize with 326afdd when rows=0.

risdenk avatar Jan 29 '24 15:01 risdenk

You claim that your approach to this uses the filter cache (for q). Yet it does not because it calls getDocSetNC. I feel compelled to -1 this because that's what useFilterCache means. But we agree on the goals -- don't compute the score!

dsmiley avatar Jan 29 '24 15:01 dsmiley

You claim that your approach to this uses the filter cache (for q)

With my change, for the case of rows=0, the filtercache docset will be used to compute the result using q.

I don't think I understand your concerns. getDocSetNC uses the filterCache docset and does not insert the query result into the filter cache. If getDocSetC is used, the query result will be inserted into the filter cache - which we don't want.

risdenk avatar Jan 29 '24 15:01 risdenk

getDocSetNC uses the filterCache docset

Can you elaborate where this is so?

dsmiley avatar Jan 29 '24 15:01 dsmiley

getDocSetNC uses the filterCache docset

Can you elaborate where this is so?

This existed prior to my change due to SOLR-14765 - SolrIndexSearcher before my PR lines 1658-1660 and with my PR lines 1673-1675.

risdenk avatar Jan 29 '24 15:01 risdenk

Let's use GH links to ensure we understand each other. You are referring to this?

        List<Query> filterList = cmd.getFilterList();
        if (filterList != null && !filterList.isEmpty()) {
          out.docSet = DocSetUtil.getDocSet(out.docSet.intersection(getDocSet(filterList)), this);
        }

This shows that the list of filter queries goes through the filter cache. But that's true no matter what (other code paths ultimately lead to the same but in different methods deeper). The question at hand is how is the q param processed, which in this code gets turned into a DocSet saved to out.docSet. Immediately before these lines, it's populated via getDocSet. This is what useFilterCache means. In your PR, you added a condition to not do this, thus not use the filter cache for q.

dsmiley avatar Jan 29 '24 17:01 dsmiley

Hi, I already responded on Slack, but I now found this PR and wanted to say some additional things:

There is also a separate collector and on top of that a separate method in IndexSearcher, which automatically does a lot of the top-docs related stuff. The problem with a normal TopDocs or TopFieldDocs collector is that the empty PQ has some additional overhead (although the score is not calculated and returned as 0).

First of all: SolrIndexSearcher should also implement support for using the optimized methods and collectors, unless a DocSet is needed (for facets/aggs). Using the TotalHitCountCollector is one thing to do (this needs to be done more down the line in the search code of Solr, but it's worth to do. But as there is now support to short-circuit count queries on the Query level, you should also think of using IndexSearcher#count() directly: E.g., this one returns IndexReader#numDocs() for MatchAllDocsQuery (thats a stupid simple case), but it can also return quick counts when there are no deleted docs in a segment and the statistics in index can answer the query.

The changes here look fine (although I can't follow the discussion about those crazy flags true/false/whatever), but in addition to the code here, in case of no TopDocs and no facets, the code should call IndexSearcher#count(). In the facet case, it should at least use the TotalHitCountCollector as the final sink.

uschindler avatar Jan 29 '24 17:01 uschindler

@uschindler See getDocListNC which already seems to be pretty well optimized when lastDocRequested <= 0 (i.e. rows==0), and in conjunction also when !needScores. The reason some of the Lucene convenience collectors & IndexSearcher.count aren't used is because this code is juggling a multitude of scenarios, such as when there might be a so-called PostFilter, thus leading to another Collector to add to the head of the collectors. Care must be taken before adding yet another optimization -- does it actually add value or it just to feel good to use another Lucene class. getDocListAndSetNC follows this method and is similar.

dsmiley avatar Jan 29 '24 19:01 dsmiley

Updated this based on talking w/ @dsmiley. the useFilterCache part I was confused on that it always applies to q so my change didn't make sense. I pushed a change that made it clear what useFilterCache is always used for q.

I also made the change to QueryComponent to make sure scores aren't calculated if rows=0. I think there is more here potentially.

TODO:

  • Look at adding some more tests
  • Try adding a benchmark for this

risdenk avatar Jan 29 '24 22:01 risdenk

So you've coupled the SortSpec into ReturnFields just so that it can implement wantsScore in a slightly more holistic way? Hmmm; that feels like bad coupling. Maybe a method boolean needScores(int rows, ReturnFields, SortSpec) is needed to decide to apply GET_SCORES to flags and maybe for other cases?

yea I'm really not a fan of the approach. Still looking at better ways to do it.

risdenk avatar Jan 30 '24 20:01 risdenk

I think I found a better way in 8f7cb64 - I didn't originally like the idea of parsing rows there, but since we are explicitly checking for a single value it isn't that bad. This makes it SolrReturnFields a lot cleaner and QueryComponent doesn't even need to change at all.

risdenk avatar Jan 30 '24 21:01 risdenk

I pushed 80f11d5 which has a benchmark for checking a bunch of this. I haven't compared against main yet, but on my machine:

Benchmark                  (flScore)  (matchAllDocs)  (noRowsQuery)  (singleShard)  (sortScore)   Mode  Cnt      Score      Error  Units
SearchOptimizations.query      false           false          false          false        false  thrpt    9  14117.797 ±  667.524  ops/s
SearchOptimizations.query      false           false          false          false         true  thrpt    9  12798.032 ±  664.581  ops/s
SearchOptimizations.query      false           false          false           true        false  thrpt    9  41903.764 ± 3499.993  ops/s
SearchOptimizations.query      false           false          false           true         true  thrpt    9  41875.468 ± 1274.265  ops/s
SearchOptimizations.query      false           false           true          false        false  thrpt    9  14165.919 ± 1275.410  ops/s
SearchOptimizations.query      false           false           true          false         true  thrpt    9  14658.515 ±  711.654  ops/s
SearchOptimizations.query      false           false           true           true        false  thrpt    9  46095.168 ± 2424.954  ops/s
SearchOptimizations.query      false           false           true           true         true  thrpt    9  45676.193 ± 2169.222  ops/s
SearchOptimizations.query      false            true          false          false        false  thrpt    9  12980.789 ±  379.131  ops/s
SearchOptimizations.query      false            true          false          false         true  thrpt    9  10658.171 ± 1223.202  ops/s
SearchOptimizations.query      false            true          false           true        false  thrpt    9  36985.569 ± 1949.741  ops/s
SearchOptimizations.query      false            true          false           true         true  thrpt    9  32990.086 ± 1658.271  ops/s
SearchOptimizations.query      false            true           true          false        false  thrpt    9  15341.764 ±  345.964  ops/s
SearchOptimizations.query      false            true           true          false         true  thrpt    9  14952.378 ±  812.817  ops/s
SearchOptimizations.query      false            true           true           true        false  thrpt    9  51607.454 ± 2001.172  ops/s
SearchOptimizations.query      false            true           true           true         true  thrpt    9  51498.706 ± 1305.913  ops/s
SearchOptimizations.query       true           false          false          false        false  thrpt    9  12525.537 ± 1012.328  ops/s
SearchOptimizations.query       true           false          false          false         true  thrpt    9  12463.681 ±  670.962  ops/s
SearchOptimizations.query       true           false          false           true        false  thrpt    9  40158.078 ± 2073.773  ops/s
SearchOptimizations.query       true           false          false           true         true  thrpt    9  38621.333 ± 2662.662  ops/s
SearchOptimizations.query       true           false           true          false        false  thrpt    9  14370.718 ± 1075.079  ops/s
SearchOptimizations.query       true           false           true          false         true  thrpt    9  14066.580 ±  376.911  ops/s
SearchOptimizations.query       true           false           true           true        false  thrpt    9  45014.503 ± 2190.095  ops/s
SearchOptimizations.query       true           false           true           true         true  thrpt    9  45708.595 ±  978.899  ops/s
SearchOptimizations.query       true            true          false          false        false  thrpt    9  12078.353 ±  221.499  ops/s
SearchOptimizations.query       true            true          false          false         true  thrpt    9  11039.752 ±  132.751  ops/s
SearchOptimizations.query       true            true          false           true        false  thrpt    9  33369.947 ±  448.742  ops/s
SearchOptimizations.query       true            true          false           true         true  thrpt    9  26665.048 ±  740.376  ops/s
SearchOptimizations.query       true            true           true          false        false  thrpt    9  15575.767 ±  398.453  ops/s
SearchOptimizations.query       true            true           true          false         true  thrpt    9  15068.308 ± 1095.587  ops/s
SearchOptimizations.query       true            true           true           true        false  thrpt    9  52237.705 ± 2442.391  ops/s
SearchOptimizations.query       true            true           true           true         true  thrpt    9  52827.441 ± 1061.696  ops/s

risdenk avatar Jan 31 '24 20:01 risdenk

There is great work here; looking forward to it making 9.6 :-)

dsmiley avatar Mar 23 '24 05:03 dsmiley

@risdenk are there any loose ends here or it mergeable as-is, notwithstanding a CHANGES.txt entry? Proposed CHANGES.txt under 9.6 optimizations:

SOLR-17131: Queries: rows=0, can skip sorting and doc collection. (Kevin Risden, David Smiley)

dsmiley avatar Apr 18 '24 13:04 dsmiley

Shall I just merge this for 9.6? I think it's embarrassing we've had this perf issue for so long without it being noticed. match-all-docs is so common, like in ping queries for one.

One test failed, TestPrepRecovery.testLeaderNotResponding but it's a common flapper

CC @gus-asf (9.6 RM)

dsmiley avatar Apr 22 '24 19:04 dsmiley

I guess the question is more a matter of how safe you think this change is... We all know that there are some flaky tests. The test doesn't sound related (being admin/recovery related not query related) doesn't matter much assuming it doesn't reproduce with the seed, or seem to fail more frequently.

I haven't published the RC yet, though I have already run tests locally and should re-do that if we are going to add new commits.

-Gus

On Mon, Apr 22, 2024 at 3:44 PM David Smiley @.***> wrote:

Shall I just merge this for 9.6? I think it's embarrassing we've had this perf issue for so long without it being noticed. match-all-docs is so common, like in ping queries for one.

One test failed, TestPrepRecovery.testLeaderNotResponding but it's a common flapper https://ge.apache.org/scans/tests?search.relativeStartTime=P28D&search.rootProjectNames=solr-root&search.timeZoneId=America%2FNew_York&tests.container=org.apache.solr.cloud.TestPrepRecovery

CC @gus-asf https://github.com/gus-asf (9.6 RM)

— Reply to this email directly, view it on GitHub https://github.com/apache/solr/pull/2221#issuecomment-2070806945, or unsubscribe https://github.com/notifications/unsubscribe-auth/ALF2L3IFCJ6F3JJSIZGGKKLY6VR73AVCNFSM6AAAAABCJOYDZKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANZQHAYDMOJUGU . You are receiving this because you were mentioned.Message ID: @.***>

-- http://www.needhamsoftware.com (work) https://a.co/d/b2sZLD9 (my fantasy fiction book)

gus-asf avatar Apr 22 '24 20:04 gus-asf

I think the change is "safe" from a test stability standpoint.

There is one change to be mentioned in CHANGES.txt and probably in release notes. If rows=0 then maxScore will no longer be returned. The user must ask for at least one row if the score is desired.

dsmiley avatar Apr 22 '24 20:04 dsmiley

So code that does something with that maxScore value could hit null/undefined? That's sort of iffy for back compatibility...

On Mon, Apr 22, 2024 at 4:51 PM David Smiley @.***> wrote:

I think the change is "safe" from a test stability standpoint.

There is one change to be mentioned in CHANGES.txt and probably in release notes. If rows=0 then maxScore will no longer be returned. The user must ask for at least one row if the score is desired.

— Reply to this email directly, view it on GitHub https://github.com/apache/solr/pull/2221#issuecomment-2070929291, or unsubscribe https://github.com/notifications/unsubscribe-auth/ALF2L3KHFCKC2CFZ6OHG3FTY6VZZXAVCNFSM6AAAAABCJOYDZKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANZQHEZDSMRZGE . You are receiving this because you were mentioned.Message ID: @.***>

-- http://www.needhamsoftware.com (work) https://a.co/d/b2sZLD9 (my fantasy fiction book)

gus-asf avatar Apr 22 '24 21:04 gus-asf

I'll not push for this in 9.6. I confused it in part for the important work that made it to 9.1 SOLR-17131.

Kevin, I suppose an easy work-around for users is to explicitly set "fl" to something that does not include score. If fl doesn't include scores, then maxScore won't be included either.

dsmiley avatar Apr 22 '24 23:04 dsmiley

This PR had no visible activity in the past 60 days, labeling it as stale. Any new activity will remove the stale label. To attract more reviewers, please tag someone or notify the [email protected] mailing list. Thank you for your contribution!

github-actions[bot] avatar Jun 26 '24 00:06 github-actions[bot]

This PR is now closed due to 60 days of inactivity after being marked as stale. Re-opening this PR is still possible, in which case it will be marked as active again.

github-actions[bot] avatar Oct 07 '24 00:10 github-actions[bot]

This PR has had no activity for 60 days and is now labeled as stale. Any new activity will remove the stale label. To attract more reviewers, please tag people who might be familiar with the code area and/or notify the [email protected] mailing list. To exempt this PR from being marked as stale, make it a draft PR or add the label "exempt-stale". If left unattended, this PR will be closed after another 60 days of inactivity. Thank you for your contribution!

github-actions[bot] avatar Dec 08 '24 00:12 github-actions[bot]