
Sorting giving incorrect results

Open nehajatav opened this issue 3 months ago • 7 comments

Describe the bug

Query1: The query below returns 4000 lexid values sorted by the id field, as expected.

        {
          "hits" : 0,
          "model.searchPath" : "/0",
          "yql" : "select '[docid]' from sources * where !(range( myDate_t, -Infinity, Infinity )) AND (range(date,960892397000,1085589357000)) order by '[docid]' asc limit 4000 offset 0",
          "timeout" : "120s"
        }

Query2: The query below returns 3000 lexid values sorted by the id field.

        {
          "hits" : 0,
          "model.searchPath" : "/0",
          "yql" : "select '[docid]' from sources * where !(range( myDate_t, -Infinity, Infinity )) AND (range(date,960892397000,1085589357000)) order by '[docid]' asc limit 3000 offset 0",
          "timeout" : "120s"
        }

Below are the observations that we do not expect with the default top-k-probability:

  1. ids ranked 1-2855 in the output of Query1 have the same rank in Query2
  2. ids ranked 2896-3027 in the output of Query1 are ranked 2856-2887 in Query2
  3. the id ranked 3082 in the output of Query1 is ranked 2888 in Query2
  4. ids ranked 3096-3107 in the output of Query1 are ranked 2889-3000 in Query2

Expected behavior

Ranking should be nearly the same for both queries.
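With a deterministic sort order, the 3000-hit result would be an exact prefix of the 4000-hit result. A minimal sketch of how the two id lists could be compared to find the first rank where they differ; the function name and the sample ids are hypothetical and not taken from the reported data:

```python
def first_divergence(longer, shorter):
    """Return the 1-based rank where two ordered id lists first differ.

    Returns None if the shorter list is an exact prefix of the longer one.
    """
    for rank, (a, b) in enumerate(zip(longer, shorter), start=1):
        if a != b:
            return rank
    return None


# Hypothetical ids, only to show the shape of the check.
query1_ids = ["a", "b", "c", "d", "e"]
query2_ids = ["a", "b", "d"]
print(first_divergence(query1_ids, query2_ids))
```

Running this against the real id lists from both queries would show whether the divergence always starts at the same rank.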

Environment (please complete the following information):

  • Rhel8
  • Podman

Vespa version 8.221.29

nehajatav, Mar 20 '24 09:03

Is this repeatable? Is coverage 100% in both cases? Could you try with top-k-probability set to 1.0?
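One way to check coverage programmatically is to inspect the coverage object of each query response. This is only a sketch: it assumes the default JSON result format, where the response carries a `root.coverage` object with a `coverage` percentage and a `full` flag, and the helper name is hypothetical.

```python
def has_full_coverage(response):
    """Return True if a parsed JSON query response reports 100% coverage.

    Assumes the response has a root.coverage object with a "coverage"
    percentage and, optionally, a boolean "full" flag.
    """
    coverage = response.get("root", {}).get("coverage")
    if coverage is None:
        return False
    return bool(coverage.get("coverage") == 100 and coverage.get("full", True))


# Hypothetical response fragments, not taken from the reported dumps.
print(has_full_coverage({"root": {"coverage": {"coverage": 100, "full": True}}}))
print(has_full_coverage({"root": {"coverage": {"coverage": 97, "full": False}}}))
```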

bratseth, May 07 '24 14:05

How do I set it to 1.0 without setting any value for max-hits-per-partition?

        <tuning><dispatch><top-k-probability>1.0</top-k-probability></dispatch><searchnode>....

        Invalid XML according to XML schema, error in services.xml: element "top-k-probability" not allowed here; expected the element end-tag or element "max-hits-per-partition" [98:40]

        <tuning><dispatch><max-hits-per-partition /><top-k-probability>1.0</top-k-probability></dispatch><searchnode>....

        character content of element "max-hits-per-partition" invalid; must be an integer

nehajatav, May 07 '24 19:05

@bratseth Coverage is 100%. I have shared the response with trace level with you over a secure channel. Also, I am unable to set max-hits-per-partition; see the comment above.

nehajatav, May 07 '24 19:05

This works just fine:

        <tuning>
            <dispatch>
                <top-k-probability>1.0</top-k-probability>
            </dispatch>
        </tuning>

bratseth, May 08 '24 10:05

@nehajatav Could you provide the output of the following command? The utility must be executed on a container node.

vespa-get-config -n vespa.config.search.dispatch -i feed/component/dispatcher.<insert content cluster name here> | grep topKProbability

There is also a slightly different count for the total number of documents in the two dumps:

Total documents "3k": 31056000
Total documents "4k": 31056022

The dumps provided indicate that the top-k setting has not been correctly propagated. There is a slight skew in the distribution of hits, with node 2 reporting more hits than nodes 0 and 1. The slight change in ordering was caused by additional hits from node 2 that have a lower lexical ordering than the highest in the 3k dump.
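The effect of skew can be illustrated with a small simulation. This is a sketch under stated assumptions, not the dispatcher's actual algorithm: `per_node_limit` is a hypothetical stand-in for however many hits each node is asked to return, and the node contents are made up so that node 2 holds most of the lowest ids.

```python
import heapq


def merged_top_k(partitions, k, per_node_limit):
    """Merge the first per_node_limit hits of each sorted partition, keep k."""
    truncated = [p[:per_node_limit] for p in partitions]
    return list(heapq.merge(*truncated))[:k]


# Hypothetical, already-sorted hits per node; node 2 is skewed toward low ids.
node0 = [10, 20, 30, 40]
node1 = [11, 21, 31, 41]
node2 = [1, 2, 3, 4, 5, 6]
partitions = [node0, node1, node2]

# Every node returns up to k hits: the merged result is the true top 6.
exact = merged_top_k(partitions, k=6, per_node_limit=6)
# Every node returns only 3 hits: the hits 4, 5 and 6 from node 2 are lost.
approx = merged_top_k(partitions, k=6, per_node_limit=3)

print(exact)   # [1, 2, 3, 4, 5, 6]
print(approx)  # [1, 2, 3, 10, 11, 20]
```

The two results agree on a leading prefix and then diverge, which has the same shape as the observations in the report: a larger limit lets the skewed node contribute hits that were cut off with the smaller one.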

bjorncs, May 08 '24 13:05

@bjorncs The difference in total count may be due to the number of docs in the cluster increasing.

@bratseth I was able to push top-k 1.0, but the result is still the same. This is the result even after convergence across all nodes with top k set to 1.0:

        [vespa@vespa-container-03 /]$ vespa-get-config -n vespa.config.search.dispatch -i feed/component/dispatcher. |grep topKProbability
        topKProbability 0.9999
        [vespa@vespa-container-03 /]$ vespa-get-config -n vespa.config.search.dispatch -i feed/component/dispatcher. |grep topKProbability
        topKProbability 0.9999
        [vespa@vespa-container-03 /]$

nehajatav, May 08 '24 20:05

@nehajatav The command you listed does not include the content cluster name as a suffix to the config id.

$ vespa-get-config -n vespa.config.search.dispatch -i feed/component/dispatcher. |grep topKProbability

You can use vespa-configproxy-cmd to determine the available config instances on a node:

$ vespa-configproxy-cmd | grep "feed/component/dispatcher"

Use the output to determine the exact arguments to vespa-get-config. If the config still contains 0.9999 the change to services.xml has not been applied.
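If this check needs to be repeated across several container nodes, the captured command output can be parsed with a small helper. This is a sketch only: it assumes the output contains a line of the form `topKProbability <value>`, as in the output pasted earlier in this thread, and the function name is hypothetical.

```python
def parse_top_k_probability(config_output):
    """Extract the topKProbability value from captured vespa-get-config output.

    Returns None if no line of the form "topKProbability <value>" is found.
    """
    for line in config_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "topKProbability":
            return float(parts[1])
    return None


# Sample line copied from the output shown earlier in this thread.
value = parse_top_k_probability("topKProbability 0.9999\n")
if value != 1.0:
    print("top-k-probability change has not been applied, found:", value)
```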

bjorncs, May 14 '24 14:05