vespa icon indicating copy to clipboard operation
vespa copied to clipboard

Predicate queries exclude documents influenced by other documents predicate value

Open adriabonetmrf opened this issue 3 years ago • 20 comments

Hello,

Sometimes when executing a query filtering by a predicate a document which should match isn't included in the result, removing one field from the sent query which isn't present on the predicate condition or changing its value results in the document appearing.

This has happened several times randomly and for different documents, let me explain one case:

Given the query:

select * from doc where siteId = 0 and predicate(show_condition,{\"visitorFrequency\":\"5\",
\"section\": [\"28961\"],"}, {});

A document A with a predicate field "show_condition": "section in [28961]" and a field "siteId": 0

Another document B with the predicate field "show_condition": "visitorFrequency in [4,5] and country in [US]" and a field "siteId": 1

The document A would not show in the results, unless the visitorFrequency field in the query predicate was removed or its value wasn't 4 or 5.

There were several ways that somehow fixed the issue:

  • Making a partial update on document A assigning the same value or another to the show_condition field
  • Making a whole update on document A sending the same document or modifying it
  • Removing the visitorFrequency value from document B predicate, effectively leaving it as "show_condition": "country in [US]"
  • Changing the visitorFrequency value from document B predicate

Other relevant information:

  • Query coverage was always 100
  • Container cluster has 3 nodes
  • Data distribution is flat
  • Arity on the problematic predicate field is set to 2
  • Couldn't find a way to replicate the issue

The problematic predicate in doc schema:

 field show_condition type predicate {
    indexing: summary | attribute
    index {
        arity: 2 # mandatory
    }
}

adriabonetmrf avatar Apr 19 '21 08:04 adriabonetmrf

Thanks for the report.

Make sure that you do not mix the concept of range and regular attributes/variables in predicate search. The predicates and query value are formatted differently (e.g. range attribute values in query are not wrapped with quotes). Range and regular attributes are handled differently in the underlying data structures. See https://docs.vespa.ai/en/predicate-fields.html for details. We need a reproducible case. Closing for now - please reopen if are able to reproduce it later.

bjorncs avatar Apr 20 '21 14:04 bjorncs

As far as I understand the problem is that you are not seeing documents that you would expect to match your query. And that this is not reproducible from one query to the next, and that updating documents changes the behaviour.

My guess here is that documents returned do not have defined order, and that you only fetch a subset of the documents that match the query.

1 - The query you refer to will show you up to 10 of the best documents, where best is defined by the ranking function. When no ranking is specified, it will use the builtin default rankprofile which is suitable for text search. For predicate fields I suspect you do not get much variation, and I expect all documents to have equal rank. You will see that as a relevance field in the returned result.

2 - Documents with equal relevance are treated as equally important, and they might be returned in any order. This is especially true when using multiple nodes, or multiple threads per query. So if your query matches 100 documents in total over all your nodes, you will still only see 10 unless you increase that by using the hits= query parameter to adjust it. Vespa only guarantee that there are no more hits with higher relevance than the ones we surface.

So you need to ensure a defined order for your documents either by defining a rank profile (preferred), or by using the order clause.

You see how many hits your query matched in the total-hits field.

baldersheim avatar Apr 21 '21 18:04 baldersheim

Thanks for the response @baldersheim, regarding the hit limit we send the parameter set to 10 and the query returned 2 documents while missing the third one, the total-hits was also 2 and after the modification of the document it turned 3 with the missing document also appearing in the result. Might be something related to indexs.

adriabonetmrf avatar Apr 22 '21 08:04 adriabonetmrf

Following up with the requested services.xml:

<?xml version='1.0' encoding='UTF-8'?>
<services version='1.0'>
    <admin version="2.0">
        <adminserver hostalias="app01" />
        <configservers>
            <configserver hostalias="app01"/>
            <configserver hostalias="app02"/>
            <configserver hostalias="app03"/>
        </configservers>
        <logserver hostalias="app01" />
        <cluster-controllers>
            <cluster-controller hostalias="app07" />
            <cluster-controller hostalias="app08" />
            <cluster-controller hostalias="app09" />
        </cluster-controllers>
    </admin>
    <container id='default' version='1.0'>
        <http>
            <server port="8080" id="main-server" />
        </http>
        <component id="com.my.app.services.ESLoggerServiceProvider" bundle="myapp" />

        <document-api/>

        <search>

            <chain id="itemsChain" inherits="vespa"/>

            <chain id="debugItemsChain" inherits="vespa">
                <searcher id="com.my.app.searcher.item.ESLoggerSearcher" bundle="myapp" after="MinimalQueryInserter"/>
            </chain>

        </search>
        
	    <document-processing>
            <chain id="default">
                <documentprocessor id="com.my.app.processor.DocumentDatesProcessor" bundle="myapp" />
                <documentprocessor id="com.my.app.processor.PredicatesProcessor" bundle="myapp" />
                <documentprocessor id="com.my.app.processor.DomainProcessor" bundle="myapp" />
            </chain>
        </document-processing>

        <handler id="com.my.app.handler.TransformationHandler" bundle="myapp">
            <binding>http://*/transformation</binding>
        </handler>



        <nodes jvmargs="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005">
            <node hostalias="app04"/>
            <node hostalias="app05"/>
            <node hostalias="app06"/>
        </nodes>
    </container>

    <content id='docs' version='1.0'>
        <redundancy>2</redundancy>
        <engine>
            <proton>
                <searchable-copies>2</searchable-copies>
            </proton>
        </engine>
        <search>
            <visibility-delay>1.0</visibility-delay>
        </search>
        <documents>
            <document mode="index" type='item' />
            <document mode='index' type='doc' />
        </documents>
        <nodes>
            <node hostalias="app07" distribution-key="0"/>
            <node hostalias="app08" distribution-key="1"/>
            <node hostalias="app09" distribution-key="2"/>
        </nodes>
    </content>

</services>

adriabonetmrf avatar Apr 28 '21 09:04 adriabonetmrf

The item (queried document) schema:

schema item {

    document item inherits root  {

        field url type string {
            indexing: summary | attribute
        }

        field state type string {
            indexing: summary | attribute
        }

        field render_properties type string {
            indexing: summary
        }

        field config_properties type string {
            indexing: summary
        }

        field transformation_id type string {
        	indexing: summary
        }
    }

    document-summary short {
        from-disk
        summary documentid type string {source: documentid}
        summary name type string {source: name}
        summary url type string {source: url}
        summary render_properties type string {source: render_properties}
        summary config_properties type string {source: config_properties}
        summary transformation_id type string {source: transformation_id}
    }

    rank-profile item inherits default {

        rank-properties {
            $contextFactor: 1
            $baseValueFactor: 1
        }

        function contextRank() {
            expression: dotProduct(context, context) * $contextFactor
        }

        function baseValue() {
            expression: if(isNan(attribute(base_value)), 0, attribute(base_value) * $baseValueFactor)
        }

        first-phase {
            expression: contextRank + baseValue
        }
    }

    rank-profile item_test inherits item {
        summary-features: contextRank baseValue
    }


adriabonetmrf avatar Apr 28 '21 09:04 adriabonetmrf

The parent document with the troublesome show_condition predicate field:

schema root {

    document root {

        field name type string {
            indexing: summary | attribute
        }

        field site_id type int {
            indexing: summary | attribute
        }

        field creation_date type string {
            indexing: summary
        }

        field creation_date_ts type long {
            indexing: summary | attribute
        }

        field last_update_date type string {
            indexing: summary
        }

        field last_update_date_ts type long {
            indexing: summary | attribute
        }

        field start_date type string {
            indexing: summary
        }

        field start_date_ts type long {
            indexing: summary | attribute
        }

        field end_date type string {
            indexing: summary
        }

        field end_date_ts type long {
            indexing: summary | attribute
        }

        field impressions type int {
            indexing: summary | attribute
        }

        field frequency_cap type int {
            indexing: summary | attribute
        }

        field context type weightedset<string> {
            indexing: summary | attribute
            attribute: fast-search
        }

        field base_value type int {
            indexing: summary | attribute
        }

        field show_condition type predicate {
            indexing: summary | attribute
            index {
                arity: 2  # mandatory
            }
        }
    }

}

adriabonetmrf avatar Apr 28 '21 09:04 adriabonetmrf

@adriabonetmrf Thanks for uploading your application definition. We were sadly unable to reproduce the problem ourself. We would therefore like you to deploy your application to our public cloud offering - Vespa Cloud. There is a free trail available; follow the steps in https://cloud.vespa.ai/en/getting-started-java. Let us know once you have deployed the app and fed the initial corpus. We will then debug your Vespa instance.

bjorncs avatar Apr 29 '21 10:04 bjorncs

@bjorncs would it be ok to grant you access to our cluster so you can there debug directly? I'm worried we won't be able to reproduce the issue in a different environment seeing that it was impossible to do so in a local environment.

adriabonetmrf avatar Apr 29 '21 14:04 adriabonetmrf

Hello again. seems like the issue have come back after a few days since we upgraded to the newest vespa version, searching the locks for the exact time in which our documents stopped appearing as results in the query we found out some logs that didn’t appear since a week ago, could you check it out to see if it may be related or briefly explain what these logs are about? Thanks

1620098831.090566	cupid08.mrf.io	349/36655	searchnode	proton.searchlib.common.bitvectorcache	info	Populating bitvector  0 with feature 737524818909163148 and 2 bits set. Cost is 14660626.000000 = 28.84%, accumulated cost is 28.84%
1620098831.090683	cupid08.mrf.io	349/36655	searchnode	proton.searchlib.common.bitvectorcache	info	Populating bitvector  1 with feature 17031333526373008878 and 5 bits set. Cost is 7089630.000000 = 13.95%, accumulated cost is 42.79%
1620098831.090724	cupid08.mrf.io	349/36655	searchnode	proton.searchlib.common.bitvectorcache	info	Populating bitvector  2 with feature 8801664902519362374 and 9 bits set. Cost is 6762501.000000 = 13.30%, accumulated cost is 56.09%
1620098831.090755	cupid08.mrf.io	349/36655	searchnode	proton.searchlib.common.bitvectorcache	info	Populating bitvector  3 with feature 8037804733027093536 and 1 bits set. Cost is 4795967.000000 = 9.43%, accumulated cost is 65.53%
1620098831.090785	cupid08.mrf.io	349/36655	searchnode	proton.searchlib.common.bitvectorcache	info	Populating bitvector  4 with feature 8048949098004463281 and 2 bits set. Cost is 1829482.000000 = 3.60%, accumulated cost is 69.13%
1620098831.090816	cupid08.mrf.io	349/36655	searchnode	proton.searchlib.common.bitvectorcache	info	Populating bitvector  5 with feature 9862133996467349584 and 2 bits set. Cost is 1827742.000000 = 3.60%, accumulated cost is 72.72%
1620098831.090846	cupid08.mrf.io	349/36655	searchnode	proton.searchlib.common.bitvectorcache	info	Populating bitvector  6 with feature 17207324821137896222 and 1 bits set. Cost is 1708969.000000 = 3.36%, accumulated cost is 76.08%
1620098831.090876	cupid08.mrf.io	349/36655	searchnode	proton.searchlib.common.bitvectorcache	info	Populating bitvector  7 with feature 2484938077526416915 and 3 bits set. Cost is 1423422.000000 = 2.80%, accumulated cost is 78.88%
1620098831.090906	cupid08.mrf.io	349/36655	searchnode	proton.searchlib.common.bitvectorcache	info	Populating bitvector  8 with feature 282776678627017083 and 6 bits set. Cost is 1301052.000000 = 2.56%, accumulated cost is 81.44%
1620098831.090936	cupid08.mrf.io	349/36655	searchnode	proton.searchlib.common.bitvectorcache	info	Populating bitvector  9 with feature 3372166568900116344 and 6 bits set. Cost is 1265538.000000 = 2.49%, accumulated cost is 83.93%
1620098831.090990	cupid08.mrf.io	349/36655	searchnode	proton.searchlib.common.bitvectorcache	info	Populating bitvector 10 with feature 13942585119210481324 and 1 bits set. Cost is 1240234.000000 = 2.44%, accumulated cost is 86.37%
1620098831.091021	cupid08.mrf.io	349/36655	searchnode	proton.searchlib.common.bitvectorcache	info	Populating bitvector 11 with feature 9678520279973722975 and 2 bits set. Cost is 1166778.000000 = 2.30%, accumulated cost is 88.67%
1620098831.091050	cupid08.mrf.io	349/36655	searchnode	proton.searchlib.common.bitvectorcache	info	Populating bitvector 12 with feature 7044223836620296644 and 1 bits set. Cost is 919821.000000 = 1.81%, accumulated cost is 90.48%
1620098831.091083	cupid08.mrf.io	349/36655	searchnode	proton.searchlib.common.bitvectorcache	info	Populating bitvector 13 with feature 15573892562022170041 and 1 bits set. Cost is 754034.000000 = 1.48%, accumulated cost is 91.96%
1620098831.091127	cupid08.mrf.io	349/36655	searchnode	proton.searchlib.common.bitvectorcache	info	Populating bitvector 14 with feature 6371947368893123853 and 1 bits set. Cost is 749766.000000 = 1.47%, accumulated cost is 93.44%
1620098831.091161	cupid08.mrf.io	349/36655	searchnode	proton.searchlib.common.bitvectorcache	info	Populating bitvector 15 with feature 13048939876408782521 and 1 bits set. Cost is 508110.000000 = 1.00%, accumulated cost is 94.44%
1620098831.091189	cupid08.mrf.io	349/36655	searchnode	proton.searchlib.common.bitvectorcache	info	Populating bitvector 16 with feature 409504518885290577 and 1 bits set. Cost is 420006.000000 = 0.83%, accumulated cost is 95.26%
1620098831.091222	cupid08.mrf.io	349/36655	searchnode	proton.searchlib.common.bitvectorcache	info	Populating bitvector 17 with feature 16523961346739514681 and 1 bits set. Cost is 332214.000000 = 0.65%, accumulated cost is 95.91%
1620098831.091278	cupid08.mrf.io	349/36655	searchnode	proton.searchlib.common.bitvectorcache	info	Populating bitvector 18 with feature 10787106752632958751 and 1 bits set. Cost is 299820.000000 = 0.59%, accumulated cost is 96.50%
1620098831.091334	cupid08.mrf.io	349/36655	searchnode	proton.searchlib.common.bitvectorcache	info	Populating bitvector 19 with feature 16752025573156115618 and 1 bits set. Cost is 229886.000000 = 0.45%, accumulated cost is 96.96%
1620098831.091366	cupid08.mrf.io	349/36655	searchnode	proton.searchlib.common.bitvectorcache	info	Populating bitvector 20 with feature 1180809061994227710 and 1 bits set. Cost is 192357.000000 = 0.38%, accumulated cost is 97.34%
1620098831.091394	cupid08.mrf.io	349/36655	searchnode	proton.searchlib.common.bitvectorcache	info	Populating bitvector 21 with feature 7938472654710123358 and 2 bits set. Cost is 152866.000000 = 0.30%, accumulated cost is 97.64%
1620098831.091421	cupid08.mrf.io	349/36655	searchnode	proton.searchlib.common.bitvectorcache	info	Populating bitvector 22 with feature 779872615320808786 and 1 bits set. Cost is 143183.000000 = 0.28%, accumulated cost is 97.92%
1620098831.091449	cupid08.mrf.io	349/36655	searchnode	proton.searchlib.common.bitvectorcache	info	Populating bitvector 23 with feature 183849941919404644 and 1 bits set. Cost is 135020.000000 = 0.27%, accumulated cost is 98.18%
1620098831.091476	cupid08.mrf.io	349/36655	searchnode	proton.searchlib.common.bitvectorcache	info	Populating bitvector 24 with feature 9404233084973315353 and 1 bits set. Cost is 116453.000000 = 0.23%, accumulated cost is 98.41%
1620098831.091503	cupid08.mrf.io	349/36655	searchnode	proton.searchlib.common.bitvectorcache	info	Populating bitvector 25 with feature 8617795079053430841 and 1 bits set. Cost is 108673.000000 = 0.21%, accumulated cost is 98.63%
1620098831.091532	cupid08.mrf.io	349/36655	searchnode	proton.searchlib.common.bitvectorcache	info	Populating bitvector 26 with feature 6447170130262369375 and 1 bits set. Cost is 83487.000000 = 0.16%, accumulated cost is 98.79%
1620098831.091587	cupid08.mrf.io	349/36655	searchnode	proton.searchlib.common.bitvectorcache	info	Populating bitvector 27 with feature 14073624626031642043 and 1 bits set. Cost is 82704.000000 = 0.16%, accumulated cost is 98.95%
1620098831.091621	cupid08.mrf.io	349/36655	searchnode	proton.searchlib.common.bitvectorcache	info	Populating bitvector 28 with feature 762361677743867803 and 1 bits set. Cost is 63350.000000 = 0.12%, accumulated cost is 99.08%
1620098831.091649	cupid08.mrf.io	349/36655	searchnode	proton.searchlib.common.bitvectorcache	info	Populating bitvector 29 with feature 5367418029738614926 and 1 bits set. Cost is 58626.000000 = 0.12%, accumulated cost is 99.19%
1620098831.091675	cupid08.mrf.io	349/36655	searchnode	proton.searchlib.common.bitvectorcache	info	Populating bitvector 30 with feature 17453539663769624270 and 1 bits set. Cost is 53275.000000 = 0.10%, accumulated cost is 99.30%
1620098831.091704	cupid08.mrf.io	349/36655	searchnode	proton.searchlib.common.bitvectorcache	info	Populating bitvector 31 with feature 10857103004786974448 and 1 bits set. Cost is 42201.000000 = 0.08%, accumulated cost is 99.38%

adriabonetmrf avatar May 05 '21 06:05 adriabonetmrf

The log output is statistics from an cache used by predicate search. The information listed there is normal. I'm a bit surprised that this has not been visible in previous Vespa version.

bjorncs avatar May 07 '21 15:05 bjorncs

Thanks @bjorncs, regarding allowing you to debug the vespa instance directly into our cluster, is it possible?

adriabonetmrf avatar May 10 '21 07:05 adriabonetmrf

Sorry we cannot do that. Please deploy to Vespa Cloud for Vespa Team to debug

kkraune avatar May 10 '21 07:05 kkraune

Hello @kkraune and @bjorncs we've finally succeeded replicating the issue on vespa cloud by pointing our traffic into a production cluster from a trial account, could we open a private channel so we can send you the information you require to debug? We have some queries showing the issue.

adriabonetmrf avatar May 12 '21 13:05 adriabonetmrf

Hi! That is great news! Can you send an email to [email protected] and we can take it from there?

kkraune avatar May 12 '21 13:05 kkraune

The fix is part of 7.419.32

baldersheim avatar Jun 11 '21 11:06 baldersheim

Hello @kkraune, the issue seems to have appeared again at release 7.450.12-1 for some documents, same behaviour with it being solved by updating the document

adriabonetmrf avatar Sep 02 '21 08:09 adriabonetmrf

@adriabonetmrf Are you able to reproduce this in a deterministic way ? Both the edge fixed and in #18212, and the one in #18637 fixed in #18714 were quite subtle and required very targeted testing. It would be very helpful with reproducible test.

baldersheim avatar Sep 02 '21 10:09 baldersheim

Sadly we couldn't find a way to reproduce it other than swapping traffic to another instance of vespa until the issue appeared for some documents. We haven't done it this time around though, we just fixed the documents and reported back here.

adriabonetmrf avatar Sep 02 '21 10:09 adriabonetmrf

I'm not sure if it's correlated but we updated the schema since then with the use of a tensor, it's used in the ranking expression thought so I cannot see how it could affect.

We compare a tensor in the document with a one one of the same time provided by the query request.

field in the schema:

field user_frequency_cap type tensor<float>(id{}, freq{}) {
      indexing: summary | attribute
}

ranking expression:

        function evalFrequencyCap(frequencyCap, impressions){
            expression: if(frequencyCap > 0, if(impressions > 0, if(frequencyCap > impressions, 1, 0), 1), 1)
        }

        function userFrequencyCapMonth(){
            expression: evalFrequencyCap(attribute(user_frequency_cap){freq:month}, query(user_impressions){freq:month})
        }

        function userFrequencyCapWeek(){
            expression: evalFrequencyCap(attribute(user_frequency_cap){freq:week}, query(user_impressions){freq:week})
        }

        function userFrequencyCapDay(){
            expression: evalFrequencyCap(attribute(user_frequency_cap){freq:day}, query(user_impressions){freq:day})
        }

        function userFrequencyCapLifetime(){
            expression: evalFrequencyCap(attribute(user_frequency_cap){freq:lifetime}, query(user_impressions){freq:lifetime})
        }

        function userFrequencyCap() {
            expression: userFrequencyCapMonth * userFrequencyCapWeek * userFrequencyCapDay * userFrequencyCapLifetime
        }

        first-phase {
            rank-score-drop-limit: -1.0
            expression: if(userFrequencyCap == 0, -1, baseRank)
        }

adriabonetmrf avatar Sep 06 '21 12:09 adriabonetmrf

@baldersheim can we provide anything specific that might help you out? I haven't been able to reproduce it but it seems to be related to an edge case whenever the predicate terms are indexed seeing as it produces randomly (well, more like unknown circumstances)

adriabonetmrf avatar Sep 08 '21 10:09 adriabonetmrf

@adriabonetmrf: Please re-open if you are able to reproduce on Vespa 8.

bjorncs avatar Aug 31 '22 07:08 bjorncs