Bug with vespa when using a linear combination score of two embeddings and varying hits=k?

Open drei34 opened this issue 8 months ago • 10 comments

Hello,

I am seeing some behavior I can't explain. We have a query that retrieves relevant documents from a DB using two embeddings per document (and at query time we compute two embeddings for the query). To do this, we use a rank profile that scores documents by a weighted average of the two similarities (say 50/50), and the Java code combines the two parts of the query with an OrItem (a disjunction). However, when I use this rank profile and vary hits, e.g. hits=10 versus hits=1000, I get completely different document ids back.

We can't explain this. Have people seen this before, and is there an easy explanation? If not, I can post more details, but since this is proprietary data I'd probably need to build an example from MS MARCO or similar. Does hits affect the set of candidate documents that is initially retrieved (in which case the results would make sense)? The documentation makes me think hits only limits the length of the output, but it seems to be doing more.
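One way the candidate set could depend on hits is sketched below. This is a plain-Java simulation, not Vespa code and not a confirmed diagnosis of this issue. It assumes the Java searcher passes hits (or a value derived from it) as the targetHits of each of the two nearestNeighbor operators under the OrItem. In Vespa, targetHits controls roughly how many candidates each nearestNeighbor operator exposes to first-phase ranking. The document ids, the scores, and the class and method names are all hypothetical.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical simulation: two per-embedding top-k retrievals, OR'ed into one
// candidate set, then re-ranked by a 50/50 linear combination of the two scores.
final class HitsCandidateDemo {

    // Made-up scores: index 0 = similarity to query embedding 1,
    // index 1 = similarity to query embedding 2.
    static final Map<String, double[]> SCORES = new LinkedHashMap<>();
    static {
        SCORES.put("a", new double[] {0.90, 0.20});
        SCORES.put("b", new double[] {0.15, 0.90});
        SCORES.put("c", new double[] {0.80, 0.80});
        SCORES.put("d", new double[] {0.70, 0.70});
    }

    // One "nearest neighbor" operator: the targetHits best documents for one embedding.
    static List<String> topK(int embedding, int targetHits) {
        return SCORES.keySet().stream()
                .sorted(Comparator.comparingDouble((String id) -> -SCORES.get(id)[embedding])
                        .thenComparing(Comparator.naturalOrder()))
                .limit(targetHits)
                .collect(Collectors.toList());
    }

    // Rank profile stand-in: 0.5 * similarity1 + 0.5 * similarity2.
    static double combined(String id) {
        double[] s = SCORES.get(id);
        return 0.5 * s[0] + 0.5 * s[1];
    }

    // OR of the two operators = union of their candidates; rank the union and
    // return the top `hits` document ids (ties broken by id).
    static List<String> search(int hits, int targetHits) {
        Set<String> candidates = new LinkedHashSet<>(topK(0, targetHits));
        candidates.addAll(topK(1, targetHits));
        return candidates.stream()
                .sorted(Comparator.comparingDouble((String id) -> -combined(id))
                        .thenComparing(Comparator.naturalOrder()))
                .limit(hits)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // targetHits tied to hits (the assumption above).
        System.out.println(search(1, 1)); // [a]
        System.out.println(search(2, 2)); // [c, a]
        // targetHits fixed independently of hits.
        System.out.println(search(1, 4)); // [c]
    }
}
```

With targetHits tied to hits, the top document changes from a to c when hits goes from 1 to 2, although the ranking expression is identical. Document c is second-best for each embedding on its own but best under the combined score, so it only enters the candidate set once each operator returns at least two candidates. With targetHits fixed at 4, the top document is c regardless of hits.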

Jun 04 '24 20:06 drei34
