lucene icon indicating copy to clipboard operation
lucene copied to clipboard

Remove unnecessary `AbstractKnnVectorQuery.exactSearch()`

Open mccullocht opened this issue 1 year ago • 4 comments

Description

As of #12806 the hnsw codec has implemented a more complete version of this logic that may trigger without a pre-filter query. exactSearch() might also have some odd behavior surrounding scalar quantization -- IIUC any segment in which we hit this path may score against the unquantized vectors which may then be mixed with quantized scored results.

This is related to #12505

mccullocht avatar Feb 28 '24 17:02 mccullocht

@mccullocht How does this work against segments that were created before 9.9? I know the new reader will support this. But my concern is the SPI loading Lucene95HNSWReader and that code not handling the under sampling.

Maybe we should add my change to the other backwards codec readers to make your change possible?

benwtrent avatar Feb 29 '24 15:02 benwtrent

@benwtrent Segments before 9.9 would perform a graph search. In 9c3679bf14b0d75cd58e1fda3ca4b8a76aa033b9 I back ported your change from #12806 to 9.2, 9.4, and 9.5 since it wasn't too much effort.

mccullocht avatar Feb 29 '24 17:02 mccullocht

@benwtrent I see a couple of possibilities to internalize this in KnnVectorsReader:

  • add a public abstract void exhaustiveSearch(field, target, knnCollector, acceptDocs) call that internalizes the exact search logic that lives in the query today.
  • add a public VectorScorer createVectorScorer(field, target) call to internalize creation of the VectorScorer and handling of the similarity function but otherwise leave much of the logic the same.

mccullocht avatar Mar 04 '24 16:03 mccullocht

add a public VectorScorer createVectorScorer(field, target) call to internalize creation of the VectorScorer and handling of the similarity function but otherwise leave much of the logic the same.

I personally prefer this. It should allow it to combine with other docset iterators. I am not 100% sure of the API/name/etc. But it would have to be added to the leafReader.

benwtrent avatar Mar 04 '24 17:03 benwtrent

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

github-actions[bot] avatar Mar 19 '24 00:03 github-actions[bot]

I was looking at this PR since it's marked as stale by our bots. I see that we've already added the fallback to exact search when k > maxOrd(), and we also have the createVectorScorer() that uses the similarity configured in FieldInfo.

Do we plan to make any other changes as part of this PR? If not, we can close this one.

vigyasharma avatar Apr 02 '24 18:04 vigyasharma

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

github-actions[bot] avatar Apr 17 '24 00:04 github-actions[bot]

I agree, I think it can be closed. We will still need the exactSearch method, even with refactorings that make it play well with quantization.

benwtrent avatar Apr 17 '24 12:04 benwtrent

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

github-actions[bot] avatar May 02 '24 00:05 github-actions[bot]