lucene
lucene copied to clipboard
Remove unnecessary `AbstractKnnVectorQuery.exactSearch()`
Description
As of #12806 the hnsw codec has implemented a more complete version of this logic that may trigger without a pre-filter query. exactSearch() might also have some odd behavior surrounding scalar quantization -- IIUC any segment in
which we hit this path may score against the unquantized vectors which may then be mixed with quantized scored results.
This is related to #12505
@mccullocht How does this work against segments that were created before 9.9? I know the new reader will support this. But my concern is the SPI loading Lucene95HNSWReader and that code not handling the under sampling.
Maybe we should add my change to the other backwards codec readers to make your change possible?
@benwtrent Segments before 9.9 would perform a graph search. In 9c3679bf14b0d75cd58e1fda3ca4b8a76aa033b9 I back ported your change from #12806 to 9.2, 9.4, and 9.5 since it wasn't too much effort.
@benwtrent I see a couple of possibilities to internalize this in KnnVectorsReader:
- add a
public abstract void exhaustiveSearch(field, target, knnCollector, acceptDocs)call that internalizes the exact search logic that lives in the query today. - add a
public VectorScorer createVectorScorer(field, target)call to internalize creation of theVectorScorerand handling of the similarity function but otherwise leave much of the logic the same.
add a public VectorScorer createVectorScorer(field, target) call to internalize creation of the VectorScorer and handling of the similarity function but otherwise leave much of the logic the same.
I personally prefer this. It should allow it to combine with other docset iterators. I am not 100% sure of the API/name/etc. But it would have to be added to the leafReader.
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!
I was looking at this PR since it's marked as stale by our bots. I see that we've already added the fallback to exact search when k > maxOrd(), and we also have the createVectorScorer() that uses the similarity configured in FieldInfo.
Do we plan to make any other changes as part of this PR? If not, we can close this one.
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!
I agree, I think it can be closed. We will still need the exactSearch method, even with refactorings that make it play well with quantization.
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!