faiss
nscan includes invalid points when a filter is applied
https://github.com/facebookresearch/faiss/blob/main/faiss/IndexIVF.cpp#L596 The scan count (`nscan`) is used to terminate the search early once it reaches `max_codes`. However, as the search visits each bucket, the scan count is incremented by all the data in that bucket of the index, so it includes invalid points that the filter rejects. Is this expected?
@mdouze hi, can you take a look?
https://github.com/zilliztech/knowhere/pull/353
The `nscan` counter does not take the filtering into account. Do you suggest taking into account only the vectors for which distances are actually computed?
Yeah. For example, users want to use `max_codes` to determine the search range. Let's say we have an index consisting of 10000 items, and a filtered search (top 100) filters out 90% of the items. It is likely that we get nothing returned, and `max_codes` becomes hard to tune. But if `nscan` meant the count of valid points, we could always set a larger `max_codes` (>= topk) to get what we want.
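
To make the difference concrete, here is a toy sketch in plain Python. It is not the faiss code: `scan_lists`, `is_valid`, and `count_valid_only` are hypothetical names, and the layout (100 buckets of 100 ids, a filter keeping every 10th id) is an assumption chosen to match the 10000 items / 90% filtered example above.

```python
def scan_lists(inverted_lists, is_valid, max_codes, count_valid_only):
    """Toy model of an IVF scan loop with early termination (not faiss itself)."""
    nscan = 0
    candidates = []
    for ids in inverted_lists:
        # Only ids that pass the filter would have a distance computed.
        valid = [i for i in ids if is_valid(i)]
        candidates.extend(valid)
        # Either count every code in the bucket, or only the valid ones.
        nscan += len(valid) if count_valid_only else len(ids)
        if nscan >= max_codes:
            break  # early termination driven by max_codes
    return candidates, nscan


# Assumed layout: 10000 ids split into 100 buckets of 100 ids each.
inverted_lists = [list(range(s, s + 100)) for s in range(0, 10000, 100)]


def keep(i):
    # Assumed filter: keeps 10% of the ids.
    return i % 10 == 0


all_counted, nscan_all = scan_lists(inverted_lists, keep, 100, False)
valid_counted, nscan_valid = scan_lists(inverted_lists, keep, 100, True)
print(len(all_counted), len(valid_counted))
```

In this toy setup with `max_codes = 100`, counting every code stops after the first bucket and leaves far fewer than 100 candidates for a top-100 query, while counting only valid points keeps scanning until 100 candidates have been collected.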