nscan includes invalid points when we have a filter

Open chasingegg opened this issue 1 year ago • 4 comments

https://github.com/facebookresearch/faiss/blob/main/faiss/IndexIVF.cpp#L596 When we compute the scan count (nscan) and use it to terminate the search early via max_codes, the algorithm searches bucket by bucket, and the scan count adds the full size of each bucket, including points that are invalid under the filter. Is this expected?
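To make the question concrete, here is a minimal Python sketch of the counting behavior described above. It is a simplified model, not the Faiss source: `probe_lists` and `list_sizes` are hypothetical names, and the only assumption is that each probed bucket adds its full size to nscan before the max_codes check.

```python
def probe_lists(list_sizes, max_codes):
    """Simplified model of early termination by max_codes (not Faiss code)."""
    nscan = 0
    visited = 0
    for size in list_sizes:
        visited += 1
        # The whole bucket is counted, whether or not the filter
        # would reject some of its points.
        nscan += size
        if nscan >= max_codes:
            break
    return visited, nscan


# Three buckets of 100 codes each, budget of 150 codes.
print(probe_lists([100, 100, 100], max_codes=150))  # -> (2, 200)
```

In this model the budget is spent on bucket sizes alone, so a filter has no effect on when the search stops.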

chasingegg, Jan 12 '24 07:01

@mdouze Hi, can you take a look?

chasingegg, Jan 12 '24 07:01

https://github.com/zilliztech/knowhere/pull/353

alexanderguzhva, Jan 19 '24 20:01

nscan does not take the filtering into account. Are you suggesting that it should count only the vectors for which distances are actually computed?

mdouze, Jan 29 '24 12:01

Yeah. For example, users want to use max_codes to determine the search range. Let's say we have an index consisting of 10000 items, and a filtered search (top 100) filters out 90% of the items. It is likely that we get nothing returned, and this is hard to tune. But if nscan meant the valid count, we could always set a larger max_codes (>= topk) to get what we want.
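The scenario above can be sketched in plain Python. This is an illustration under stated assumptions, not Faiss code: the 10000 items are split into 100 equal buckets, the filter keeps every tenth id (so 90% are filtered out), and `collect_candidates`, `keep`, and `count_valid_only` are hypothetical names used only to compare the two counting policies.

```python
def collect_candidates(lists, keep, max_codes, count_valid_only):
    """Scan buckets in order and stop once nscan reaches max_codes."""
    nscan = 0
    candidates = []
    for bucket in lists:
        valid = [i for i in bucket if keep(i)]
        candidates.extend(valid)
        # Policy switch: count only points that pass the filter,
        # or count the whole bucket.
        nscan += len(valid) if count_valid_only else len(bucket)
        if nscan >= max_codes:
            break
    return candidates


# Assumed layout: 100 buckets of 100 consecutive ids each.
lists = [list(range(b * 100, (b + 1) * 100)) for b in range(100)]
keep = lambda i: i % 10 == 0  # keeps 10% of the items
topk = 100
max_codes = 200

all_counted = collect_candidates(lists, keep, max_codes, count_valid_only=False)
valid_counted = collect_candidates(lists, keep, max_codes, count_valid_only=True)
print(len(all_counted), len(valid_counted))  # -> 20 200
```

With whole buckets counted, the search stops after 2 buckets with only 20 valid candidates, fewer than topk. With only valid points counted, the same max_codes yields 200 candidates, so any max_codes >= topk is enough to fill the result in this toy setup.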

chasingegg, Jan 31 '24 03:01