faiss
faiss copied to clipboard
Empty index inconsistency
Hello,
When searching more neighbours than there are samples in the population, one expects -1 returned indexes and infinite (before nan_to_num) distances. I tested this on empty indexes and it works as expected with a few samples but not above a certain population size:
>>> faiss.IndexFlatL2(1).search(np.random.random((19, 1)), 1) # Works fine
(array([[3.4028235e+38],
[3.4028235e+38],
[3.4028235e+38],
[3.4028235e+38],
[3.4028235e+38],
[3.4028235e+38],
[3.4028235e+38],
[3.4028235e+38],
[3.4028235e+38],
[3.4028235e+38],
[3.4028235e+38],
[3.4028235e+38],
[3.4028235e+38],
[3.4028235e+38],
[3.4028235e+38],
[3.4028235e+38],
[3.4028235e+38],
[3.4028235e+38],
[3.4028235e+38]], dtype=float32), array([[-1],
[-1],
[-1],
[-1],
[-1],
[-1],
[-1],
[-1],
[-1],
[-1],
[-1],
[-1],
[-1],
[-1],
[-1],
[-1],
[-1],
[-1],
[-1]]))
>>> faiss.IndexFlatL2(1).search(np.random.random((20, 1)), 1) # First inconsistency at 20 samples
(array([[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.]], dtype=float32), array([[4606307992086394560],
[4604998639883125282],
[4600655086560625350],
[4598294196564123122],
[4604232847489403516],
[4606679014202959133],
[4602634729945304042],
[4606974893601702480],
[4598449504068727218],
[4599660145183339918],
[4589041727904493456],
[4586442628466231040],
[4606335022048302453],
[4597148622011556900],
[4606505784769942931],
[4592230883026519384],
[4603841237871291188],
[4601734269562976314],
[4595473575588231508],
[4606528106805786929]]))
With torch.Tensor inputs, it starts bugging at n_samples=21 and not 20.
May be related to #2135(?)
Presumably because the threshold on nb queries
https://github.com/facebookresearch/faiss/wiki/Implementation-notes#matrix-multiplication-to-do-many-l2-distance-computations
Probably the ids are not filled in in the BLAS case.