hnswlib
Why is the self point not in the knn_query output list?
Hello,
I want to get the nearest neighbours for each point in the dataset. In the results of knn_query, I notice that the neighbour list for some points does not contain their own index, while most points have their own index as the first nearest neighbour. My code is given below:
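import hnswlib

# p is an hnswlib.Index created earlier (its space and dim are not shown here);
# num_elements and input_2d_vector are defined elsewhere
p.init_index(max_elements=num_elements, ef_construction=200, M=16)
p.set_ef(10)
p.set_num_threads(12)
p.add_items(input_2d_vector)
nn = 15
labels, distance = p.knn_query(input_2d_vector, k=nn)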
Example: the result of querying index 159954:
p.knn_query(input_2d_vector[159954], k=nn)
(array([[100278, 98287, 56307, 91682, 106717, 108968, 35750, 116215,
133216, 108053, 50169, 138988, 23028, 23627, 127306]],
dtype=uint64),
array([[624.35284, 628.1655 , 643.80225, 646.7606 , 649.9992 , 658.15686,
659.7333 , 659.9536 , 660.69086, 662.9651 , 667.3906 , 670.5628 ,
684.00586, 686.38666, 688.42053]], dtype=float32))
Ideally, the first index should be 159954 instead of 100278, since the distance from point 159954 to itself is zero. Is this expected behaviour, or am I missing something?
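To quantify how often this happens, a check along the following lines could be run on the labels array returned by knn_query. This is only a minimal sketch: `missing_self` is a hypothetical helper, not part of hnswlib, and it assumes the items were added without explicit ids, so that the label of each point equals its row position in the input.

```python
def missing_self(labels):
    """Return the row positions whose neighbour list lacks the row's own index.

    Assumption: the label of each point equals its row position, i.e. the
    items were added in order without explicit ids.
    """
    missing = []
    for i, row in enumerate(labels):
        # Convert to plain ints so the check works for lists and array rows alike.
        if i not in {int(x) for x in row}:
            missing.append(i)
    return missing


# Small hand-made example: row 2 does not list index 2 among its neighbours.
example_labels = [[0, 2, 1], [1, 0, 2], [0, 1, 3], [3, 2, 1]]
print(missing_self(example_labels))  # prints [2]
```

On the real data this would be called as missing_self(labels), and the length of the returned list gives the number of affected points.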
Any help is much appreciated.
Thank you.