MSVBASE icon indicating copy to clipboard operation
MSVBASE copied to clipboard

Question about the implementation of VBase's iterative search in HNSW

Open cyccbxhl opened this issue 7 months ago • 0 comments

I'm studying the implementation of VBase on HNSW and I have some questions regarding the iterative search mechanism. In the original searchBaseLayerST function, which is responsible for searching the top-K vectors from the L0 layer, a max-heap (top_candidates) is used to maintain the best candidates(its size is ef). This approach naturally results in the first vector dequeued from candidate_set not necessarily being part of the final top-K results.

Image

In the current implementation of searchBaseLayerSTIterative, it seems that top_candidates has been removed entirely. Instead, the first vector dequeued from candidate_set is passed upward directly, and appears to be handled in hnsw_gettuple. From what I can see (ignoring some magic numbers I'm not fully clear on), in the branch with ORDER BY, it seems like the very first vector dequeued from candidate_set in searchBaseLayerSTIterative is returned directly. Doesn't this risk returning an incorrect or suboptimal result?

Image Image

Also, according to the HNSW paper, there is a mechanism involving a queue of recently visited vectors, where the median distance is used as one of the termination conditions during traversal(so and filter condition). I couldn't find this logic implemented in the code.

Thanks in advance for your clarification!

cyccbxhl avatar May 13 '25 07:05 cyccbxhl