
remove override range_search from hnsw, use iterator-based instead

Open alwayslove2013 opened this issue 7 months ago • 5 comments

alwayslove2013 avatar May 21 '25 10:05 alwayslove2013

@alwayslove2013 🔍 Important: PR Classification Needed!

For efficient project management and a seamless review process, it's essential to classify your PR correctly. Here's how:

  1. If you're fixing a bug, label it as kind/bug.
  2. For small tweaks (less than 20 lines without altering any functionality), please use kind/improvement.
  3. Significant changes that don't modify existing functionalities should be tagged as kind/enhancement.
  4. Adjusting APIs or changing functionality? Go with kind/feature.

For any PR outside the kind/improvement category, ensure you link to the associated issue using the format: “issue: #”.

Thanks for your efforts and contribution to the community!

mergify[bot] avatar May 21 '25 10:05 mergify[bot]

/kind improvement

alwayslove2013 avatar May 21 '25 10:05 alwayslove2013

@alwayslove2013 Do I understand correctly that we're deprecating range search completely?

alexanderguzhva avatar May 21 '25 14:05 alexanderguzhva

@alexanderguzhva In knowhere, there are two approaches to range_search. One is an index-specific implementation that overrides range_search (as in ivf / faiss_hnsw). The other is a unified method in the parent class (index_node) built on an iterator: it repeatedly calls next() on the iterator and collects the best results that fall within the specified range. https://github.com/zilliztech/knowhere/blob/8a705a0a39fecb8fb94f8edee9ab964cb0c22566/include/knowhere/index/index_node.h#L155-L171

This is a historical misalignment between Milvus and knowhere. Originally, knowhere was defined to return all results that fall within the range. Milvus, however, does not need all of them and applies a k limit (which we can call range_search_k). If the specified range is very large while range_search_k is relatively small, collecting every in-range result wastes significant work. The iterator-based range_search can stop as soon as enough results have been collected.
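
To make the early-stop idea concrete, here is a minimal C++ sketch. It is not the knowhere implementation: `MockIterator`, `IteratorRangeSearch`, and the `Calls()` counter are hypothetical names made up for illustration, and the sketch assumes an L2-style metric (smaller is better) and an iterator that yields hits in exactly ascending distance order. A real index iterator may only be approximately ordered and would need a more careful stop condition.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for an index iterator: yields (id, distance) pairs
// in ascending distance order. This is NOT the real knowhere interface.
class MockIterator {
 public:
    explicit MockIterator(std::vector<std::pair<int64_t, float>> sorted_hits)
        : hits_(std::move(sorted_hits)) {}

    bool HasNext() const { return pos_ < hits_.size(); }

    // Returns the next-closest hit and counts how many times it was called.
    std::pair<int64_t, float> Next() {
        ++calls_;
        return hits_[pos_++];
    }

    std::size_t Calls() const { return calls_; }

 private:
    std::vector<std::pair<int64_t, float>> hits_;
    std::size_t pos_ = 0;
    std::size_t calls_ = 0;
};

// Collect at most range_search_k hits whose distance is below radius.
// Stops as soon as enough results are collected, the iterator is exhausted,
// or the next candidate falls outside the radius.
inline std::vector<std::pair<int64_t, float>>
IteratorRangeSearch(MockIterator& it, float radius, std::size_t range_search_k) {
    assert(radius >= 0.0f);
    std::vector<std::pair<int64_t, float>> results;
    while (results.size() < range_search_k && it.HasNext()) {
        auto hit = it.Next();
        if (hit.second >= radius) {
            break;  // input is sorted (assumption), so nothing further qualifies
        }
        results.push_back(hit);
    }
    return results;
}
```

With five candidates inside a large radius and range_search_k = 2, this loop calls Next() only twice, whereas a "return everything in range" implementation would have to visit every in-range candidate before the caller truncates to k.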

alwayslove2013 avatar May 22 '25 01:05 alwayslove2013

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alwayslove2013, chasingegg

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

sre-ci-robot avatar Jun 12 '25 02:06 sre-ci-robot