milvus icon indicating copy to clipboard operation
milvus copied to clipboard

feat: Support Search By PK

Open liliu-z opened this issue 1 week ago • 11 comments

issue: #39157

Overview: Support search by PK by resolving IDs to vectors on Proxy side. Upgrade go-api to adapt to new proto definitions.

Design:

  • Upgrade milvus-proto/go-api to latest master.
  • Implement handleIfSearchByPK in Proxy: resolve IDs to vectors via internal Query, then rewrite SearchRequest.
  • Adapt to 'SearchInput' oneof field in SearchRequest across client and handlers.
  • Fix binary vector stride calculation bug in placeholder utils.

Compatibility:

  • Old Pymilvus can still work w/o this feature

What is included:

  • Dense and Sparse
  • Multi vector fields
  • Rejection on BM25

What is not include:

  • Hybrid Search
  • EmbeddingList
  • Restful API

liliu-z avatar Nov 24 '25 15:11 liliu-z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liliu-z

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

sre-ci-robot avatar Nov 24 '25 15:11 sre-ci-robot

[ci-v2-notice] Notice: We are gradually rolling out the new ci-v2 system.

  • Legacy CI jobs remain unaffected, you can just ignore ci-v2 if you don't want to run it.
  • Additional "ci-v2/*" checkers will run for this PR to ensure the new ci-v2 system is working as expected.
  • For tests that exist in both v1 and v2, passing in either system is considered PASS.

To rerun ci-v2 checks, comment with:

  • /ci-rerun-code-check // for ci-v2/code-check
  • /ci-rerun-build // for ci-v2/build
  • /ci-rerun-ut-integration // for ci-v2/ut-integration
  • /ci-rerun-ut-go // for ci-v2/ut-go
  • /ci-rerun-ut-cpp // for ci-v2/ut-cpp
  • /ci-rerun-ut // for all ci-v2/ut-integration, ci-v2/ut-go, ci-v2/ut-cpp
  • /ci-rerun-e2e-arm // for ci-v2/e2e-arm [master branch only]
  • /ci-rerun-e2e-default // for ci-v2/e2e-default [master branch only]

If you have any questions or requests, please contact @zhikunyao.

sre-ci-robot avatar Nov 24 '25 15:11 sre-ci-robot

Codecov Report

:x: Patch coverage is 55.85106% with 83 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 73.38%. Comparing base (0171511) to head (cb7cbf4). :warning: Report is 27 commits behind head on master.

Files with missing lines Patch % Lines
internal/proxy/impl.go 43.93% 62 Missing and 12 partials :warning:
pkg/util/funcutil/placeholdergroup.go 0.00% 5 Missing :warning:
internal/distributed/proxy/httpserver/handler.go 50.00% 3 Missing :warning:
internal/proxy/task_search.go 50.00% 1 Missing :warning:

:x: Your patch check has failed because the patch coverage (55.85%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage. :x: Your project check has failed because the head coverage (73.38%) is below the target coverage (77.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #45820      +/-   ##
==========================================
- Coverage   76.05%   73.38%   -2.68%     
==========================================
  Files        1890     1365     -525     
  Lines      295336   212971   -82365     
==========================================
- Hits       224618   156284   -68334     
+ Misses      63289    49231   -14058     
- Partials     7429     7456      +27     
Components Coverage Δ
Client 78.18% <100.00%> (+0.01%) :arrow_up:
Core ∅ <ø> (∅)
Go 74.03% <49.39%> (-0.10%) :arrow_down:
Files with missing lines Coverage Δ
client/milvusclient/read_options.go 77.82% <100.00%> (+0.20%) :arrow_up:
...nternal/distributed/proxy/httpserver/handler_v1.go 88.55% <100.00%> (+0.02%) :arrow_up:
...nternal/distributed/proxy/httpserver/handler_v2.go 86.46% <100.00%> (+0.02%) :arrow_up:
tests/integration/util_query.go 69.44% <100.00%> (+0.49%) :arrow_up:
internal/proxy/task_search.go 69.36% <50.00%> (ø)
internal/distributed/proxy/httpserver/handler.go 97.44% <50.00%> (-0.45%) :arrow_down:
pkg/util/funcutil/placeholdergroup.go 48.08% <0.00%> (ø)
internal/proxy/impl.go 71.82% <43.93%> (-0.88%) :arrow_down:

... and 564 files with indirect coverage changes

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar Nov 24 '25 17:11 codecov[bot]

overall it looks good to me.

might need some more monitoring metrics

xiaofan-luan avatar Nov 27 '25 22:11 xiaofan-luan

also need to unify vector -> bytevector function. (restful need to support byte vector search as well)

xiaofan-luan avatar Nov 27 '25 22:11 xiaofan-luan

overall it looks good to me.

might need some more monitoring metrics

Any suggestion on this. For latency, query & search will be recorded separately. I can help add a unified latency and label out the subquery and subsearch, also, do this for hybrid search in the next PR

liliu-z avatar Dec 01 '25 12:12 liliu-z

also need to unify vector -> bytevector function. (restful need to support byte vector search as well)

The restfulv2 has already support binary vector.

I guess you meant merging vector2Bytes and binaryVector2Bytes. These two are only for restfultV1, no one use but code is still there (will find a time to clean it out).

liliu-z avatar Dec 01 '25 12:12 liliu-z

/ci-rerun-ut-integration

liliu-z avatar Dec 02 '25 09:12 liliu-z

/ci-rerun-e2e-default

liliu-z avatar Dec 02 '25 09:12 liliu-z

/ci-rerun-ut-go

liliu-z avatar Dec 02 '25 09:12 liliu-z

/ci-rerun-ut-go

liliu-z avatar Dec 04 '25 02:12 liliu-z

@liliu-z go-sdk check failed, comment rerun go-sdk can trigger the job again.

mergify[bot] avatar Dec 05 '25 10:12 mergify[bot]

rerun go-sdk

liliu-z avatar Dec 05 '25 10:12 liliu-z

@liliu-z cpu-e2e job failed, comment /run-cpu-e2e can trigger the job again.

mergify[bot] avatar Dec 05 '25 12:12 mergify[bot]