milvus
[Bug]: Range search returns empty results when nq is large
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version: master-20231007-80eb5434
- Deployment mode(standalone or cluster): standalone
- MQ type(rocksmq, pulsar or kafka): rocksmq
- SDK version(e.g. pymilvus v2.0.0rc2): 2.3.1.post1.dev5
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
Range search returns empty results when nq is large.
range_search_params = {"metric_type": "L2", "params": {"radius": 1000, "range_filter": 0}}
search_res, _ = collection_w.search(vectors, default_search_field,
                                    range_search_params, default_limit, expression)
It returns an empty result when nq=13 (total 500).
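To see which queries in a batch are affected, a small helper like the following can list the indices whose result set is shorter than the requested limit. This is only a sketch: `short_result_indices` is a hypothetical name, and the stand-in data uses plain lists instead of pymilvus `Hits` objects (it relies only on `len()`, which the traceback below shows `Hits` supports).

```python
def short_result_indices(results, limit):
    """Return the indices of queries that received fewer than `limit` hits."""
    return [i for i, hits in enumerate(results) if len(hits) < limit]


# Stand-in data (hypothetical): two full result sets and one empty one.
fake_results = [[1] * 10, [1] * 10, []]
print(short_result_indices(fake_results, 10))
```

With real search output, `results` would be the object returned by the search call and `limit` the value passed as `default_limit`.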
Expected Behavior
No response
Steps To Reproduce
No response
Milvus Log
―――――――――――――――――――――――― TestCollectionRangeSearch.test_range_search_with_expression_large[128] ――――――――――――――――――――――――
self = <test_search.TestCollectionRangeSearch object at 0x17ef66b90>, dim = 128

    @pytest.mark.tags(CaseLabel.L2)
    def test_range_search_with_expression_large(self, dim):
        """
        target: test range search with large expression
        method: test range search with large expression
        expected: searched successfully
        """
        # 1. initialize with data
        nb = 10000
        collection_w, _, _, insert_ids = self.init_collection_general(prefix, True,
                                                                      nb, dim=dim,
                                                                      is_index=False)[0:4]
        # 2. create index
        index_param = {"index_type": "IVF_FLAT", "metric_type": "L2", "params": {"nlist": 100}}
        collection_w.create_index("float_vector", index_param)
        collection_w.load()
        # 3. search with expression
        expression = f"0 < {default_int64_field_name} < 5001"
        log.info("test_search_with_expression: searching with expression: %s" % expression)
        nums = 5000
        vectors = [[random.random() for _ in range(dim)] for _ in range(nums)]
        # calculate the distance to make sure in range(0, 1000)
        search_params = {"metric_type": "L2"}
        search_res, _ = collection_w.search(vectors, default_search_field,
                                            search_params, 500, expression)
        for i in range(nums):
            if len(search_res[i]) < 10:
                assert False
            for j in range(len(search_res[i])):
                if search_res[i][j].distance < 0 or search_res[i][j].distance >= 1000:
                    assert False
        range_search_params = {"metric_type": "L2", "params": {"radius": 1000, "range_filter": 0}}
        search_res, _ = collection_w.search(vectors, default_search_field,
                                            range_search_params, default_limit, expression)
        for i in range(nums):
            log.info(i)
>           assert len(search_res[i]) == default_limit
E           assert 0 == 10
E            +  where 0 = len(<pymilvus.orm.search.Hits object at 0x2818193d0>)

test_search.py:7277: AssertionError
[2023-10-08 11:13:08 - INFO - ci_test]: test_search_with_expression: searching with expression: 0 < int64 < 5001 (test_search.py:7258)
[2023-10-08 11:13:09 - DEBUG - ci_test]: (api_request) : [Collection.search] args: [[[0.7152276710843498, 0.9907542736941191, 0.8934268805362852, 0.9715555546575151, 0.38780725382607195, 0.12973934223513672, 0.013730650389911614, 0.27500798395711135, 0.8207273830749972, 0.48717265080268235, 0.14628254904289062, 0.8254767267417195, 0.6415687234796168, 0.14963447922592976, 0.3275184......, kwargs: {} (api_request.py:62)
[2023-10-08 11:13:13 - DEBUG - ci_test]: (api_response) : ["['id: 1837, distance: 34.77191925048828, entity: {}', 'id: 4153, distance: 34.84160614013672, entity: {}', 'id: 2832, distance: 34.862953186035156, entity: {}', 'id: 3596, distance: 34.87803649902344, entity: {}', 'id: 3753, distance: 34.893272399902344, entity: {}', 'id: 1506, distance: 34.909423...... (api_request.py:37)
[2023-10-08 11:13:27 - DEBUG - ci_test]: (api_request) : [Collection.search] args: [[[0.7152276710843498, 0.9907542736941191, 0.8934268805362852, 0.9715555546575151, 0.38780725382607195, 0.12973934223513672, 0.013730650389911614, 0.27500798395711135, 0.8207273830749972, 0.48717265080268235, 0.14628254904289062, 0.8254767267417195, 0.6415687234796168, 0.14963447922592976, 0.3275184......, kwargs: {} (api_request.py:62)
[2023-10-08 11:13:28 - DEBUG - ci_test]: (api_response) : ["['id: 3875, distance: 34.54219055175781, entity: {}', 'id: 1307, distance: 34.55926513671875, entity: {}', 'id: 88, distance: 34.666324615478516, entity: {}', 'id: 1729, distance: 34.70890808105469, entity: {}', 'id: 40, distance: 34.720645904541016, entity: {}', 'id: 727, distance: 34.73711395263...... (api_request.py:37)
[2023-10-08 11:13:28 - INFO - ci_test]: 0 (test_search.py:7276)
[2023-10-08 11:13:28 - INFO - ci_test]: 1 (test_search.py:7276)
[2023-10-08 11:13:28 - INFO - ci_test]: 2 (test_search.py:7276)
[2023-10-08 11:13:28 - INFO - ci_test]: 3 (test_search.py:7276)
[2023-10-08 11:13:28 - INFO - ci_test]: 4 (test_search.py:7276)
[2023-10-08 11:13:28 - INFO - ci_test]: 5 (test_search.py:7276)
[2023-10-08 11:13:28 - INFO - ci_test]: 6 (test_search.py:7276)
[2023-10-08 11:13:28 - INFO - ci_test]: 7 (test_search.py:7276)
[2023-10-08 11:13:28 - INFO - ci_test]: 8 (test_search.py:7276)
[2023-10-08 11:13:28 - INFO - ci_test]: 9 (test_search.py:7276)
[2023-10-08 11:13:28 - INFO - ci_test]: 10 (test_search.py:7276)
[2023-10-08 11:13:28 - INFO - ci_test]: 11 (test_search.py:7276)
[2023-10-08 11:13:28 - INFO - ci_test]: 12 (test_search.py:7276)
[2023-10-08 11:13:28 - INFO - ci_test]: 13 (test_search.py:7276)
Anything else?
No response
/assign @jiaoew1991 /unassign
/assign @smellthemoon /unassign
The issue has been fixed with #27517.
Reproduced.
- milvus version: 84d05b9
- link: https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20Nightly%20CI/detail/master/531/pipeline/141/
- log: artifacts-milvus-standalone-nightly-531-pymilvus-e2e-logs.tar.gz
- failed time: [2023-10-17T19:37:01.160Z] [gw5] [ 90%] FAILED testcases/test_search.py::TestCollectionRangeSearch::test_range_search_with_expression_large[128]
Hi @NicoYuan1986, if you comment out this line, the case passes:
expression = f"0 < {default_int64_field_name} < 5001"
For IVF_FLAT, the current range search strategy is: for each nq, iterate over all buckets from nearest to farthest, and stop at the first bucket in which no vector is found in range.
The test fails because, when bucket No. 0 is scanned, only a few vectors are in range and all of them are marked as deleted, so the range search stops with an empty result for this nq.
CYD - scan list search stop in bucket 53
CYD - scan list search stop in bucket 95
CYD - scan list search stop in bucket 0 <=== here
CYD - scan list search stop in bucket 96
CYD - scan list search stop in bucket 95
CYD - scan list search stop in bucket 96
CYD - scan list search stop in bucket 59
CYD - scan list search stop in bucket 94
CYD - scan list search stop in bucket 94
CYD - scan list search stop in bucket 83
CYD - scan list search stop in bucket 0
CYD - scan list search stop in bucket 95
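The early stop described above can be illustrated with a toy model. This is not the Milvus or Knowhere implementation; the function names, bucket contents, and distances are made up for illustration, and the filter simply mirrors the test expression. The second function scans every bucket purely for contrast and is not a proposed fix.

```python
def range_search_early_stop(buckets, allowed, radius):
    """Scan buckets (already ordered nearest to farthest) and stop at the
    first bucket that contributes no usable in-range vector."""
    hits = []
    for bucket in buckets:
        in_range = [(vid, dist) for vid, dist in bucket
                    if dist < radius and allowed(vid)]
        if not in_range:
            break  # early stop: nothing usable found in this bucket
        hits.extend(in_range)
    return hits


def range_search_full_scan(buckets, allowed, radius):
    """Contrast case: scan every bucket regardless of intermediate results."""
    return [(vid, dist) for bucket in buckets for vid, dist in bucket
            if dist < radius and allowed(vid)]


def allowed(vid):
    # Mirrors the test expression "0 < int64 < 5001" (ids here are hypothetical).
    return 0 < vid < 5001


# Hypothetical data: the nearest bucket holds only in-range vectors that the
# filter excludes; the next bucket holds valid in-range vectors.
buckets = [
    [(6001, 0.5), (7002, 0.7)],
    [(10, 0.9), (20, 1.1)],
]
radius = 1000

print(range_search_early_stop(buckets, allowed, radius))
print(range_search_full_scan(buckets, allowed, radius))
```

In this model the early-stop scan returns nothing because the nearest bucket yields no usable vector, while the full scan still finds the two valid vectors in the second bucket. That matches the "stop in bucket 0" lines in the log.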
@NicoYuan1986 @cydrain Should we make nlist smaller, or just skip this test? Any comments?
- I don't see any delete requests in the test, so I guess it was caused by compaction.
- I think we should fix this issue; the search result should not be affected by compaction or deletion. @jiaoew1991 @liliu-z @xiaofan-luan any comments?
expression = f"0 < {default_int64_field_name} < 5001"
Hi @yanliang567 ,
Because of this expression, half of the vectors are marked as "deleted":
expression = f"0 < {default_int64_field_name} < 5001"
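A quick way to check the "half" figure, assuming the int64 field simply holds the row index 0..nb-1 (an assumption about the test data generator, not something stated in this thread):

```python
nb = 10000  # number of inserted rows, as in the test

# Assumption: the int64 field value of each row equals its index.
int64_values = range(nb)

# Evaluate the filter "0 < int64 < 5001" for every row.
passes = [0 < v < 5001 for v in int64_values]

kept = sum(passes)      # rows visible to the search
excluded = nb - kept    # rows masked out, i.e. treated like deleted ones
print(kept, excluded)   # prints "5000 5000"
```

Under that assumption, 5000 of the 10000 rows are masked out during the search, so a bucket can easily contain only masked vectors within the radius.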
This keeps reproducing on the 2.3 branch and master branch nightlies. I think, from the user's perspective, this is a bug. Should we skip this case as a known issue or make a fix? @jiaoew1991 @liliu-z @xiaofan-luan
https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20Nightly%20CI/detail/2.3/30/pipeline/141/
There may be no result for the first nq.
Seems like this is a bug.
Let's keep it open until the new range search API is designed, at which point this problem should be fixed.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.
Let's keep it active.