milvus
[Bug]: [hybrid_search] Unexpected error when "reqs" is an empty list for the interface "hybrid_search"
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version: 2.4 latest
- Deployment mode (standalone or cluster): both
- MQ type (rocksmq, pulsar or kafka): all
- SDK version (e.g. pymilvus v2.0.0rc2): 2.4.1rc10
- OS (Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
An unexpected error is raised when "reqs" is an empty list for the "hybrid_search" interface:
(code=1, message=Unexpected error, message=<list index out of range>)> (api_request.py:46)
Expected Behavior
Report a Milvus error rather than an unexpected error.
When nq is set to the maximum + 1 (16385), Milvus reports the error "(code=65535, message=nq [16385] is invalid, nq (number of search vector per search request) should be in range [1, 16384], but got 16385)".
It would be better to report the same error when nq = 0 (below the minimum value of 1).
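A client-side check along these lines would surface the same range error for nq = 0 as for nq = 16385. This is a minimal sketch, not pymilvus code: `validate_nq` and `MAX_NQ` are hypothetical names, and the bound 16384 is taken from the server error quoted above.

```python
MAX_NQ = 16384  # upper bound quoted in the server error above (assumed fixed)


def validate_nq(data):
    """Hypothetical helper: reject empty or oversized query batches up front.

    Raises ValueError with the same wording as the server error instead of
    letting an empty list reach code that indexes data[0].
    """
    nq = len(data)
    if not 1 <= nq <= MAX_NQ:
        raise ValueError(
            f"nq [{nq}] is invalid, nq (number of search vector per search request) "
            f"should be in range [1, {MAX_NQ}], but got {nq}"
        )
    return nq


for batch in ([], [[0.1, 0.2]]):
    try:
        print("nq =", validate_nq(batch))
    except ValueError as err:
        print("error:", err)
```

Both out-of-range cases then fail with one message, which is what the test below expects from the server.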
Steps To Reproduce
```python
@pytest.mark.tags(CaseLabel.L1)
@pytest.mark.parametrize("nq", [0, 16385])
def test_hybrid_search_normal_over_max_nq(self, nq):
    """
    target: test hybrid search with an invalid nq (0 and max + 1)
    method: create connection, collection, insert and hybrid search with nq out of range
    expected: raise the nq range error for both values
    """
    # 1. initialize collection with data
    collection_w = self.init_collection_general(prefix, True)[0]
    # 2. extract vector field name
    vector_name_list = cf.extract_vector_field_name_list(collection_w)
    vector_name_list.append(ct.default_float_vec_field_name)
    # 3. prepare search params (nq = 0 produces an empty vector list)
    req_list = []
    weights = [1]
    vectors = cf.gen_vectors_based_on_vector_type(nq, default_dim, "FLOAT_VECTOR")
    # 4. get hybrid search req list
    for i in range(len(vector_name_list)):
        search_param = {
            "data": vectors,
            "anns_field": vector_name_list[i],
            "param": {"metric_type": "COSINE"},
            "limit": default_limit,
            "expr": "int64 > 0"}
        req = AnnSearchRequest(**search_param)
        req_list.append(req)
    # 5. hybrid search: both nq values should return the same server error
    err_msg = "nq (number of search vector per search request) should be in range [1, 16384]"
    collection_w.hybrid_search(req_list, WeightedRanker(*weights), default_limit,
                               check_task=CheckTasks.err_res,
                               check_items={"err_code": 65535,
                                            "err_msg": err_msg})
```
Milvus Log
No response
Anything else?
No response
/assign @czs007
/unassign
```
  File "/home/czs/pymilvus/pymilvus/decorators.py", line 143, in handler
    return func(*args, **kwargs)
  File "/home/czs/pymilvus/pymilvus/decorators.py", line 182, in handler
    return func(self, *args, **kwargs)
  File "/home/czs/pymilvus/pymilvus/decorators.py", line 124, in handler
    raise e from e
  File "/home/czs/pymilvus/pymilvus/decorators.py", line 87, in handler
    return func(*args, **kwargs)
  File "/home/czs/pymilvus/pymilvus/client/grpc_handler.py", line 820, in hybrid_search
    search_request = Prepare.search_requests_with_expr(
  File "/home/czs/pymilvus/pymilvus/client/prepare.py", line 612, in search_requests_with_expr
    elif isinstance(data[0], bytes):
IndexError: list index out of range

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "hybrid_search3.py", line 44, in )>
```
This exception is raised on the client side (in pymilvus), not by the Milvus server.
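The traceback above ends at `isinstance(data[0], bytes)` in `prepare.py`, which fails on an empty query list before any request is built. The sketch below reproduces that failure and shows one possible guard; `data_kind` is a hypothetical helper for illustration, not the actual pymilvus fix.

```python
def data_kind(data):
    """Hypothetical guarded version of the first-element type check.

    Checking for an empty list first turns the opaque IndexError into a
    descriptive error that names the real problem.
    """
    if not data:
        raise ValueError("search data is empty: at least one query vector is required")
    if isinstance(data[0], bytes):
        return "bytes"
    return "non-bytes"


data = []  # an empty query batch, e.g. nq = 0
try:
    isinstance(data[0], bytes)  # the unguarded check from the traceback
except IndexError as err:
    print("unguarded:", err)
try:
    data_kind(data)
except ValueError as err:
    print("guarded:", err)
```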
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen
I got the same error. How can I resolve it?
@czs007 Is this still an issue? I'm assuming it is caused by a wrong parameter?
```python
search_param_dense = {
    "data": dense_embeddings,
    "anns_field": "dense_vector",
    "param": {"metric_type": "COSINE", "params": {"nprobe": 10}},
    "limit": 100,  # TODO hybrid search bug https://github.com/milvus-io/milvus/issues/32288
}
search_param_sparse = {
    "data": sparse_embeddings,
    "anns_field": "sparse_vector",
    "param": {"metric_type": "IP", "params": {"nprobe": 10}},
    "limit": 100,  # TODO
}
```
I used to set the limit to col.num_entities, which was 24007, and the error said the range should be within [1, 16384].
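Passing `col.num_entities` straight into `limit` breaks as soon as the collection grows past the cap. A minimal sketch of clamping the value first; `clamp_limit` and `MAX_TOPK` are hypothetical names, and 16384 is the bound from the error messages in this thread.

```python
MAX_TOPK = 16384  # upper bound for limit (topk) per the error messages in this thread


def clamp_limit(requested, max_topk=MAX_TOPK):
    """Hypothetical helper: keep a search limit inside [1, max_topk]."""
    return max(1, min(requested, max_topk))


num_entities = 24007  # value reported above for col.num_entities
print(clamp_limit(num_entities))
```

Clamping avoids the error, but it also means a single search cannot return every entity in a 24007-row collection.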
@xxxfzxxx Please try the latest pymilvus 2.4.4.
For a search, the limit indeed cannot exceed 16384.
Is there a reason?
@xxxfzxxx The conventional search lacks an iterative interface. We incorporate a limit constraint to avoid returning an excessive amount of data at once, thus preventing OOM (Out of Memory) errors.
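A rough back-of-envelope calculation shows why an uncapped limit risks OOM: a search returns up to nq × topk hits. The sketch assumes 8-byte int64 primary keys and 4-byte float32 scores and ignores output fields, per-segment intermediate results and protocol overhead, so real memory use would be higher.

```python
def result_bytes(nq, topk, id_bytes=8, score_bytes=4):
    """Rough size of raw search results: one (id, score) pair per hit.

    Assumes int64 primary keys and float32 scores; output fields, per-segment
    intermediate results and protocol overhead are ignored.
    """
    return nq * topk * (id_bytes + score_bytes)


for nq, topk in [(1, 10), (1, 16384), (16384, 16384)]:
    print(f"nq={nq:>5} topk={topk:>5} -> {result_bytes(nq, topk) / 2**20:,.1f} MiB")
```

Even at the current caps, nq = 16384 with topk = 16384 works out to 3,221,225,472 bytes (3 GiB) of ids and scores alone.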
Has the issue mentioned in the error message been resolved after upgrading pymilvus?
"Has the issue mentioned in the error message been resolved after upgrading pymilvus?"
NO.
I don't understand the difference between limit: 10 and limit: 1000. You eventually calculate the similarity scores across all entities and select the top 10 or the top 1000, so why does the limit matter here?
I am using hybrid search and would like to search across all entities and find the top k with a WeightedRanker(0.4, 0.6). But if I set the limit to 10, the retrieved "sparse" entities sometimes do not overlap with the "dense" entities. How should I handle this case?
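Each `AnnSearchRequest` carries its own `limit` (the candidate pool per vector field), separate from the final `limit` passed to `hybrid_search`. As we understand it, raising the per-request limit above the final one (within the 16384 cap) gives the ranker more candidates to combine. The pure-Python sketch below models weighted fusion to show that non-overlapping results are not fatal: an id missing from one list simply gets no contribution from it. `weighted_fuse` is hypothetical, and its min-max normalization is an assumption; Milvus's `WeightedRanker` may normalize scores differently.

```python
def weighted_fuse(result_lists, weights, k):
    """Hypothetical weighted fusion of several ranked result lists.

    result_lists: one dict per request mapping entity id -> score (higher is better).
    Scores are min-max normalized per list (an assumption). An id missing from a
    list contributes 0 from that list.
    """
    fused = {}
    for hits, w in zip(result_lists, weights):
        if not hits:
            continue
        lo, hi = min(hits.values()), max(hits.values())
        for pk, score in hits.items():
            norm = (score - lo) / (hi - lo) if hi > lo else 1.0
            fused[pk] = fused.get(pk, 0.0) + w * norm
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)[:k]


dense = {1: 0.9, 2: 0.8, 3: 0.7}     # illustrative dense (COSINE) hits
sparse = {4: 12.0, 5: 10.0, 1: 4.0}  # illustrative sparse (IP) hits
print(weighted_fuse([dense, sparse], [0.4, 0.6], k=3))
```

Only id 1 appears in both lists here, yet the fused top 3 still mixes dense and sparse hits; larger candidate pools make overlap more likely.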
- The larger the topk, the worse the performance.
- Milvus splits data into small segments, each holding roughly 100k to 1M entities. As topk gets closer to the segment size, the index stops being effective.
You can use range search to retrieve more vectors.
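Range search returns every vector whose score falls inside a band instead of a fixed top k. In Milvus the band is set with `radius` and `range_filter` in the search `params`; the sketch assumes the similarity-metric convention for COSINE (radius < score <= range_filter). It is a brute-force illustration of the semantics only: it does not call Milvus and is not how Milvus executes the search.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def range_search(query, vectors, radius, range_filter=1.0):
    """Return (index, similarity) for every vector with radius < sim <= range_filter."""
    hits = []
    for i, v in enumerate(vectors):
        sim = cosine(query, v)
        if radius < sim <= range_filter:
            hits.append((i, sim))
    # Highest similarity first, like a normal search result.
    return sorted(hits, key=lambda h: h[1], reverse=True)


vectors = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [-1.0, 0.0]]
print(range_search([1.0, 0.0], vectors, radius=0.5))
```

Tightening `radius` shrinks the result set by score rather than by count, which is useful when the number of relevant hits is not known in advance.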