
[Bug]: [hybrid_search] Unexpected error when "reqs" is an empty list for the interface "hybrid_search"

Open: binbinlv opened this issue 10 months ago.

Is there an existing issue for this?

  • [X] I have searched the existing issues

Environment

- Milvus version: 2.4 latest
- Deployment mode (standalone or cluster): both
- MQ type (rocksmq, pulsar or kafka): all
- SDK version (e.g. pymilvus v2.0.0rc2): 2.4.1rc10
- OS (Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:

Current Behavior

The interface "hybrid_search" raises an unexpected error when "reqs" is an empty list:

(code=1, message=Unexpected error, message=<list index out of range>)> (api_request.py:46)

Expected Behavior

Milvus should report a proper Milvus error rather than an unexpected error.

When nq is set to the maximum value plus one (16385), Milvus reports the error "(code=65535, message=nq [16385] is invalid, nq (number of search vector per search request) should be in range [1, 16384], but got 16385".

It would be better to return the same error when nq = 0, which is below the minimum value of 1.
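The expected behavior amounts to checking nq against one range for both bad inputs before any code indexes into the query data. A minimal client-side sketch, assuming the bounds [1, 16384] quoted in the server error above; `validate_nq` and `MAX_NQ` are hypothetical names, not pymilvus APIs:

```python
MAX_NQ = 16384  # upper bound quoted in the server error (assumed fixed here)


def validate_nq(data: list) -> int:
    """Hypothetical pre-flight check: raise a descriptive error for nq outside
    [1, MAX_NQ] instead of letting an empty list reach code that reads data[0]."""
    nq = len(data)
    if not 1 <= nq <= MAX_NQ:
        raise ValueError(
            f"nq [{nq}] is invalid, nq (number of search vector per search request) "
            f"should be in range [1, {MAX_NQ}], but got {nq}"
        )
    return nq


# nq = 0 and nq = 16385 now fail with the same kind of message
for bad in ([], [[0.1, 0.2]] * (MAX_NQ + 1)):
    try:
        validate_nq(bad)
    except ValueError as err:
        print(err)
```

Checking the length up front means the empty case never reaches a `data[0]` lookup, so both out-of-range values produce one consistent message.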

Steps To Reproduce

@pytest.mark.tags(CaseLabel.L1)
@pytest.mark.parametrize("nq", [0, 16385])
def test_hybrid_search_normal_over_max_nq(self, nq):
    """
    target: test hybrid search with nq out of range (0 or maximum + 1)
    method: create connection, collection, insert, then hybrid search with nq vectors per request
    expected: hybrid search fails with the nq range error (err_code 65535)
    """
    # 1. initialize collection with data
    collection_w = self.init_collection_general(prefix, True)[0]
    # 2. extract vector field name
    vector_name_list = cf.extract_vector_field_name_list(collection_w)
    vector_name_list.append(ct.default_float_vec_field_name)
    # 3. prepare search params
    req_list = []
    weights = [1]
    vectors = cf.gen_vectors_based_on_vector_type(nq, default_dim, "FLOAT_VECTOR")
    # 4. get hybrid search req list
    for i in range(len(vector_name_list)):
        search_param = {
            "data": vectors,
            "anns_field": vector_name_list[i],
            "param": {"metric_type": "COSINE"},
            "limit": default_limit,
            "expr": "int64 > 0"}
        req = AnnSearchRequest(**search_param)
        req_list.append(req)
    # 5. hybrid search: nq = 0 and nq = 16385 are both expected to fail with the nq range error
    err_msg = "nq (number of search vector per search request) should be in range [1, 16384]"
    collection_w.hybrid_search(req_list, WeightedRanker(*weights), default_limit,
                               check_task=CheckTasks.err_res,
                               check_items={"err_code": 65535,
                                            "err_msg": err_msg})

Milvus Log

No response

Anything else?

No response

binbinlv, Apr 15 '24

/assign @czs007

xiaofan-luan, Apr 15 '24

/unassign

yanliang567, Apr 16 '24

  File "/home/czs/pymilvus/pymilvus/decorators.py", line 143, in handler
    return func(*args, **kwargs)
  File "/home/czs/pymilvus/pymilvus/decorators.py", line 182, in handler
    return func(self, *args, **kwargs)
  File "/home/czs/pymilvus/pymilvus/decorators.py", line 124, in handler
    raise e from e
  File "/home/czs/pymilvus/pymilvus/decorators.py", line 87, in handler
    return func(*args, **kwargs)
  File "/home/czs/pymilvus/pymilvus/client/grpc_handler.py", line 820, in hybrid_search
    search_request = Prepare.search_requests_with_expr(
  File "/home/czs/pymilvus/pymilvus/client/prepare.py", line 612, in search_requests_with_expr
    elif isinstance(data[0], bytes):
IndexError: list index out of range

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "hybrid_search3.py", line 44, in
    hybrid_res = hello_milvus.hybrid_search(req_list, WeightedRanker(*weights), default_limit, output_fields=["random"])
  File "/home/czs/pymilvus/pymilvus/orm/collection.py", line 936, in hybrid_search
    resp = conn.hybrid_search(
  File "/home/czs/pymilvus/pymilvus/decorators.py", line 165, in handler
    raise MilvusException(message=f"Unexpected error, message=<{e!s}>") from e
pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=Unexpected error, message=)>

This exception is raised by the client (pymilvus), not by the Milvus server.

czs007, Apr 16 '24
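The failure path in the traceback can be reproduced without a server. A minimal sketch, assuming the decorator at decorators.py line 165 wraps any exception that is not already a Milvus error into a code=1 "Unexpected error"; `SketchMilvusException`, `wrap_unexpected`, and `prepare_request` are hypothetical stand-ins, not pymilvus APIs:

```python
import functools


class SketchMilvusException(Exception):
    """Hypothetical stand-in for pymilvus.exceptions.MilvusException."""

    def __init__(self, code: int = 1, message: str = ""):
        super().__init__(f"(code={code}, message={message})")
        self.code = code
        self.message = message


def wrap_unexpected(func):
    """Sketch of the wrapping pattern seen in the traceback: a non-Milvus
    exception is re-raised as code=1 "Unexpected error", chained to the original."""

    @functools.wraps(func)
    def handler(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except SketchMilvusException:
            raise  # already a Milvus error: pass it through unchanged
        except Exception as e:
            raise SketchMilvusException(message=f"Unexpected error, message=<{e!s}>") from e

    return handler


@wrap_unexpected
def prepare_request(data: list) -> str:
    # Mirrors the failing line in prepare.py: with nq = 0, data is [] and data[0] raises IndexError
    if isinstance(data[0], bytes):
        return "binary"
    return "float"


try:
    prepare_request([])
except SketchMilvusException as err:
    print(err)
```

This is why the user sees code=1 with "list index out of range" instead of the nq range error: the IndexError happens on the client before any request is sent.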

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale[bot], Jun 12 '24

I got the same error. How can I resolve it?

xxxfzxxx, Jun 23 '24

@czs007 Is this still an issue? I'm assuming this is caused by an incorrect parameter?

xiaofan-luan, Jun 23 '24

search_param_dense = {
    "data": dense_embeddings,
    "anns_field": "dense_vector",
    "param": {
        "metric_type": "COSINE",
        "params": {"nprobe": 10}
    },
    "limit": 100  # TODO hybrid search bug https://github.com/milvus-io/milvus/issues/32288
}
search_param_sparse = {
    "data": sparse_embeddings,
    "anns_field": "sparse_vector",
    "param": {
        "metric_type": "IP",
        "params": {"nprobe": 10}
    },
    "limit": 100  # TODO
}

I used to set the limit to col.num_entities, which was 24007, and the error said the range should be within [1, 16384].

xxxfzxxx, Jun 24 '24

@xxxfzxxx Please try the latest pymilvus (2.4.4).

For a search, the limit indeed cannot exceed 16384.

czs007, Jun 24 '24
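Given that cap, a limit derived from the collection size (such as col.num_entities) has to be clamped before it goes into the search parameters. A minimal sketch, assuming the default cap of 16384; `MAX_TOPK` and `safe_limit` are hypothetical names:

```python
MAX_TOPK = 16384  # assumed default cap on limit (topK) per search request


def safe_limit(requested: int, max_topk: int = MAX_TOPK) -> int:
    """Clamp a requested limit into the accepted range [1, max_topk]."""
    return max(1, min(requested, max_topk))


num_entities = 24007  # the col.num_entities value reported in this thread
print(safe_limit(num_entities))  # 16384
print(safe_limit(100))           # 100
```

In the snippet posted earlier, this would be applied as `"limit": safe_limit(col.num_entities)` in each search parameter dict.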

Is there a reason?

xxxfzxxx, Jun 25 '24

@xxxfzxxx Conventional search has no iterator interface, so we impose a limit to avoid returning an excessive amount of data at once, which prevents out-of-memory (OOM) errors.

Has the issue mentioned in the error message been resolved after upgrading pymilvus?

czs007, Jun 25 '24
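A rough estimate shows why the cap matters. Assuming each hit carries only an int64 primary key (8 bytes) and a float32 score (4 bytes), and ignoring output fields and protocol overhead, the raw size of one response grows as nq × limit. `result_bytes` is a hypothetical helper for the arithmetic:

```python
BYTES_PER_HIT = 8 + 4  # assumption: int64 ID + float32 score, no output fields


def result_bytes(nq: int, limit: int) -> int:
    """Lower-bound size of one search response: nq queries, each returning `limit` hits."""
    return nq * limit * BYTES_PER_HIT


# Largest request allowed when both nq and limit are capped at 16384
worst = result_bytes(16384, 16384)
print(f"{worst / 2**30:.1f} GiB")  # 3.0 GiB
```

Even with both caps in place, a single maximal request is already gigabytes of IDs and scores before any output fields are attached, so removing the limit would let one call exhaust memory.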

"Has the issue mentioned in the error message been resolved after upgrading pymilvus?"

NO.

xxxfzxxx, Jun 25 '24

I don't understand the difference between limit: 10 and limit: 1000. You eventually calculate the similarity scores across all entities and select the top 10 or top 1000, so why does the limit matter here?

I am using hybrid search and would like to search across all entities and find the top k with WeightedRanker(0.4, 0.6). But with a limit of 10, the retrieved "sparse" entities sometimes do not overlap with the "dense" entities. How should I handle this case?

xxxfzxxx, Jun 25 '24
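To illustrate what non-overlapping dense and sparse results mean for the final ranking, here is a simplified sketch of weighted fusion: each entity's fused score is the weighted sum of its per-request scores, and an entity missing from one request's hits contributes 0 for that request. This is an assumption for illustration only; Milvus's `WeightedRanker` also normalizes raw scores depending on the metric, so real numbers will differ. `weighted_fuse` and the IDs and scores below are hypothetical:

```python
from collections import defaultdict


def weighted_fuse(result_lists, weights, limit):
    """Simplified weighted fusion over per-request hits.

    result_lists: one dict per request mapping entity ID -> (already normalized) score.
    An ID absent from a request's hits contributes 0 for that request.
    """
    fused = defaultdict(float)
    for hits, w in zip(result_lists, weights):
        for entity_id, score in hits.items():
            fused[entity_id] += w * score
    # Highest fused score first; ties broken by ID so the order is deterministic
    ranked = sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:limit]


dense_hits = {1: 0.9, 2: 0.8}    # hypothetical top-2 from the dense request
sparse_hits = {3: 0.95, 1: 0.5}  # hypothetical top-2 from the sparse request
for entity_id, score in weighted_fuse([dense_hits, sparse_hits], [0.4, 0.6], limit=3):
    print(entity_id, round(score, 3))
```

Under this model, entity 1 (found by both requests) ranks first, while entities found by only one request are still ranked using that single contribution. A larger per-request limit, within the cap, gives each list more candidates and more chances to overlap.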

  1. The larger the topk, the worse the performance.
  2. Milvus splits data into small segments, each holding roughly 100k to 1M entities. As topk approaches the segment size, the index no longer helps.

You can use range search to retrieve more vectors.

xiaofan-luan, Jun 25 '24
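Range search bounds results by score rather than only by count. A sketch of the request parameters, assuming the Milvus 2.4 range-search syntax for COSINE (`radius` as the exclusive lower bound and `range_filter` as the inclusive upper bound), followed by a local simulation of that filter on hypothetical scores; `in_range` is a hypothetical helper:

```python
# Request parameters for a COSINE range search (values are illustrative)
range_search_param = {
    "metric_type": "COSINE",
    "params": {"radius": 0.8, "range_filter": 1.0},
}


def in_range(score: float, radius: float, range_filter: float) -> bool:
    """For similarity metrics such as COSINE/IP: keep hits with radius < score <= range_filter."""
    return radius < score <= range_filter


scores = {101: 0.95, 102: 0.80, 103: 0.81, 104: 0.40}  # hypothetical ID -> cosine similarity
p = range_search_param["params"]
kept = sorted(i for i, s in scores.items() if in_range(s, p["radius"], p["range_filter"]))
print(kept)  # [101, 103]
```

The request's `limit` still caps how many hits come back, so one way to collect more than a single request allows is to issue several requests over adjacent score bands.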

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale[bot], Jul 27 '24