milvus icon indicating copy to clipboard operation
milvus copied to clipboard

[Bug]: [benchmark][standalone]Milvus search with glove dataset, nprobe is 512 when search reports an error“ StatusCode.DEADLINE_EXCEEDED”

Open jingkl opened this issue 2 years ago • 6 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Environment

- Milvus version:2.2.0-20221216-1aa7a9a8
- Deployment mode(standalone or cluster):standalone
- SDK version(e.g. pymilvus v2.0.0rc2):2.2.1dev
4
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

tastcase: test_recall_glove_ivf_flat_standalone


NAME                                                              READY   STATUS      RESTARTS   AGE     IP             NODE         NOMINATED NODE   READINESS GATES
fouramf-cron-1671206400-1-6-1861-etcd-0                           1/1     Running     0          81m     10.104.1.35    4am-node10   <none>           <none>
fouramf-cron-1671206400-1-6-1861-milvus-standalone-77646fcs6gmz   1/1     Running     0          81m     10.104.5.242   4am-node12   <none>           <none>
fouramf-cron-1671206400-1-6-1861-minio-8685fffd86-pnztk           1/1     Running     0          81m     10.104.5.243   4am-node12   <none>           <none> (cli_client.py:115)

client log:

[2022-12-17 05:41:38,774 -  INFO - fouram]: [Base] Params of search: nq:10000, anns_field:float_vector, param:{'metric_type': 'IP', 'params': {'nprobe': 512}}, limit:10, expr:"None"
[2022-12-17 05:46:38,979 - ERROR - fouram]: RPC error: [search], <MilvusException: (code=1, message=<_InactiveRpcError of RPC that terminated with:
	status = StatusCode.DEADLINE_EXCEEDED
	details = "Deadline Exceeded"
	debug_error_string = "{"created":"@1671255998.978327993","description":"Error received from peer ipv4:10.255.98.37:19530","file":"src/core/lib/surface/call.cc","file_line":966,"grpc_message":"Deadline Exceeded","grpc_status":4}"
>)>, <Time:{'RPC start': '2022-12-17 05:41:38.774922', 'RPC error': '2022-12-17 05:46:38.978950'}> (decorators.py:108)
[2022-12-17 05:46:38,980 - ERROR - fouram]: Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 50, in handler
    return func(self, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 469, in search
    return self._execute_search_requests(requests, timeout, round_decimal=round_decimal, auto_id=auto_id, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 438, in _execute_search_requests
    raise pre_err
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 426, in _execute_search_requests
    response = self._stub.Search(request, timeout=timeout)
  File "/usr/local/lib/python3.8/dist-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/usr/local/lib/python3.8/dist-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.DEADLINE_EXCEEDED
	details = "Deadline Exceeded"
	debug_error_string = "{"created":"@1671255998.978327993","description":"Error received from peer ipv4:10.255.98.37:19530","file":"src/core/lib/surface/call.cc","file_line":966,"grpc_message":"Deadline Exceeded","grpc_status":4}"
>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/src/fouram/client/util/api_request.py", line 33, in inner_wrapper
    res = func(*args, **kwargs)
  File "/src/fouram/client/util/api_request.py", line 70, in api_request
    return func(*arg, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/orm/collection.py", line 610, in search
    res = conn.search(self._name, data, anns_field, param, limit, expr,
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 109, in handler
    raise e
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 105, in handler
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 136, in handler
    ret = func(self, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 58, in handler
    raise MilvusException(message=str(e)) from e
pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=<_InactiveRpcError of RPC that terminated with:
	status = StatusCode.DEADLINE_EXCEEDED
	details = "Deadline Exceeded"
	debug_error_string = "{"created":"@1671255998.978327993","description":"Error received from peer ipv4:10.255.98.37:19530","file":"src/core/lib/surface/call.cc","file_line":966,"grpc_message":"Deadline Exceeded","grpc_status":4}"
>)>
 (api_request.py:48)
[2022-12-17 05:46:38,980 - ERROR - fouram]: (api_response) : <MilvusException: (code=1, message=<_InactiveRpcError of RPC that terminated with:
	status = StatusCode.DEADLINE_EXCEEDED
	details = "Deadline Exceeded"
	debug_error_string = "{"created":"@1671255998.978327993","description":"Error received from peer ipv4:10.255.98.37:19530","file":"src/core/lib/surface/call.cc","file_line":966,"grpc_message":"Deadline Exceeded","grpc_status":4}"
>)> (api_request.py:49)
[2022-12-17 05:46:38,981 - ERROR - fouram]: [CheckFunc] search request check failed, response:<MilvusException: (code=1, message=<_InactiveRpcError of RPC that terminated with:
	status = StatusCode.DEADLINE_EXCEEDED
	details = "Deadline Exceeded"
	debug_error_string = "{"created":"@1671255998.978327993","description":"Error received from peer ipv4:10.255.98.37:19530","file":"src/core/lib/surface/call.cc","file_line":966,"grpc_message":"Deadline Exceeded","grpc_status":4}"
>)> (func_check.py:48)

Expected Behavior

No response

Steps To Reproduce

1. create a collection 
        2. insert training dataset
        3. flush collection
        4. clean index and build new index
        5. load collection
        6. search with different parameters
        7. clean all collections or not

Milvus Log

No response

Anything else?

scene_recall required params: {'dataset_params': {'dim': 200, 'dataset_name': 'glove-200-angular', 'ni_per': 10000}, 'collection_params': {'other_fields': []}, 'load_params': {'replica_number': 1}, 'search_params': {'top_k': [10], 'nq': [10000], 'search_param': {'nprobe': [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]}}, 'index_params': {'index_type': 'IVF_FLAT', 'index_param': {'nlist': 1024}}} (params_check.py:31)

jingkl avatar Dec 19 '22 03:12 jingkl

/assign @jiaoew1991 /unassign

yanliang567 avatar Dec 19 '22 05:12 yanliang567

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale[bot] avatar Jan 23 '23 05:01 stale[bot]

keep it active

yanliang567 avatar Jan 23 '23 08:01 yanliang567

this might be a knowhere performance issue?

xiaofan-luan avatar Jan 24 '23 02:01 xiaofan-luan

i remember knowhere has no issue with nprobe=512

yanliang567 avatar Jan 27 '23 02:01 yanliang567

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale[bot] avatar Mar 17 '23 04:03 stale[bot]

image: 2.2.0-20230621-34cd8baa

server:


I0624 21:13:21.760327    9179 request.go:665] Waited for 1.139327363s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.default.svc.cluster.local/apis/networking.k8s.io/v1?timeout=32s
NAME                                                              READY   STATUS      RESTARTS        AGE     IP              NODE         NOMINATED NODE   READINESS GATES
fouramf-recall-18800-1-59-5954-etcd-0                             1/1     Running     0               116s    10.104.16.91    4am-node21   <none>           <none>
fouramf-recall-18800-1-59-5954-milvus-standalone-5d6bddb47vjmsg   1/1     Running     0               116s    10.104.12.82    4am-node17   <none>           <none>
fouramf-recall-18800-1-59-5954-minio-566b7d967c-6t2t8             1/1     Running     0               116s    10.104.16.90    4am-node21   <none>           <none> (base.py:192)
[2023-06-24 22:37:59,739 -  INFO - fouram]: [Cmd Exe]  kubectl get pods  -n qa-milvus  -o wide | grep -E 'STATUS|fouramf-recall-18800-1-59-5954'  (util_cmd.py:14)
[2023-06-24 22:38:10,920 -  INFO - fouram]: [CliClient] pod details of release(fouramf-recall-18800-1-59-5954): 
 I0624 22:38:00.991443    9792 request.go:665] Waited for 1.166207429s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.default.svc.cluster.local/apis/policy/v1?timeout=32s
NAME                                                              READY   STATUS      RESTARTS        AGE     IP              NODE         NOMINATED NODE   READINESS GATES
fouramf-recall-18800-1-59-5954-etcd-0                             1/1     Running     0               86m     10.104.16.91    4am-node21   <none>           <none>
fouramf-recall-18800-1-59-5954-milvus-standalone-5d6bddb47vjmsg   1/1     Running     0               86m     10.104.12.82    4am-node17   <none>           <none>
fouramf-recall-18800-1-59-5954-minio-566b7d967c-6t2t8             1/1     Running     0               86m     10.104.16.90    4am-node21   <none>           <none> (cli_client.py:131)

client log:

[2023-06-24 22:31:59,654 -  INFO - fouram]: [Base] Params of search: nq:10000, anns_field:float_vector, param:{'metric_type': 'IP', 'params': {'nprobe': 512}}, limit:10, expr:"None", kwargs:{} (base.py:482)
[2023-06-24 22:36:59,882 - ERROR - fouram]: RPC error: [search], <MilvusException: (code=1, message=<_InactiveRpcError of RPC that terminated with:
	status = StatusCode.DEADLINE_EXCEEDED
	details = "Deadline Exceeded"
	debug_error_string = "UNKNOWN:Error received from peer ipv4:10.255.226.110:19530 {created_time:"2023-06-24T22:36:59.882486437+00:00", grpc_status:4, grpc_message:"Deadline Exceeded"}"
>)>, <Time:{'RPC start': '2023-06-24 22:31:59.654771', 'RPC error': '2023-06-24 22:36:59.882901'}> (decorators.py:108)
[2023-06-24 22:36:59,885 - ERROR - fouram]: (api_response) : <MilvusException: (code=1, message=<_InactiveRpcError of RPC that terminated with:
	status = StatusCode.DEADLINE_EXCEEDED
	details = "Deadline Exceeded"
	debug_error_string = "UNKNOWN:Error received from peer ipv4:10.255.226.110:19530 {created_time:"2023-06-24T22:36:59.882486437+00:00", grpc_status:4, grpc_message:"Deadline Exceeded"}"
>)> (api_request.py:49)
[2023-06-24 22:36:59,885 - ERROR - fouram]: [CheckFunc] search request check failed, response:<MilvusException: (code=1, message=<_InactiveRpcError of RPC that terminated with:
	status = StatusCode.DEADLINE_EXCEEDED
	details = "Deadline Exceeded"
	debug_error_string = "UNKNOWN:Error received from peer ipv4:10.255.226.110:19530 {created_time:"2023-06-24T22:36:59.882486437+00:00", grpc_status:4, grpc_message:"Deadline Exceeded"}"
>)> (func_check.py:52)

jingkl avatar Jun 25 '23 11:06 jingkl

/assign @yah01 /unassign

Just look at the recently reported error cases.

jiaoew1991 avatar Jun 25 '23 11:06 jiaoew1991

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale[bot] avatar Aug 02 '23 05:08 stale[bot]

any updates? this blocks us nightly-perf tests

yanliang567 avatar Aug 02 '23 07:08 yanliang567

@yanliang567 @jingkl could you rerun this? I was focusing on the other critical problems and just be free to working on this

yah01 avatar Aug 02 '23 08:08 yah01

and the similar problem #23331 has been fixed

yah01 avatar Aug 02 '23 08:08 yah01

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale[bot] avatar Sep 03 '23 18:09 stale[bot]