[Bug]: [benchmark][standalone] Milvus search with glove dataset reports "StatusCode.DEADLINE_EXCEEDED" when nprobe is 512
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version:2.2.0-20221216-1aa7a9a8
- Deployment mode(standalone or cluster):standalone
- SDK version(e.g. pymilvus v2.0.0rc2): 2.2.1dev4
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
testcase: test_recall_glove_ivf_flat_standalone
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
fouramf-cron-1671206400-1-6-1861-etcd-0 1/1 Running 0 81m 10.104.1.35 4am-node10 <none> <none>
fouramf-cron-1671206400-1-6-1861-milvus-standalone-77646fcs6gmz 1/1 Running 0 81m 10.104.5.242 4am-node12 <none> <none>
fouramf-cron-1671206400-1-6-1861-minio-8685fffd86-pnztk 1/1 Running 0 81m 10.104.5.243 4am-node12 <none> <none> (cli_client.py:115)
client log:
[2022-12-17 05:41:38,774 - INFO - fouram]: [Base] Params of search: nq:10000, anns_field:float_vector, param:{'metric_type': 'IP', 'params': {'nprobe': 512}}, limit:10, expr:"None"
[2022-12-17 05:46:38,979 - ERROR - fouram]: RPC error: [search], <MilvusException: (code=1, message=<_InactiveRpcError of RPC that terminated with:
status = StatusCode.DEADLINE_EXCEEDED
details = "Deadline Exceeded"
debug_error_string = "{"created":"@1671255998.978327993","description":"Error received from peer ipv4:10.255.98.37:19530","file":"src/core/lib/surface/call.cc","file_line":966,"grpc_message":"Deadline Exceeded","grpc_status":4}"
>)>, <Time:{'RPC start': '2022-12-17 05:41:38.774922', 'RPC error': '2022-12-17 05:46:38.978950'}> (decorators.py:108)
[2022-12-17 05:46:38,980 - ERROR - fouram]: Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 50, in handler
return func(self, *args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 469, in search
return self._execute_search_requests(requests, timeout, round_decimal=round_decimal, auto_id=auto_id, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 438, in _execute_search_requests
raise pre_err
File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 426, in _execute_search_requests
response = self._stub.Search(request, timeout=timeout)
File "/usr/local/lib/python3.8/dist-packages/grpc/_channel.py", line 946, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "/usr/local/lib/python3.8/dist-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.DEADLINE_EXCEEDED
details = "Deadline Exceeded"
debug_error_string = "{"created":"@1671255998.978327993","description":"Error received from peer ipv4:10.255.98.37:19530","file":"src/core/lib/surface/call.cc","file_line":966,"grpc_message":"Deadline Exceeded","grpc_status":4}"
>
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/src/fouram/client/util/api_request.py", line 33, in inner_wrapper
res = func(*args, **kwargs)
File "/src/fouram/client/util/api_request.py", line 70, in api_request
return func(*arg, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pymilvus/orm/collection.py", line 610, in search
res = conn.search(self._name, data, anns_field, param, limit, expr,
File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 109, in handler
raise e
File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 105, in handler
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 136, in handler
ret = func(self, *args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 58, in handler
raise MilvusException(message=str(e)) from e
pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=<_InactiveRpcError of RPC that terminated with:
status = StatusCode.DEADLINE_EXCEEDED
details = "Deadline Exceeded"
debug_error_string = "{"created":"@1671255998.978327993","description":"Error received from peer ipv4:10.255.98.37:19530","file":"src/core/lib/surface/call.cc","file_line":966,"grpc_message":"Deadline Exceeded","grpc_status":4}"
>)>
(api_request.py:48)
[2022-12-17 05:46:38,980 - ERROR - fouram]: (api_response) : <MilvusException: (code=1, message=<_InactiveRpcError of RPC that terminated with:
status = StatusCode.DEADLINE_EXCEEDED
details = "Deadline Exceeded"
debug_error_string = "{"created":"@1671255998.978327993","description":"Error received from peer ipv4:10.255.98.37:19530","file":"src/core/lib/surface/call.cc","file_line":966,"grpc_message":"Deadline Exceeded","grpc_status":4}"
>)> (api_request.py:49)
[2022-12-17 05:46:38,981 - ERROR - fouram]: [CheckFunc] search request check failed, response:<MilvusException: (code=1, message=<_InactiveRpcError of RPC that terminated with:
status = StatusCode.DEADLINE_EXCEEDED
details = "Deadline Exceeded"
debug_error_string = "{"created":"@1671255998.978327993","description":"Error received from peer ipv4:10.255.98.37:19530","file":"src/core/lib/surface/call.cc","file_line":966,"grpc_message":"Deadline Exceeded","grpc_status":4}"
>)> (func_check.py:48)
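Note the timestamps in the log above: the RPC starts at 05:41:38.774 and errors at 05:46:38.978, so the client-side gRPC deadline fires after almost exactly 300 s (five minutes). To tell a merely slow nprobe=512 query apart from a genuine hang, the harness could raise the per-call deadline via the standard pymilvus `timeout` kwarg. A minimal sketch; the collection name, random queries, and the 900 s value are illustrative assumptions, not the benchmark's actual code:

```python
import numpy as np
from pymilvus import Collection, connections

connections.connect(host="127.0.0.1", port="19530")
collection = Collection("glove_recall")  # assumed collection name
queries = np.random.random((10_000, 200)).astype(np.float32)

# Raise the per-call gRPC deadline above the harness's apparent 300 s limit.
res = collection.search(
    data=queries,
    anns_field="float_vector",
    param={"metric_type": "IP", "params": {"nprobe": 512}},
    limit=10,
    timeout=900,  # seconds; forwarded as the gRPC deadline for this call
)
```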
Expected Behavior
No response
Steps To Reproduce
1. create a collection
2. insert the training dataset
3. flush the collection
4. drop the old index and build a new one
5. load the collection
6. search with different parameters (a minimal pymilvus sketch of these steps follows the list)
7. optionally clean up all collections
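A minimal pymilvus sketch of the steps above, for orientation only. The collection name, vector counts, and random data are illustrative assumptions, not the fouram harness's actual code; dim=200, the IP metric, IVF_FLAT with nlist=1024, ni_per=10000, nq=10000, and top_k=10 are taken from the scene_recall params logged below.

```python
import numpy as np
from pymilvus import (
    Collection, CollectionSchema, DataType, FieldSchema, connections, utility,
)

connections.connect(host="127.0.0.1", port="19530")

dim = 200  # glove-200-angular

# 1. create a collection ("glove_recall" is an assumed name)
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="float_vector", dtype=DataType.FLOAT_VECTOR, dim=dim),
]
collection = Collection("glove_recall", CollectionSchema(fields))

# 2. insert the training dataset in batches of ni_per=10000
#    (random vectors stand in for the real glove data)
train = np.random.random((100_000, dim)).astype(np.float32)
for i in range(0, len(train), 10_000):
    collection.insert([train[i:i + 10_000]])

# 3. flush the collection
collection.flush()

# 4. drop any old index and build IVF_FLAT with nlist=1024
if collection.has_index():
    collection.drop_index()
collection.create_index(
    "float_vector",
    {"index_type": "IVF_FLAT", "metric_type": "IP", "params": {"nlist": 1024}},
)

# 5. load the collection
collection.load()

# 6. search nq=10000 queries at each nprobe; the report fails at nprobe=512
queries = np.random.random((10_000, dim)).astype(np.float32)
for nprobe in [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]:
    collection.search(
        queries, "float_vector",
        {"metric_type": "IP", "params": {"nprobe": nprobe}},
        limit=10,
    )

# 7. optionally clean up all collections
utility.drop_collection("glove_recall")
```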
Milvus Log
No response
Anything else?
scene_recall required params: {'dataset_params': {'dim': 200, 'dataset_name': 'glove-200-angular', 'ni_per': 10000}, 'collection_params': {'other_fields': []}, 'load_params': {'replica_number': 1}, 'search_params': {'top_k': [10], 'nq': [10000], 'search_param': {'nprobe': [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]}}, 'index_params': {'index_type': 'IVF_FLAT', 'index_param': {'nlist': 1024}}} (params_check.py:31)
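For scale, the failing point is heavy by construction: with nlist=1024 and nprobe=512, IVF_FLAT scans half of all inverted lists for every query, and the batch carries nq=10000 queries. A back-of-envelope estimate; the train-set size is an assumption based on the public ann-benchmarks glove-200-angular split, not stated in the log:

```python
# Rough work estimate for the failing search batch.
n_vectors = 1_183_514                 # assumed glove-200-angular train size
nlist, nprobe, nq, dim = 1024, 512, 10_000, 200
scanned = n_vectors * nprobe / nlist  # vectors scanned per query (~half the data)
flops = scanned * dim * 2 * nq        # one multiply-add per dimension per vector
print(f"~{scanned:,.0f} vectors/query, ~{flops / 1e12:.1f} TFLOPs for the batch")
```

Under these assumptions that is roughly 592k vectors scanned per query and ~2.4 TFLOPs for the whole batch, which makes a multi-minute wall time on a single standalone node plausible rather than necessarily a hang.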
/assign @jiaoew1991 /unassign
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.
Keep it active.
This might be a Knowhere performance issue?
I remember Knowhere has no issue with nprobe=512.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.
image: 2.2.0-20230621-34cd8baa
server:
I0624 21:13:21.760327 9179 request.go:665] Waited for 1.139327363s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.default.svc.cluster.local/apis/networking.k8s.io/v1?timeout=32s
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
fouramf-recall-18800-1-59-5954-etcd-0 1/1 Running 0 116s 10.104.16.91 4am-node21 <none> <none>
fouramf-recall-18800-1-59-5954-milvus-standalone-5d6bddb47vjmsg 1/1 Running 0 116s 10.104.12.82 4am-node17 <none> <none>
fouramf-recall-18800-1-59-5954-minio-566b7d967c-6t2t8 1/1 Running 0 116s 10.104.16.90 4am-node21 <none> <none> (base.py:192)
[2023-06-24 22:37:59,739 - INFO - fouram]: [Cmd Exe] kubectl get pods -n qa-milvus -o wide | grep -E 'STATUS|fouramf-recall-18800-1-59-5954' (util_cmd.py:14)
[2023-06-24 22:38:10,920 - INFO - fouram]: [CliClient] pod details of release(fouramf-recall-18800-1-59-5954):
I0624 22:38:00.991443 9792 request.go:665] Waited for 1.166207429s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.default.svc.cluster.local/apis/policy/v1?timeout=32s
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
fouramf-recall-18800-1-59-5954-etcd-0 1/1 Running 0 86m 10.104.16.91 4am-node21 <none> <none>
fouramf-recall-18800-1-59-5954-milvus-standalone-5d6bddb47vjmsg 1/1 Running 0 86m 10.104.12.82 4am-node17 <none> <none>
fouramf-recall-18800-1-59-5954-minio-566b7d967c-6t2t8 1/1 Running 0 86m 10.104.16.90 4am-node21 <none> <none> (cli_client.py:131)
client log:
[2023-06-24 22:31:59,654 - INFO - fouram]: [Base] Params of search: nq:10000, anns_field:float_vector, param:{'metric_type': 'IP', 'params': {'nprobe': 512}}, limit:10, expr:"None", kwargs:{} (base.py:482)
[2023-06-24 22:36:59,882 - ERROR - fouram]: RPC error: [search], <MilvusException: (code=1, message=<_InactiveRpcError of RPC that terminated with:
status = StatusCode.DEADLINE_EXCEEDED
details = "Deadline Exceeded"
debug_error_string = "UNKNOWN:Error received from peer ipv4:10.255.226.110:19530 {created_time:"2023-06-24T22:36:59.882486437+00:00", grpc_status:4, grpc_message:"Deadline Exceeded"}"
>)>, <Time:{'RPC start': '2023-06-24 22:31:59.654771', 'RPC error': '2023-06-24 22:36:59.882901'}> (decorators.py:108)
[2023-06-24 22:36:59,885 - ERROR - fouram]: (api_response) : <MilvusException: (code=1, message=<_InactiveRpcError of RPC that terminated with:
status = StatusCode.DEADLINE_EXCEEDED
details = "Deadline Exceeded"
debug_error_string = "UNKNOWN:Error received from peer ipv4:10.255.226.110:19530 {created_time:"2023-06-24T22:36:59.882486437+00:00", grpc_status:4, grpc_message:"Deadline Exceeded"}"
>)> (api_request.py:49)
[2023-06-24 22:36:59,885 - ERROR - fouram]: [CheckFunc] search request check failed, response:<MilvusException: (code=1, message=<_InactiveRpcError of RPC that terminated with:
status = StatusCode.DEADLINE_EXCEEDED
details = "Deadline Exceeded"
debug_error_string = "UNKNOWN:Error received from peer ipv4:10.255.226.110:19530 {created_time:"2023-06-24T22:36:59.882486437+00:00", grpc_status:4, grpc_message:"Deadline Exceeded"}"
>)> (func_check.py:52)
/assign @yah01 /unassign
Just look at the recently reported error cases.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.
Any updates? This blocks our nightly perf tests.
@yanliang567 @jingkl could you rerun this? I was focusing on other critical problems and am now free to work on this.
And the similar problem #23331 has been fixed.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.