[Bug]: list_collections times out after repeated pod-kill or pod-failure chaos on the master branch
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version: master-20220921-24ec3547
- Deployment mode (standalone or cluster): cluster
- SDK version (e.g. pymilvus v2.0.0rc2): pymilvus==2.2.0.dev30
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
[2022-09-21T21:32:50.693Z] [2022-09-21 21:32:06 - DEBUG - ci_test]: (api_request) : [Connections.connect] args: ['default'], kwargs: {'host': '10.101.148.170', 'port': 19530} (api_request.py:56)
[2022-09-21T21:32:50.693Z] [2022-09-21 21:32:06 - DEBUG - ci_test]: (api_response) : None (api_request.py:31)
[2022-09-21T21:32:50.693Z] [2022-09-21 21:32:06 - DEBUG - ci_test]: (api_request) : [list_collections] args: [20, 'default'], kwargs: {} (api_request.py:56)
[2022-09-21T21:32:50.693Z] [2022-09-21 21:32:26 - ERROR - pymilvus.decorators]: RPC error: [list_collections], <MilvusException: (code=1, message=rpc deadline exceeded: Retry timeout: 20s)>, <Time:{'RPC start': '2022-09-21 21:32:06.846047', 'RPC error': '2022-09-21 21:32:26.846583'}> (decorators.py:100)
[2022-09-21T21:32:50.693Z] [2022-09-21 21:32:26 - ERROR - ci_test]: Traceback (most recent call last):
[2022-09-21T21:32:50.693Z] File "/usr/local/lib/python3.7/dist-packages/pymilvus/decorators.py", line 50, in handler
[2022-09-21T21:32:50.693Z] return func(self, *args, **kwargs)
[2022-09-21T21:32:50.693Z] File "/usr/local/lib/python3.7/dist-packages/pymilvus/client/grpc_handler.py", line 263, in list_collections
[2022-09-21T21:32:50.693Z] response = rf.result()
[2022-09-21T21:32:50.693Z] File "/usr/local/lib/python3.7/dist-packages/grpc/_channel.py", line 744, in result
[2022-09-21T21:32:50.693Z] raise self
[2022-09-21T21:32:50.693Z] grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
[2022-09-21T21:32:50.693Z] status = StatusCode.DEADLINE_EXCEEDED
[2022-09-21T21:32:50.693Z] details = "Deadline Exceeded"
[2022-09-21T21:32:50.693Z] debug_error_string = "{"created":"@1663795946.846259470","description":"Deadline Exceeded","file":"src/core/ext/filters/deadline/deadline_filter.cc","file_line":81,"grpc_status":4}"
[2022-09-21T21:32:50.693Z] >
[2022-09-21T21:32:50.693Z]
[2022-09-21T21:32:50.693Z] During handling of the above exception, another exception occurred:
[2022-09-21T21:32:50.693Z]
[2022-09-21T21:32:50.693Z] Traceback (most recent call last):
[2022-09-21T21:32:50.693Z] File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 26, in inner_wrapper
[2022-09-21T21:32:50.693Z] res = func(*args, **_kwargs)
[2022-09-21T21:32:50.693Z] File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 57, in api_request
[2022-09-21T21:32:50.693Z] return func(*arg, **kwargs)
[2022-09-21T21:32:50.693Z] File "/usr/local/lib/python3.7/dist-packages/pymilvus/orm/utility.py", line 427, in list_collections
[2022-09-21T21:32:50.693Z] return _get_connection(using).list_collections(timeout=timeout)
[2022-09-21T21:32:50.693Z] File "/usr/local/lib/python3.7/dist-packages/pymilvus/decorators.py", line 101, in handler
[2022-09-21T21:32:50.693Z] raise e
[2022-09-21T21:32:50.693Z] File "/usr/local/lib/python3.7/dist-packages/pymilvus/decorators.py", line 97, in handler
[2022-09-21T21:32:50.693Z] return func(*args, **kwargs)
[2022-09-21T21:32:50.693Z] File "/usr/local/lib/python3.7/dist-packages/pymilvus/decorators.py", line 127, in handler
[2022-09-21T21:32:50.693Z] ret = func(self, *args, **kwargs)
[2022-09-21T21:32:50.693Z] File "/usr/local/lib/python3.7/dist-packages/pymilvus/decorators.py", line 67, in handler
[2022-09-21T21:32:50.693Z] raise MilvusException(Status.UNEXPECTED_ERROR, f"rpc deadline exceeded: {timeout_msg}")
[2022-09-21T21:32:50.693Z] pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=rpc deadline exceeded: Retry timeout: 20s)>
[2022-09-21T21:32:50.693Z] (api_request.py:39)
[2022-09-21T21:32:50.693Z] [2022-09-21 21:32:26 - ERROR - ci_test]: (api_response) : <MilvusException: (code=1, message=rpc deadline exceeded: Retry timeout: 20s)> (api_request.py:40)
[2022-09-21T21:32:50.693Z] ------------- generated html file: file:///tmp/ci_logs/report.html -------------
[2022-09-21T21:32:50.693Z] =========================== short test summary info ============================
[2022-09-21T21:32:50.693Z] FAILED testcases/test_get_collections.py::TestGetCollections::test_get_collections_by_prefix
[2022-09-21T21:32:50.693Z] ============================== 1 failed in 40.25s ==============================
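For reference, the failing request corresponds to roughly the following client-side call (a minimal sketch: host, port, and the 20 s timeout are taken from the log above; everything else is illustrative and not part of the actual CI test code):

```python
from pymilvus import connections, utility

# Connect to the Milvus proxy used by the CI run (host/port from the log above).
connections.connect(alias="default", host="10.101.148.170", port="19530")

# pymilvus keeps retrying the ListCollections RPC until the timeout is reached;
# under the chaos scenario it gives up after 20 s and raises
# MilvusException(code=1, message="rpc deadline exceeded: Retry timeout: 20s").
collections = utility.list_collections(timeout=20, using="default")
print(collections)
```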
Expected Behavior
All operations work well.
Steps To Reproduce
No response
Milvus Log
chaos type: pod-failure, image tag: master-20220921-24ec3547, target pod: minio, failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test/detail/chaos-test/1452/pipeline
log:
- artifacts-s3-pod-failure-1452-pytest-logs (1).tar.gz
- artifacts-s3-pod-failure-1452-server-logs (1).tar.gz
Anything else?
- chaos type: pod-failure, image tag: master-20220921-24ec3547, target pod: etcd, failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test/detail/chaos-test/1451/pipeline
- chaos type: pod-failure, image tag: master-20220921-24ec3547, target pod: querynode, failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test/detail/chaos-test/1449/pipeline
- chaos type: pod-failure, image tag: master-20220921-24ec3547, target pod: datanode, failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test/detail/chaos-test/1443/pipeline
- chaos type: pod-kill, image tag: master-20220921-24ec3547, target pod: indexcoord, failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test/detail/chaos-test/1432/pipeline
- chaos type: pod-kill, image tag: master-20220921-24ec3547, target pod: datanode, failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test/detail/chaos-test/1431/pipeline
- and so on
/assign @sunby
/unassign
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen