[Bug]: Load failed after rootcoord pod kill chaos test
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version:2.2.0-20230426-8745ee25
- Deployment mode(standalone or cluster):cluster
- MQ type(rocksmq, pulsar or kafka):kafka
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
[2023-04-26T23:23:25.869Z] [2023-04-26 23:23:17 - INFO - ci_test]: [test][2023-04-26T23:23:16Z] [0.46655728s] Hello_Milvus insert -> (insert count: 3000, delete count: 0, upsert count: 0, timestamp: 441070753457373187, success count: 3000, err count: 0) (wrapper.py:30)
[2023-04-26T23:23:25.869Z] [2023-04-26 23:23:17 - DEBUG - ci_test]: (api_request) : [Collection.flush] args: [], kwargs: {'timeout': 120} (api_request.py:56)
[2023-04-26T23:23:25.869Z] [2023-04-26 23:23:20 - DEBUG - ci_test]: (api_response) : None (api_request.py:31)
[2023-04-26T23:23:25.869Z] [2023-04-26 23:23:20 - INFO - ci_test]: [test][2023-04-26T23:23:17Z] [3.01947605s] Hello_Milvus flush -> None (wrapper.py:30)
[2023-04-26T23:23:25.869Z] [2023-04-26 23:23:20 - INFO - ci_test]: assert entities: 12000 (test_data_persistence.py:84)
[2023-04-26T23:23:25.869Z] [2023-04-26 23:23:20 - DEBUG - ci_test]: (api_request) : [Collection.load] args: [None, 1, 120], kwargs: {} (api_request.py:56)
[2023-04-26T23:23:25.869Z] [2023-04-26 23:23:25 - ERROR - pymilvus.decorators]: RPC error: [load_collection], <MilvusException: (code=1, message=failed to load collection, err=failed to get partitions from RootCoord[context deadline exceeded])>, <Time:{'RPC start': '2023-04-26 23:23:20.233388', 'RPC error': '2023-04-26 23:23:25.243119'}> (decorators.py:108)
[2023-04-26T23:23:25.869Z] [2023-04-26 23:23:25 - ERROR - ci_test]: Traceback (most recent call last):
[2023-04-26T23:23:25.869Z] File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 26, in inner_wrapper
[2023-04-26T23:23:25.869Z] res = func(*args, **_kwargs)
[2023-04-26T23:23:25.869Z] File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 57, in api_request
[2023-04-26T23:23:25.869Z] return func(*arg, **kwargs)
[2023-04-26T23:23:25.869Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/orm/collection.py", line 366, in load
[2023-04-26T23:23:25.869Z] conn.load_collection(self._name, replica_number=replica_number, timeout=timeout, **kwargs)
[2023-04-26T23:23:25.869Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 109, in handler
[2023-04-26T23:23:25.869Z] raise e
[2023-04-26T23:23:25.869Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 105, in handler
[2023-04-26T23:23:25.869Z] return func(*args, **kwargs)
[2023-04-26T23:23:25.869Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 136, in handler
[2023-04-26T23:23:25.869Z] ret = func(self, *args, **kwargs)
[2023-04-26T23:23:25.869Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 85, in handler
[2023-04-26T23:23:25.869Z] raise e
[2023-04-26T23:23:25.869Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 50, in handler
[2023-04-26T23:23:25.869Z] return func(self, *args, **kwargs)
[2023-04-26T23:23:25.869Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 707, in load_collection
[2023-04-26T23:23:25.869Z] raise MilvusException(response.error_code, response.reason)
[2023-04-26T23:23:25.869Z] pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=failed to load collection, err=failed to get partitions from RootCoord[context deadline exceeded])>
[2023-04-26T23:23:25.869Z] (api_request.py:39)
[2023-04-26T23:23:25.869Z] [2023-04-26 23:23:25 - ERROR - ci_test]: (api_response) : <MilvusException: (code=1, message=failed to load collection, err=failed to get partitions from RootCoord[context deadline exceeded])> (api_request.py:40)
[2023-04-26T23:23:25.869Z] ------------- generated html file: file:///tmp/ci_logs/report.html -------------
[2023-04-26T23:23:25.869Z] =========================== short test summary info ============================
[2023-04-26T23:23:25.869Z] FAILED testcases/test_data_persistence.py::TestDataPersistence::test_milvus_default - AssertionError
[2023-04-26T23:23:25.869Z] ============================== 1 failed in 33.45s ==============================
Expected Behavior
All test cases pass.
Steps To Reproduce
No response
Milvus Log
failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-kafka-for-release-cron/detail/chaos-test-kafka-for-release-cron/3772/pipeline
log:
artifacts-rootcoord-pod-kill-3772-server-logs.tar.gz
artifacts-rootcoord-pod-kill-3772-pytest-logs.tar.gz
Anything else?
It was not reproduced on 2.2.0-20230425-8b34f672, so it seems to be a new issue in yesterday's image.
/assign @jiaoew1991 /unassign
/assign @smellthemoon /unassign
Not reproduced after three retries.
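For anyone who wants to re-check whether the `context deadline exceeded` failure on load is transient, a minimal sketch of a retry helper follows. It is not part of the chaos test suite: `retry_call` and its parameters are hypothetical names introduced here, and the pymilvus usage in the trailing comment assumes a reachable cluster and a placeholder collection name.

```python
import time


def retry_call(fn, attempts=3, delay_s=0.0, retry_on=(Exception,)):
    """Call fn() up to `attempts` times.

    Returns (result, failures), where failures is a list of
    (attempt_number, repr(exception)) for every attempt that raised.
    Raises RuntimeError if every attempt fails.
    """
    failures = []
    for attempt in range(1, attempts + 1):
        try:
            return fn(), failures
        except retry_on as exc:
            # Record the failure so intermittent errors stay visible
            failures.append((attempt, repr(exc)))
            if attempt < attempts:
                time.sleep(delay_s)
    raise RuntimeError(f"all {attempts} attempts failed: {failures}")


# Hypothetical usage against a live cluster (not executed here):
#
#   from pymilvus import Collection, connections
#   from pymilvus.exceptions import MilvusException
#
#   connections.connect(host="127.0.0.1", port="19530")  # assumed address
#   coll = Collection("my_collection")                   # placeholder name
#   _, failures = retry_call(
#       lambda: coll.load(replica_number=1, timeout=120),
#       attempts=3,
#       delay_s=5.0,
#       retry_on=(MilvusException,),
#   )
#   print(failures)
```

Returning the list of failed attempts, rather than swallowing them, keeps an intermittent error visible even when a later attempt succeeds.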