[Bug]: Load failed after rootcoord pod kill chaos test

Open zhuwenxing opened this issue 1 year ago • 1 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Environment

- Milvus version: 2.2.0-20230426-8745ee25
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): kafka
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

[2023-04-26T23:23:25.869Z] [2023-04-26 23:23:17 - INFO - ci_test]: [test][2023-04-26T23:23:16Z] [0.46655728s] Hello_Milvus insert -> (insert count: 3000, delete count: 0, upsert count: 0, timestamp: 441070753457373187, success count: 3000, err count: 0) (wrapper.py:30)

[2023-04-26T23:23:25.869Z] [2023-04-26 23:23:17 - DEBUG - ci_test]: (api_request)  : [Collection.flush] args: [], kwargs: {'timeout': 120} (api_request.py:56)

[2023-04-26T23:23:25.869Z] [2023-04-26 23:23:20 - DEBUG - ci_test]: (api_response) : None  (api_request.py:31)

[2023-04-26T23:23:25.869Z] [2023-04-26 23:23:20 - INFO - ci_test]: [test][2023-04-26T23:23:17Z] [3.01947605s] Hello_Milvus flush -> None (wrapper.py:30)

[2023-04-26T23:23:25.869Z] [2023-04-26 23:23:20 - INFO - ci_test]: assert entities: 12000 (test_data_persistence.py:84)

[2023-04-26T23:23:25.869Z] [2023-04-26 23:23:20 - DEBUG - ci_test]: (api_request)  : [Collection.load] args: [None, 1, 120], kwargs: {} (api_request.py:56)

[2023-04-26T23:23:25.869Z] [2023-04-26 23:23:25 - ERROR - pymilvus.decorators]: RPC error: [load_collection], <MilvusException: (code=1, message=failed to load collection, err=failed to get partitions from RootCoord[context deadline exceeded])>, <Time:{'RPC start': '2023-04-26 23:23:20.233388', 'RPC error': '2023-04-26 23:23:25.243119'}> (decorators.py:108)

[2023-04-26T23:23:25.869Z] [2023-04-26 23:23:25 - ERROR - ci_test]: Traceback (most recent call last):

[2023-04-26T23:23:25.869Z]   File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 26, in inner_wrapper

[2023-04-26T23:23:25.869Z]     res = func(*args, **_kwargs)

[2023-04-26T23:23:25.869Z]   File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 57, in api_request

[2023-04-26T23:23:25.869Z]     return func(*arg, **kwargs)

[2023-04-26T23:23:25.869Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/orm/collection.py", line 366, in load

[2023-04-26T23:23:25.869Z]     conn.load_collection(self._name, replica_number=replica_number, timeout=timeout, **kwargs)

[2023-04-26T23:23:25.869Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 109, in handler

[2023-04-26T23:23:25.869Z]     raise e

[2023-04-26T23:23:25.869Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 105, in handler

[2023-04-26T23:23:25.869Z]     return func(*args, **kwargs)

[2023-04-26T23:23:25.869Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 136, in handler

[2023-04-26T23:23:25.869Z]     ret = func(self, *args, **kwargs)

[2023-04-26T23:23:25.869Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 85, in handler

[2023-04-26T23:23:25.869Z]     raise e

[2023-04-26T23:23:25.869Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 50, in handler

[2023-04-26T23:23:25.869Z]     return func(self, *args, **kwargs)

[2023-04-26T23:23:25.869Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 707, in load_collection

[2023-04-26T23:23:25.869Z]     raise MilvusException(response.error_code, response.reason)

[2023-04-26T23:23:25.869Z] pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=failed to load collection, err=failed to get partitions from RootCoord[context deadline exceeded])>

[2023-04-26T23:23:25.869Z]  (api_request.py:39)

[2023-04-26T23:23:25.869Z] [2023-04-26 23:23:25 - ERROR - ci_test]: (api_response) : <MilvusException: (code=1, message=failed to load collection, err=failed to get partitions from RootCoord[context deadline exceeded])> (api_request.py:40)

[2023-04-26T23:23:25.869Z] ------------- generated html file: file:///tmp/ci_logs/report.html -------------

[2023-04-26T23:23:25.869Z] =========================== short test summary info ============================

[2023-04-26T23:23:25.869Z] FAILED testcases/test_data_persistence.py::TestDataPersistence::test_milvus_default - AssertionError

[2023-04-26T23:23:25.869Z] ============================== 1 failed in 33.45s ==============================
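For anyone triaging: the `RPC start` and `RPC error` timestamps in the `load_collection` error above suggest the call failed long before the client-side timeout. A minimal sketch of that arithmetic follows. It assumes the `120` in the `Collection.load` args is the timeout in seconds, which matches the `timeout=timeout` call in the traceback but is my reading of the log, not something the log states outright.

```python
from datetime import datetime

# Timestamps copied verbatim from the pymilvus error line in the log above.
FMT = "%Y-%m-%d %H:%M:%S.%f"
rpc_start = datetime.strptime("2023-04-26 23:23:20.233388", FMT)
rpc_error = datetime.strptime("2023-04-26 23:23:25.243119", FMT)

# Assumption: the third positional arg of Collection.load ([None, 1, 120])
# is the client-side timeout in seconds.
client_timeout_s = 120

elapsed_s = (rpc_error - rpc_start).total_seconds()
print(f"load failed after {elapsed_s:.3f}s (client timeout: {client_timeout_s}s)")
```

If that reading is right, the `context deadline exceeded` came from a shorter deadline somewhere on the server side (the error text points at the call to RootCoord), not from the 120 s client timeout expiring. The server logs in the artifacts would be needed to confirm this.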

Expected Behavior

All test cases pass.

Steps To Reproduce

No response

Milvus Log

failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-kafka-for-release-cron/detail/chaos-test-kafka-for-release-cron/3772/pipeline

log:
artifacts-rootcoord-pod-kill-3772-server-logs.tar.gz
artifacts-rootcoord-pod-kill-3772-pytest-logs.tar.gz

Anything else?

It was not reproduced on 2.2.0-20230425-8b34f672, so this seems to be a new issue introduced in yesterday's image.

zhuwenxing, Apr 27 '23 02:04

/assign @jiaoew1991
/unassign

yanliang567, Apr 27 '23 10:04

/assign @smellthemoon
/unassign

jiaoew1991, May 04 '23 01:05

Not reproduced after three retries.

zhuwenxing, May 05 '23 12:05