[Bug]: Standalone pod restarted during first deployment test in reinstallation test
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version: master-20230426-f0ababb4
- Deployment mode (standalone or cluster): standalone
- MQ type(rocksmq, pulsar or kafka): rocksmq
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
[2023-04-26T09:36:07.109Z] 2023-04-26 09:36:06.803 | INFO | MainThread |utils:load_and_search:245 - ###########
[2023-04-26T09:36:07.109Z] 2023-04-26 09:36:06.820 | INFO | MainThread |utils:load_and_search:195 - collection name: task_2_HNSW
[2023-04-26T09:36:07.109Z] 2023-04-26 09:36:06.820 | INFO | MainThread |utils:load_and_search:196 - load collection
[2023-04-26T09:39:13.533Z] [get_loading_progress] retry:4, cost: 0.27s, reason: <_MultiThreadedRendezvous: StatusCode.UNAVAILABLE, failed to connect to all addresses; last error: UNKNOWN: ipv4:10.101.175.91:19530: Failed to connect to remote host: Connection refused>
[2023-04-26T09:39:13.533Z] [get_loading_progress] retry:5, cost: 0.81s, reason: <_MultiThreadedRendezvous: StatusCode.UNAVAILABLE, failed to connect to all addresses; last error: UNKNOWN: ipv4:10.101.175.91:19530: Failed to connect to remote host: Connection refused>
[2023-04-26T09:39:13.533Z] [get_loading_progress] retry:6, cost: 2.43s, reason: <_MultiThreadedRendezvous: StatusCode.UNAVAILABLE, failed to connect to all addresses; last error: UNKNOWN: ipv4:10.101.175.91:19530: Failed to connect to remote host: Connection refused>
[2023-04-26T09:39:13.533Z] [get_loading_progress] retry:7, cost: 7.29s, reason: <_MultiThreadedRendezvous: StatusCode.UNAVAILABLE, failed to connect to all addresses; last error: UNKNOWN: ipv4:10.101.175.91:19530: Failed to connect to remote host: Connection refused>
[2023-04-26T09:39:14.455Z] [get_loading_progress] retry:8, cost: 21.87s, reason: <_MultiThreadedRendezvous: StatusCode.UNAVAILABLE, failed to connect to all addresses; last error: UNKNOWN: ipv4:10.101.175.91:19530: Failed to connect to remote host: Connection refused>
[2023-04-26T09:39:36.309Z] [get_loading_progress] retry:9, cost: 60.00s, reason: <_MultiThreadedRendezvous: StatusCode.UNAVAILABLE, failed to connect to all addresses; last error: UNKNOWN: ipv4:10.101.175.91:19530: Failed to connect to remote host: Connection refused>
[2023-04-26T09:40:43.901Z] [get_loading_progress] retry:10, cost: 60.00s, reason: <_MultiThreadedRendezvous: StatusCode.UNAVAILABLE, failed to connect to all addresses; last error: UNKNOWN: ipv4:10.101.175.91:19530: Failed to connect to remote host: Connection refused>
[2023-04-26T09:41:40.040Z] RPC error: [get_loading_progress], <MilvusUnavailableException: (code=1, message=server Unavailable: Retry run out of 10 retry times)>, <Time:{'RPC start': '2023-04-26 09:39:01.807313', 'RPC error': '2023-04-26 09:41:36.105243'}>
[2023-04-26T09:41:40.040Z] RPC error: [wait_for_loading_collection], <MilvusUnavailableException: (code=1, message=server Unavailable: Retry run out of 10 retry times)>, <Time:{'RPC start': '2023-04-26 09:36:06.919277', 'RPC error': '2023-04-26 09:41:36.105461'}>
[2023-04-26T09:41:40.040Z] RPC error: [load_collection], <MilvusUnavailableException: (code=1, message=server Unavailable: Retry run out of 10 retry times)>, <Time:{'RPC start': '2023-04-26 09:36:06.820426', 'RPC error': '2023-04-26 09:41:36.105544'}>
[2023-04-26T09:41:40.040Z] Traceback (most recent call last):
[2023-04-26T09:41:40.040Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 50, in handler
[2023-04-26T09:41:40.040Z] return func(self, *args, **kwargs)
[2023-04-26T09:41:40.040Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 793, in get_loading_progress
[2023-04-26T09:41:40.040Z] response = self._stub.GetLoadingProgress.future(request, timeout=timeout).result()
[2023-04-26T09:41:40.040Z] File "/usr/local/lib/python3.8/dist-packages/grpc/_channel.py", line 797, in result
[2023-04-26T09:41:40.040Z] raise self
[2023-04-26T09:41:40.040Z] grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
[2023-04-26T09:41:40.040Z] status = StatusCode.UNAVAILABLE
[2023-04-26T09:41:40.040Z] details = "failed to connect to all addresses; last error: UNKNOWN: ipv4:10.101.175.91:19530: Failed to connect to remote host: Connection refused"
[2023-04-26T09:41:40.040Z] debug_error_string = "UNKNOWN:failed to connect to all addresses; last error: UNKNOWN: ipv4:10.101.175.91:19530: Failed to connect to remote host: Connection refused {created_time:"2023-04-26T09:41:36.104710019+00:00", grpc_status:14}"
[2023-04-26T09:41:40.040Z] >
[2023-04-26T09:41:40.040Z]
[2023-04-26T09:41:40.040Z] The above exception was the direct cause of the following exception:
[2023-04-26T09:41:40.040Z]
[2023-04-26T09:41:40.040Z] Traceback (most recent call last):
[2023-04-26T09:41:40.040Z] File "scripts/action_before_reinstall.py", line 48, in <module>
[2023-04-26T09:41:40.040Z] task_2(data_size, host)
[2023-04-26T09:41:40.040Z] File "scripts/action_before_reinstall.py", line 34, in task_2
[2023-04-26T09:41:40.040Z] load_and_search(prefix)
[2023-04-26T09:41:40.040Z] File "/home/jenkins/agent/workspace/tests/python_client/deploy/scripts/utils.py", line 199, in load_and_search
[2023-04-26T09:41:40.040Z] c.load()
[2023-04-26T09:41:40.040Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/orm/collection.py", line 366, in load
[2023-04-26T09:41:40.040Z] conn.load_collection(self._name, replica_number=replica_number, timeout=timeout, **kwargs)
[2023-04-26T09:41:40.040Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 109, in handler
[2023-04-26T09:41:40.040Z] raise e
[2023-04-26T09:41:40.040Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 105, in handler
[2023-04-26T09:41:40.040Z] return func(*args, **kwargs)
[2023-04-26T09:41:40.040Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 136, in handler
[2023-04-26T09:41:40.040Z] ret = func(self, *args, **kwargs)
[2023-04-26T09:41:40.040Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 85, in handler
[2023-04-26T09:41:40.040Z] raise e
[2023-04-26T09:41:40.040Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 50, in handler
[2023-04-26T09:41:40.040Z] return func(self, *args, **kwargs)
[2023-04-26T09:41:40.040Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 710, in load_collection
[2023-04-26T09:41:40.040Z] self.wait_for_loading_collection(collection_name, timeout)
[2023-04-26T09:41:40.040Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 109, in handler
[2023-04-26T09:41:40.040Z] raise e
[2023-04-26T09:41:40.040Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 105, in handler
[2023-04-26T09:41:40.040Z] return func(*args, **kwargs)
[2023-04-26T09:41:40.040Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 136, in handler
[2023-04-26T09:41:40.040Z] ret = func(self, *args, **kwargs)
[2023-04-26T09:41:40.040Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 85, in handler
[2023-04-26T09:41:40.040Z] raise e
[2023-04-26T09:41:40.040Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 50, in handler
[2023-04-26T09:41:40.040Z] return func(self, *args, **kwargs)
[2023-04-26T09:41:40.040Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 728, in wait_for_loading_collection
[2023-04-26T09:41:40.040Z] progress = self.get_loading_progress(collection_name, timeout=timeout)
[2023-04-26T09:41:40.040Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 109, in handler
[2023-04-26T09:41:40.040Z] raise e
[2023-04-26T09:41:40.040Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 105, in handler
[2023-04-26T09:41:40.041Z] return func(*args, **kwargs)
[2023-04-26T09:41:40.041Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 136, in handler
[2023-04-26T09:41:40.041Z] ret = func(self, *args, **kwargs)
[2023-04-26T09:41:40.041Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 66, in handler
[2023-04-26T09:41:40.041Z] raise MilvusUnavailableException(message=f"server Unavailable: {timeout_msg}") from e
[2023-04-26T09:41:40.041Z] pymilvus.exceptions.MilvusUnavailableException: <MilvusUnavailableException: (code=1, message=server Unavailable: Retry run out of 10 retry times)>
script returned exit code 1
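The client exhausts the pymilvus retry budget (10 attempts with an exponential backoff that the log shows capped at 60s) while the standalone pod is down, so `load_collection` fails hard. As a point of comparison, a minimal sketch of how a deploy test could wait for the endpoint to come back and retry the load is shown below; the host/port are taken from the log, the timing values are illustrative, and this is not the actual test harness code (it only works around the symptom, the pod restart itself is still the bug):

```python
import time

from pymilvus import Collection, connections, utility


def load_with_reconnect(collection_name, host="10.101.175.91", port="19530",
                        max_wait_s=300, poll_s=10):
    """Retry Collection.load() while the standalone pod is restarting.

    Reconnects and polls until the gRPC endpoint answers again or the
    overall budget is exhausted. This only papers over the restart; the
    root cause (why the pod restarted) still has to be fixed server-side.
    """
    deadline = time.time() + max_wait_s
    while True:
        try:
            connections.connect(alias="default", host=host, port=port)
            utility.list_collections()   # cheap call to prove the server is reachable
            Collection(collection_name).load()
            return
        except Exception as exc:
            if time.time() > deadline:
                raise
            print(f"server not ready ({exc!r}), retrying in {poll_s}s")
            time.sleep(poll_s)
```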
[2023-04-26T09:41:42.078Z] + kubectl get pods -o wide
[2023-04-26T09:41:42.078Z] + grep rocksmq-standalone-reinstall-690
[2023-04-26T09:41:42.333Z] rocksmq-standalone-reinstall-690-etcd-0 1/1 Running 0 17m 10.102.6.189 devops-node10 <none> <none>
[2023-04-26T09:41:42.333Z] rocksmq-standalone-reinstall-690-etcd-1 1/1 Running 0 17m 10.102.9.75 devops-node13 <none> <none>
[2023-04-26T09:41:42.333Z] rocksmq-standalone-reinstall-690-etcd-2 1/1 Running 0 17m 10.102.10.236 devops-node20 <none> <none>
[2023-04-26T09:41:42.333Z] rocksmq-standalone-reinstall-690-milvus-standalone-7f76c8f4znt9 0/1 Running 1 (2m39s ago) 17m 10.102.9.74 devops-node13 <none> <none>
[2023-04-26T09:41:42.333Z] rocksmq-standalone-reinstall-690-minio-6b5866cb69-l295l 1/1 Running 0 17m 10.102.6.185 devops-node10 <none> <none>
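The kubectl output confirms the standalone pod restarted once (about 2m39s before the check). A small sketch for triage, pulling the termination reason and the logs of the crashed container instance; the pod name is copied from the output above, kubectl is assumed to be on PATH, and the subprocess wrapper is only illustrative:

```python
import subprocess

POD = "rocksmq-standalone-reinstall-690-milvus-standalone-7f76c8f4znt9"


def run(*args):
    """Run a kubectl command and return its stdout (raises on failure)."""
    return subprocess.run(["kubectl", *args], check=True,
                          capture_output=True, text=True).stdout


# Why did the previous container terminate (OOMKilled, Error, ...)?
print(run("get", "pod", POD, "-o",
          "jsonpath={.status.containerStatuses[0].lastState.terminated.reason}"))

# Logs of the container instance that crashed, not the current one.
print(run("logs", POD, "--previous", "--tail=200"))

# Events (probe failures, OOM kills, evictions) around the restart.
print(run("describe", "pod", POD))
```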
Expected Behavior
All test cases pass.
Steps To Reproduce
No response
Milvus Log
failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/deploy_test_cron/detail/deploy_test_cron/690/pipeline/
logs: artifacts-rocksmq-standalone-reinstall-690-server-logs (1).tar.gz, artifacts-rocksmq-standalone-reinstall-690-pytest-logs.tar.gz
Anything else?
No response
/assign
It seems like the session key in etcd was deleted by someone.
It didn't appear again; please verify this, @zhuwenxing.
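To check whether the component session keys really disappeared from etcd, something like the sketch below can dump the session entries. It assumes the python-etcd3 client, that the etcd service has been port-forwarded to localhost, and that the deployment uses the default `by-dev` root path from milvus.yaml; adjust the prefix if the root path differs:

```python
import etcd3

# etcd endpoint backing this deployment (assumption: port-forwarded to localhost).
client = etcd3.client(host="127.0.0.1", port=2379)

# Each live Milvus component keeps a session entry under <rootPath>/meta/session/;
# a missing entry (or one bound to a new lease) points at the session being dropped.
for value, meta in client.get_prefix("by-dev/meta/session/"):
    print(meta.key.decode(), "->", value.decode())
```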
/assign @zhuwenxing
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen
.