
[Bug]: Standalone pod restarted during first deployment test in reinstallation test

Open zhuwenxing opened this issue 1 year ago • 2 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Environment

- Milvus version: master-20230426-f0ababb4
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): rocksmq   
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

[2023-04-26T09:36:07.109Z] 2023-04-26 09:36:06.803 | INFO     | MainThread |utils:load_and_search:245 - ###########

[2023-04-26T09:36:07.109Z] 2023-04-26 09:36:06.820 | INFO     | MainThread |utils:load_and_search:195 - collection name: task_2_HNSW

[2023-04-26T09:36:07.109Z] 2023-04-26 09:36:06.820 | INFO     | MainThread |utils:load_and_search:196 - load collection

[2023-04-26T09:39:13.533Z] [get_loading_progress] retry:4, cost: 0.27s, reason: <_MultiThreadedRendezvous: StatusCode.UNAVAILABLE, failed to connect to all addresses; last error: UNKNOWN: ipv4:10.101.175.91:19530: Failed to connect to remote host: Connection refused>

[2023-04-26T09:39:13.533Z] [get_loading_progress] retry:5, cost: 0.81s, reason: <_MultiThreadedRendezvous: StatusCode.UNAVAILABLE, failed to connect to all addresses; last error: UNKNOWN: ipv4:10.101.175.91:19530: Failed to connect to remote host: Connection refused>

[2023-04-26T09:39:13.533Z] [get_loading_progress] retry:6, cost: 2.43s, reason: <_MultiThreadedRendezvous: StatusCode.UNAVAILABLE, failed to connect to all addresses; last error: UNKNOWN: ipv4:10.101.175.91:19530: Failed to connect to remote host: Connection refused>

[2023-04-26T09:39:13.533Z] [get_loading_progress] retry:7, cost: 7.29s, reason: <_MultiThreadedRendezvous: StatusCode.UNAVAILABLE, failed to connect to all addresses; last error: UNKNOWN: ipv4:10.101.175.91:19530: Failed to connect to remote host: Connection refused>

[2023-04-26T09:39:14.455Z] [get_loading_progress] retry:8, cost: 21.87s, reason: <_MultiThreadedRendezvous: StatusCode.UNAVAILABLE, failed to connect to all addresses; last error: UNKNOWN: ipv4:10.101.175.91:19530: Failed to connect to remote host: Connection refused>

[2023-04-26T09:39:36.309Z] [get_loading_progress] retry:9, cost: 60.00s, reason: <_MultiThreadedRendezvous: StatusCode.UNAVAILABLE, failed to connect to all addresses; last error: UNKNOWN: ipv4:10.101.175.91:19530: Failed to connect to remote host: Connection refused>

[2023-04-26T09:40:43.901Z] [get_loading_progress] retry:10, cost: 60.00s, reason: <_MultiThreadedRendezvous: StatusCode.UNAVAILABLE, failed to connect to all addresses; last error: UNKNOWN: ipv4:10.101.175.91:19530: Failed to connect to remote host: Connection refused>

[2023-04-26T09:41:40.040Z] RPC error: [get_loading_progress], <MilvusUnavailableException: (code=1, message=server Unavailable: Retry run out of 10 retry times)>, <Time:{'RPC start': '2023-04-26 09:39:01.807313', 'RPC error': '2023-04-26 09:41:36.105243'}>

[2023-04-26T09:41:40.040Z] RPC error: [wait_for_loading_collection], <MilvusUnavailableException: (code=1, message=server Unavailable: Retry run out of 10 retry times)>, <Time:{'RPC start': '2023-04-26 09:36:06.919277', 'RPC error': '2023-04-26 09:41:36.105461'}>

[2023-04-26T09:41:40.040Z] RPC error: [load_collection], <MilvusUnavailableException: (code=1, message=server Unavailable: Retry run out of 10 retry times)>, <Time:{'RPC start': '2023-04-26 09:36:06.820426', 'RPC error': '2023-04-26 09:41:36.105544'}>

[2023-04-26T09:41:40.040Z] Traceback (most recent call last):

[2023-04-26T09:41:40.040Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 50, in handler

[2023-04-26T09:41:40.040Z]     return func(self, *args, **kwargs)

[2023-04-26T09:41:40.040Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 793, in get_loading_progress

[2023-04-26T09:41:40.040Z]     response = self._stub.GetLoadingProgress.future(request, timeout=timeout).result()

[2023-04-26T09:41:40.040Z]   File "/usr/local/lib/python3.8/dist-packages/grpc/_channel.py", line 797, in result

[2023-04-26T09:41:40.040Z]     raise self

[2023-04-26T09:41:40.040Z] grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:

[2023-04-26T09:41:40.040Z] 	status = StatusCode.UNAVAILABLE

[2023-04-26T09:41:40.040Z] 	details = "failed to connect to all addresses; last error: UNKNOWN: ipv4:10.101.175.91:19530: Failed to connect to remote host: Connection refused"

[2023-04-26T09:41:40.040Z] 	debug_error_string = "UNKNOWN:failed to connect to all addresses; last error: UNKNOWN: ipv4:10.101.175.91:19530: Failed to connect to remote host: Connection refused {created_time:"2023-04-26T09:41:36.104710019+00:00", grpc_status:14}"

[2023-04-26T09:41:40.040Z] >

[2023-04-26T09:41:40.040Z] 

[2023-04-26T09:41:40.040Z] The above exception was the direct cause of the following exception:

[2023-04-26T09:41:40.040Z] 

[2023-04-26T09:41:40.040Z] Traceback (most recent call last):

[2023-04-26T09:41:40.040Z]   File "scripts/action_before_reinstall.py", line 48, in <module>

[2023-04-26T09:41:40.040Z]     task_2(data_size, host)

[2023-04-26T09:41:40.040Z]   File "scripts/action_before_reinstall.py", line 34, in task_2

[2023-04-26T09:41:40.040Z]     load_and_search(prefix)

[2023-04-26T09:41:40.040Z]   File "/home/jenkins/agent/workspace/tests/python_client/deploy/scripts/utils.py", line 199, in load_and_search

[2023-04-26T09:41:40.040Z]     c.load()

[2023-04-26T09:41:40.040Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/orm/collection.py", line 366, in load

[2023-04-26T09:41:40.040Z]     conn.load_collection(self._name, replica_number=replica_number, timeout=timeout, **kwargs)

[2023-04-26T09:41:40.040Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 109, in handler

[2023-04-26T09:41:40.040Z]     raise e

[2023-04-26T09:41:40.040Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 105, in handler

[2023-04-26T09:41:40.040Z]     return func(*args, **kwargs)

[2023-04-26T09:41:40.040Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 136, in handler

[2023-04-26T09:41:40.040Z]     ret = func(self, *args, **kwargs)

[2023-04-26T09:41:40.040Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 85, in handler

[2023-04-26T09:41:40.040Z]     raise e

[2023-04-26T09:41:40.040Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 50, in handler

[2023-04-26T09:41:40.040Z]     return func(self, *args, **kwargs)

[2023-04-26T09:41:40.040Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 710, in load_collection

[2023-04-26T09:41:40.040Z]     self.wait_for_loading_collection(collection_name, timeout)

[2023-04-26T09:41:40.040Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 109, in handler

[2023-04-26T09:41:40.040Z]     raise e

[2023-04-26T09:41:40.040Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 105, in handler

[2023-04-26T09:41:40.040Z]     return func(*args, **kwargs)

[2023-04-26T09:41:40.040Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 136, in handler

[2023-04-26T09:41:40.040Z]     ret = func(self, *args, **kwargs)

[2023-04-26T09:41:40.040Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 85, in handler

[2023-04-26T09:41:40.040Z]     raise e

[2023-04-26T09:41:40.040Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 50, in handler

[2023-04-26T09:41:40.040Z]     return func(self, *args, **kwargs)

[2023-04-26T09:41:40.040Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 728, in wait_for_loading_collection

[2023-04-26T09:41:40.040Z]     progress = self.get_loading_progress(collection_name, timeout=timeout)

[2023-04-26T09:41:40.040Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 109, in handler

[2023-04-26T09:41:40.040Z]     raise e

[2023-04-26T09:41:40.040Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 105, in handler

[2023-04-26T09:41:40.041Z]     return func(*args, **kwargs)

[2023-04-26T09:41:40.041Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 136, in handler

[2023-04-26T09:41:40.041Z]     ret = func(self, *args, **kwargs)

[2023-04-26T09:41:40.041Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 66, in handler

[2023-04-26T09:41:40.041Z]     raise MilvusUnavailableException(message=f"server Unavailable: {timeout_msg}") from e

[2023-04-26T09:41:40.041Z] pymilvus.exceptions.MilvusUnavailableException: <MilvusUnavailableException: (code=1, message=server Unavailable: Retry run out of 10 retry times)>

script returned exit code 1
[2023-04-26T09:41:42.078Z] + kubectl get pods -o wide

[2023-04-26T09:41:42.078Z] + grep rocksmq-standalone-reinstall-690

[2023-04-26T09:41:42.333Z] rocksmq-standalone-reinstall-690-etcd-0                           1/1     Running            0                  17m     10.102.6.189    devops-node10   <none>           <none>

[2023-04-26T09:41:42.333Z] rocksmq-standalone-reinstall-690-etcd-1                           1/1     Running            0                  17m     10.102.9.75     devops-node13   <none>           <none>

[2023-04-26T09:41:42.333Z] rocksmq-standalone-reinstall-690-etcd-2                           1/1     Running            0                  17m     10.102.10.236   devops-node20   <none>           <none>

[2023-04-26T09:41:42.333Z] rocksmq-standalone-reinstall-690-milvus-standalone-7f76c8f4znt9   0/1     Running            1 (2m39s ago)      17m     10.102.9.74     devops-node13   <none>           <none>

[2023-04-26T09:41:42.333Z] rocksmq-standalone-reinstall-690-minio-6b5866cb69-l295l           1/1     Running            0                  17m     10.102.6.185    devops-node10   <none>           <none>
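
For a closer look at why the standalone pod restarted, the restart reason can also be pulled programmatically. Below is a minimal sketch using the official kubernetes Python client; the namespace and label selector are assumptions for illustration, not taken from the test code.

```python
# Sketch: list pods of this deployment and print each container's restart
# count plus the reason/exit code of its last termination (OOMKilled,
# liveness-probe kill, panic, ...). Namespace and label selector are assumed.
from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() when running inside the cluster
v1 = client.CoreV1Api()

pods = v1.list_namespaced_pod(
    namespace="default",
    label_selector="app.kubernetes.io/instance=rocksmq-standalone-reinstall-690",
)
for pod in pods.items:
    for cs in pod.status.container_statuses or []:
        last = cs.last_state.terminated
        print(
            pod.metadata.name,
            "restarts:", cs.restart_count,
            "last_reason:", last.reason if last else None,
            "last_exit_code:", last.exit_code if last else None,
        )
```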

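The failing step at the top of the log boils down to a plain collection load against the standalone endpoint. A minimal reproduction sketch, assuming pymilvus 2.x and taking the host and collection name from the log above (this is an illustration, not the actual deploy-test script):

```python
# Sketch: reproduce the failing load step. Internally pymilvus runs
# load_collection -> wait_for_loading_collection -> get_loading_progress,
# which is where the StatusCode.UNAVAILABLE retries show up once the
# standalone pod restarts and refuses connections.
from pymilvus import connections, Collection

connections.connect(alias="default", host="10.101.175.91", port="19530")

c = Collection("task_2_HNSW")  # collection created by the earlier deploy task
c.load()                       # raises MilvusUnavailableException after the retries are exhausted
print("collection loaded")
```
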
Expected Behavior

All test cases should pass.

Steps To Reproduce

No response

Milvus Log

failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/deploy_test_cron/detail/deploy_test_cron/690/pipeline/

logs: artifacts-rocksmq-standalone-reinstall-690-server-logs (1).tar.gz, artifacts-rocksmq-standalone-reinstall-690-pytest-logs.tar.gz

Anything else?

No response

zhuwenxing avatar Apr 27 '23 03:04 zhuwenxing

/assign

weiliu1031 avatar Apr 27 '23 09:04 weiliu1031

[screenshot attached] Seems like the session key in etcd has been deleted by someone.

weiliu1031 avatar Apr 27 '23 09:04 weiliu1031
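
If it helps with verification, the session keys can be listed directly from etcd. A hedged sketch with python-etcd3; the etcd endpoint and the "by-dev/meta/session" prefix (the default Milvus etcd rootPath layout) are assumptions and should be adjusted to this deployment's configuration:

```python
# Sketch: dump Milvus session keys from etcd to check whether the standalone's
# session entry disappeared. Endpoint and key prefix are assumed defaults.
import etcd3

etcd = etcd3.client(host="rocksmq-standalone-reinstall-690-etcd-0", port=2379)
for value, meta in etcd.get_prefix("by-dev/meta/session"):
    print(meta.key.decode(), "->", value.decode())
```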

It didn't appear again; please verify this @zhuwenxing

weiliu1031 avatar May 19 '23 04:05 weiliu1031

/assign @zhuwenxing

weiliu1031 avatar May 19 '23 06:05 weiliu1031

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale[bot] avatar Jun 19 '23 05:06 stale[bot]