
[Bug]: Milvus deployment may fail because MinIO is not ready

Open · zhuwenxing opened this issue 1 year ago · 4 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Environment

- Milvus version: 2.2.0-20230309-130ab6da
- Deployment mode (standalone or cluster): cluster
- MQ type (rocksmq, pulsar or kafka): pulsar
- SDK version (e.g. pymilvus v2.0.0rc2):
- OS (Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:

Current Behavior


Milvus components keep restarting.

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

image tag: 2.2.0-20230309-130ab6da
failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-for-release-cron/detail/chaos-test-for-release-cron/2576/pipeline/
log: artifacts-querynode-pod-kill-2576-server-logs (1).tar.gz

Anything else?

No response

zhuwenxing · Mar 10 '23

/assign @LoveEachDay
Please take a look.

zhuwenxing · Mar 10 '23

failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-for-release-cron/detail/chaos-test-for-release-cron/2857/pipeline
log: artifacts-proxy-pod-failure-2857-server-logs (1).tar.gz

API: SYSTEM()
Time: 21:47:55 UTC 03/20/2023
Error: Marking http://proxy-pod-failure-2857-minio-3.proxy-pod-failure-2857-minio-svc.chaos-testing.svc.cluster.local:9000/minio/storage/export/v43 temporary offline; caused by Post "http://proxy-pod-failure-2857-minio-3.proxy-pod-failure-2857-minio-svc.chaos-testing.svc.cluster.local:9000/minio/storage/export/v43/readall?disk-id=&file-path=format.json&volume=.minio.sys": lookup proxy-pod-failure-2857-minio-3.proxy-pod-failure-2857-minio-svc.chaos-testing.svc.cluster.local on 10.101.0.10:53: no such host (*fmt.wrapError)
       6: internal/rest/client.go:151:rest.(*Client).Call()
       5: cmd/storage-rest-client.go:152:cmd.(*storageRESTClient).call()
       4: cmd/storage-rest-client.go:520:cmd.(*storageRESTClient).ReadAll()
       3: cmd/format-erasure.go:387:cmd.loadFormatErasure()
       2: cmd/format-erasure.go:326:cmd.loadFormatErasureAll.func1()
       1: internal/sync/errgroup/errgroup.go:123:errgroup.(*Group).Go.func1()
Waiting for all other servers to be online to format the disks (elapses 2m59s)


API: SYSTEM()
Time: 21:47:55 UTC 03/20/2023
Error: Marking http://proxy-pod-failure-2857-minio-3.proxy-pod-failure-2857-minio-svc.chaos-testing.svc.cluster.local:9000/minio/storage/export/v43 temporary offline; caused by Post "http://proxy-pod-failure-2857-minio-3.proxy-pod-failure-2857-minio-svc.chaos-testing.svc.cluster.local:9000/minio/storage/export/v43/readall?disk-id=&file-path=format.json&volume=.minio.sys": lookup proxy-pod-failure-2857-minio-3.proxy-pod-failure-2857-minio-svc.chaos-testing.svc.cluster.local on 10.101.0.10:53: no such host (*fmt.wrapError)
       6: internal/rest/client.go:151:rest.(*Client).Call()
       5: cmd/storage-rest-client.go:152:cmd.(*storageRESTClient).call()
       4: cmd/storage-rest-client.go:520:cmd.(*storageRESTClient).ReadAll()
       3: cmd/format-erasure.go:387:cmd.loadFormatErasure()
       2: cmd/format-erasure.go:326:cmd.loadFormatErasureAll.func1()
       1: internal/sync/errgroup/errgroup.go:123:errgroup.(*Group).Go.func1()
Waiting for all other servers to be online to format the disks (elapses 2m59s)

zhuwenxing · Mar 21 '23

In case 2857, MinIO started successfully after 47:55 according to the logs, but datacoord had already exhausted its retries at 46:43, which is why datacoord failed to start.
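
If that diagnosis holds, the startup dependency check simply needs to outlive MinIO's formatting wait: datacoord gave up at 46:43 while MinIO only became healthy after 47:55, so the retry window was roughly a minute too short. The sketch below shows the general retry-with-wait pattern; the attempt count, interval, and function names are made-up illustrations, not Milvus's actual defaults or its internal retry package.

package main

import (
	"errors"
	"fmt"
	"log"
	"time"
)

// retryWithWait keeps calling op until it succeeds or attempts run out.
// With 60 attempts and a 5s interval the window is about 5 minutes,
// comfortably longer than the ~3 minutes MinIO spent waiting for its peers
// in this run.
func retryWithWait(attempts int, interval time.Duration, op func() error) error {
	var lastErr error
	for i := 0; i < attempts; i++ {
		if lastErr = op(); lastErr == nil {
			return nil
		}
		log.Printf("attempt %d/%d failed: %v", i+1, attempts, lastErr)
		time.Sleep(interval)
	}
	return fmt.Errorf("gave up after %d attempts: %w", attempts, lastErr)
}

func main() {
	// connectObjectStorage stands in for a component's startup dependency
	// check against MinIO; here it always fails, to show the retry path.
	connectObjectStorage := func() error { return errors.New("minio not ready yet") }
	if err := retryWithWait(60, 5*time.Second, connectObjectStorage); err != nil {
		log.Fatal(err)
	}
}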

locustbaby · Mar 22 '23

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale[bot] · Apr 21 '23