[Bug]: [Nightly] Milvus pods restart many times and panic with context deadline exceeded
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version: aaaffc6
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): kafka
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
Several Milvus pods restarted many times (up to 16 restarts within about 3 hours) and panicked with "context deadline exceeded".
[2024-08-15T21:07:54.947Z] mdk-836-n-etcd-0 1/1 Running 1 (3h6m ago) 3h8m 10.105.1.68 ci-node10 <none> <none>
[2024-08-15T21:07:54.947Z] mdk-836-n-etcd-1 1/1 Running 1 (3h6m ago) 3h8m 10.105.1.71 ci-node10 <none> <none>
[2024-08-15T21:07:54.947Z] mdk-836-n-etcd-2 1/1 Running 0 3h8m 10.105.1.77 ci-node10 <none> <none>
[2024-08-15T21:07:54.947Z] mdk-836-n-kafka-0 2/2 Running 3 (3h6m ago) 3h8m 10.105.1.66 ci-node10 <none> <none>
[2024-08-15T21:07:54.947Z] mdk-836-n-kafka-1 2/2 Running 3 (3h6m ago) 3h8m 10.105.1.63 ci-node10 <none> <none>
[2024-08-15T21:07:54.947Z] mdk-836-n-kafka-2 2/2 Running 1 (3h6m ago) 3h8m 10.105.1.80 ci-node10 <none> <none>
[2024-08-15T21:07:54.948Z] mdk-836-n-kafka-exporter-6f984b5cbb-r5mld 1/1 Running 5 (3h6m ago) 3h8m 10.105.1.28 ci-node10 <none> <none>
[2024-08-15T21:07:54.948Z] mdk-836-n-milvus-datacoord-569765964-xzq97 1/1 Running 16 (52m ago) 3h8m 10.105.7.29 ci-node12 <none> <none>
[2024-08-15T21:07:54.948Z] mdk-836-n-milvus-datanode-f8ccd8c6d-2bc2r 1/1 Running 8 (51m ago) 3h8m 10.105.1.27 ci-node10 <none> <none>
[2024-08-15T21:07:54.948Z] mdk-836-n-milvus-datanode-f8ccd8c6d-vrqkl 1/1 Running 16 (52m ago) 3h8m 10.105.7.26 ci-node12 <none> <none>
[2024-08-15T21:07:54.948Z] mdk-836-n-milvus-indexcoord-5984cbd7bf-2hxds 1/1 Running 0 3h8m 10.105.1.37 ci-node10 <none> <none>
[2024-08-15T21:07:54.948Z] mdk-836-n-milvus-indexnode-6d6bf7d587-kqw4q 1/1 Running 7 (51m ago) 3h8m 10.105.1.31 ci-node10 <none> <none>
[2024-08-15T21:07:54.948Z] mdk-836-n-milvus-indexnode-6d6bf7d587-xsnrt 1/1 Running 16 (52m ago) 3h8m 10.105.7.32 ci-node12 <none> <none>
[2024-08-15T21:07:54.948Z] mdk-836-n-milvus-proxy-84d68d4ccb-mmlmj 1/1 Running 16 (52m ago) 3h8m 10.105.7.30 ci-node12 <none> <none>
[2024-08-15T21:07:54.948Z] mdk-836-n-milvus-proxy-84d68d4ccb-qs2bs 1/1 Running 8 (51m ago) 3h8m 10.105.1.29 ci-node10 <none> <none>
[2024-08-15T21:07:54.948Z] mdk-836-n-milvus-querycoord-7ff8c85f58-x5648 1/1 Running 8 (51m ago) 3h8m 10.105.1.20 ci-node10 <none> <none>
[2024-08-15T21:07:54.948Z] mdk-836-n-milvus-querynode-68f986bc7-2ggg6 1/1 Running 7 (51m ago) 3h8m 10.105.1.19 ci-node10 <none> <none>
[2024-08-15T21:07:54.948Z] mdk-836-n-milvus-querynode-68f986bc7-ngw6z 1/1 Running 8 (51m ago) 3h8m 10.105.1.36 ci-node10 <none> <none>
[2024-08-15T21:07:54.948Z] mdk-836-n-milvus-rootcoord-548cd959d9-z7nqx 1/1 Running 16 (52m ago) 3h8m 10.105.7.28 ci-node12 <none> <none>
[2024-08-15T21:07:54.948Z] mdk-836-n-minio-6466b6c7c9-5jrjc 1/1 Running 0 3h8m 10.105.1.61 ci-node10 <none> <none>
[2024-08-15T21:07:54.948Z] mdk-836-n-zookeeper-0 1/1 Running 0 3h8m 10.105.1.76 ci-node10 <none> <none>
[2024-08-15T21:07:54.948Z] mdk-836-n-zookeeper-1 1/1 Running 0 3h8m 10.105.1.73 ci-node10 <none> <none>
[2024-08-15T21:07:54.948Z] mdk-836-n-zookeeper-2 1/1 Running 0 3h8m 10.105.1.81 ci-node10 <none> <none>
[2024-08-15T21:07:55.509Z] mdk-836-n-etcd-0 restarts 1, last terminateed reason is "Error"
[2024-08-15T21:07:55.765Z] mdk-836-n-etcd-1 restarts 1, last terminateed reason is "Error"
[2024-08-15T21:07:55.765Z] mdk-836-n-kafka-0 restarts 0, last terminateed reason is null
[2024-08-15T21:07:56.020Z] mdk-836-n-kafka-1 restarts 0, last terminateed reason is null
[2024-08-15T21:07:56.275Z] mdk-836-n-kafka-2 restarts 0, last terminateed reason is null
[2024-08-15T21:07:56.275Z] mdk-836-n-kafka-exporter-6f984b5cbb-r5mld restarts 5, last terminateed reason is "Error"
[2024-08-15T21:07:56.531Z] mdk-836-n-milvus-datacoord-569765964-xzq97 restarts 16, last terminateed reason is "Error"
[2024-08-15T21:07:56.531Z] mdk-836-n-milvus-datanode-f8ccd8c6d-2bc2r restarts 8, last terminateed reason is "Error"
[2024-08-15T21:07:56.786Z] mdk-836-n-milvus-datanode-f8ccd8c6d-vrqkl restarts 16, last terminateed reason is "Error"
[2024-08-15T21:07:56.786Z] mdk-836-n-milvus-indexnode-6d6bf7d587-kqw4q restarts 7, last terminateed reason is "Error"
[2024-08-15T21:07:57.043Z] mdk-836-n-milvus-indexnode-6d6bf7d587-xsnrt restarts 16, last terminateed reason is "Error"
[2024-08-15T21:07:57.043Z] mdk-836-n-milvus-proxy-84d68d4ccb-mmlmj restarts 16, last terminateed reason is "Error"
[2024-08-15T21:07:57.299Z] mdk-836-n-milvus-proxy-84d68d4ccb-qs2bs restarts 8, last terminateed reason is "Error"
[2024-08-15T21:07:57.299Z] mdk-836-n-milvus-querycoord-7ff8c85f58-x5648 restarts 8, last terminateed reason is "Error"
[2024-08-15T21:07:57.555Z] mdk-836-n-milvus-querynode-68f986bc7-2ggg6 restarts 7, last terminateed reason is "Error"
[2024-08-15T21:07:57.555Z] mdk-836-n-milvus-querynode-68f986bc7-ngw6z restarts 8, last terminateed reason is "Error"
[2024-08-15T21:07:57.811Z] mdk-836-n-milvus-rootcoord-548cd959d9-z7nqx restarts 16, last terminateed reason is "Error"
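In the listing above, every Milvus pod scheduled on ci-node12 shows 16 restarts, while pods on ci-node10 show between 0 and 8. This does not establish a cause, but it may be worth checking when triaging. The following is a minimal Python sketch for grouping restart counts by node. It assumes the column order of the pod listing above (name, ready, status, restarts, age, IP, node); the helper name `restarts_by_node` is hypothetical, and the sample is a subset of the rows above.

```python
import re
from collections import defaultdict

# One row of the pod listing above. The "(52m ago)" part is optional
# because pods with zero restarts omit it.
ROW = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+"
    r"(?P<pod>\S+)\s+(?P<ready>\d+/\d+)\s+(?P<status>\S+)\s+"
    r"(?P<restarts>\d+)(?:\s+\([^)]*\))?\s+"
    r"(?P<age>\S+)\s+(?P<ip>\S+)\s+(?P<node>\S+)"
)


def restarts_by_node(lines):
    """Return {node: {pod: restart_count}} for every line that parses."""
    grouped = defaultdict(dict)
    for line in lines:
        m = ROW.match(line.strip())
        if m:
            grouped[m["node"]][m["pod"]] = int(m["restarts"])
    return dict(grouped)


# Subset of the rows from this issue, copied verbatim.
SAMPLE = """\
[2024-08-15T21:07:54.948Z] mdk-836-n-milvus-datacoord-569765964-xzq97 1/1 Running 16 (52m ago) 3h8m 10.105.7.29 ci-node12 <none> <none>
[2024-08-15T21:07:54.948Z] mdk-836-n-milvus-datanode-f8ccd8c6d-2bc2r 1/1 Running 8 (51m ago) 3h8m 10.105.1.27 ci-node10 <none> <none>
[2024-08-15T21:07:54.948Z] mdk-836-n-milvus-datanode-f8ccd8c6d-vrqkl 1/1 Running 16 (52m ago) 3h8m 10.105.7.26 ci-node12 <none> <none>
[2024-08-15T21:07:54.948Z] mdk-836-n-milvus-indexcoord-5984cbd7bf-2hxds 1/1 Running 0 3h8m 10.105.1.37 ci-node10 <none> <none>
[2024-08-15T21:07:54.948Z] mdk-836-n-milvus-querynode-68f986bc7-2ggg6 1/1 Running 7 (51m ago) 3h8m 10.105.1.19 ci-node10 <none> <none>
[2024-08-15T21:07:54.948Z] mdk-836-n-milvus-rootcoord-548cd959d9-z7nqx 1/1 Running 16 (52m ago) 3h8m 10.105.7.28 ci-node12 <none> <none>
"""

if __name__ == "__main__":
    for node, pods in sorted(restarts_by_node(SAMPLE.splitlines()).items()):
        print(node, sorted(pods.values(), reverse=True))
```

Feeding the full listing instead of the subset gives the per-node picture for all components, including etcd and kafka.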
Expected Behavior
The nightly run passes, with no Milvus pod restarts or panics.
Steps To Reproduce
No response
Milvus Log
- link: https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20Nightly%20CI/detail/master/836/pipeline/207/
- log: artifacts-milvus-distributed-kafka-nightly-836-pymilvus-e2e-logs.tar.gz
- panic log:
2024-08-16T02:01:01.263565755+08:00 stderr F panic: context deadline exceeded
2024-08-16T02:01:01.263604535+08:00 stderr F
2024-08-16T02:01:01.263615515+08:00 stderr F goroutine 220 [running]:
2024-08-16T02:01:01.263624502+08:00 stderr F panic({0x55c09a0?, 0x895f5a0?})
2024-08-16T02:01:01.263655335+08:00 stderr F /usr/local/go/src/runtime/panic.go:1017 +0x3ac fp=0xc000a4bf70 sp=0xc000a4bec0 pc=0x1ec1bac
2024-08-16T02:01:01.263665885+08:00 stderr F github.com/milvus-io/milvus/cmd/roles.runComponent[...].func1()
2024-08-16T02:01:01.263697376+08:00 stderr F /go/src/github.com/milvus-io/milvus/cmd/roles/roles.go:122 +0x108 fp=0xc000a4bfe0 sp=0xc000a4bf70 pc=0x4f32788
2024-08-16T02:01:01.263724189+08:00 stderr F runtime.goexit()
2024-08-16T02:01:01.263742299+08:00 stderr F /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000a4bfe8 sp=0xc000a4bfe0 pc=0x1efb401
2024-08-16T02:01:01.263758963+08:00 stderr F created by github.com/milvus-io/milvus/cmd/roles.runComponent[...] in goroutine 1
2024-08-16T02:01:01.263768563+08:00 stderr F /go/src/github.com/milvus-io/milvus/cmd/roles/roles.go:113 +0x138
2024-08-16T02:01:01.263776306+08:00 stderr F
2024-08-16T02:01:01.263783683+08:00 stderr F goroutine 1 [semacquire]:
2024-08-16T02:01:01.263790956+08:00 stderr F runtime.gopark(0x110?, 0x56c8b20?, 0x20?, 0x30?, 0x7f38f147de78?)
2024-08-16T02:01:01.263814317+08:00 stderr F /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc0013df588 sp=0xc0013df568 pc=0x1ec5bce
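For comparing this panic across the restarted pods, a small Python sketch that pulls the panic message and the function frames of the panicking goroutine out of a pod log may help. It assumes the lines follow the container runtime log layout seen above (timestamp, stream, a one-letter tag, then the message); the helper name `summarize_panic` is hypothetical, and the sample is a subset of the panic log in this issue.

```python
import re

# "<timestamp> <stream> <tag> <message>"; the message may be empty.
LOG_LINE = re.compile(
    r"^(?P<ts>\S+) (?P<stream>stdout|stderr) (?P<tag>[FP])(?: (?P<msg>.*))?$"
)


def summarize_panic(lines):
    """Return (panic_message, goroutine_header, function_frames).

    Only the first goroutine after the first "panic:" line is collected.
    """
    panic_msg = None
    goroutine = None
    frames = []
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m is None:
            continue
        msg = (m["msg"] or "").strip()
        if panic_msg is None:
            if msg.startswith("panic: "):
                panic_msg = msg[len("panic: "):]
            continue
        if goroutine is None:
            if msg.startswith("goroutine "):
                goroutine = msg.rstrip(":")
            continue
        if not msg:
            break  # a blank message ends the first goroutine's trace
        if msg.startswith("/") or msg.startswith("created by "):
            continue  # skip file:line locations and the creator note
        frames.append(msg)
    return panic_msg, goroutine, frames


# Subset of the panic log from this issue, copied verbatim.
SAMPLE = """\
2024-08-16T02:01:01.263565755+08:00 stderr F panic: context deadline exceeded
2024-08-16T02:01:01.263604535+08:00 stderr F
2024-08-16T02:01:01.263615515+08:00 stderr F goroutine 220 [running]:
2024-08-16T02:01:01.263624502+08:00 stderr F panic({0x55c09a0?, 0x895f5a0?})
2024-08-16T02:01:01.263655335+08:00 stderr F /usr/local/go/src/runtime/panic.go:1017 +0x3ac fp=0xc000a4bf70 sp=0xc000a4bec0 pc=0x1ec1bac
2024-08-16T02:01:01.263665885+08:00 stderr F github.com/milvus-io/milvus/cmd/roles.runComponent[...].func1()
2024-08-16T02:01:01.263697376+08:00 stderr F /go/src/github.com/milvus-io/milvus/cmd/roles/roles.go:122 +0x108 fp=0xc000a4bfe0 sp=0xc000a4bf70 pc=0x4f32788
2024-08-16T02:01:01.263724189+08:00 stderr F runtime.goexit()
2024-08-16T02:01:01.263776306+08:00 stderr F
2024-08-16T02:01:01.263783683+08:00 stderr F goroutine 1 [semacquire]:
"""

if __name__ == "__main__":
    message, header, funcs = summarize_panic(SAMPLE.splitlines())
    print(message)
    print(header)
    for f in funcs:
        print("  ", f)
```

Running the same helper over the log of each restarted pod would show whether they all panic from the same `runComponent` frame or from different places.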
Anything else?
No response