
[Bug]: [Nightly] Milvus pods restart many times and panic with "context deadline exceeded"

Open NicoYuan1986 opened this issue 6 months ago • 4 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Environment

- Milvus version: aaaffc6
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): kafka
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

Milvus pods restart many times and panic with "context deadline exceeded".

[2024-08-15T21:07:54.947Z] mdk-836-n-etcd-0                                           1/1     Running       1 (3h6m ago)   3h8m    10.105.1.68    ci-node10   <none>           <none>
[2024-08-15T21:07:54.947Z] mdk-836-n-etcd-1                                           1/1     Running       1 (3h6m ago)   3h8m    10.105.1.71    ci-node10   <none>           <none>
[2024-08-15T21:07:54.947Z] mdk-836-n-etcd-2                                           1/1     Running       0              3h8m    10.105.1.77    ci-node10   <none>           <none>
[2024-08-15T21:07:54.947Z] mdk-836-n-kafka-0                                          2/2     Running       3 (3h6m ago)   3h8m    10.105.1.66    ci-node10   <none>           <none>
[2024-08-15T21:07:54.947Z] mdk-836-n-kafka-1                                          2/2     Running       3 (3h6m ago)   3h8m    10.105.1.63    ci-node10   <none>           <none>
[2024-08-15T21:07:54.947Z] mdk-836-n-kafka-2                                          2/2     Running       1 (3h6m ago)   3h8m    10.105.1.80    ci-node10   <none>           <none>
[2024-08-15T21:07:54.948Z] mdk-836-n-kafka-exporter-6f984b5cbb-r5mld                  1/1     Running       5 (3h6m ago)   3h8m    10.105.1.28    ci-node10   <none>           <none>
[2024-08-15T21:07:54.948Z] mdk-836-n-milvus-datacoord-569765964-xzq97                 1/1     Running       16 (52m ago)   3h8m    10.105.7.29    ci-node12   <none>           <none>
[2024-08-15T21:07:54.948Z] mdk-836-n-milvus-datanode-f8ccd8c6d-2bc2r                  1/1     Running       8 (51m ago)    3h8m    10.105.1.27    ci-node10   <none>           <none>
[2024-08-15T21:07:54.948Z] mdk-836-n-milvus-datanode-f8ccd8c6d-vrqkl                  1/1     Running       16 (52m ago)   3h8m    10.105.7.26    ci-node12   <none>           <none>
[2024-08-15T21:07:54.948Z] mdk-836-n-milvus-indexcoord-5984cbd7bf-2hxds               1/1     Running       0              3h8m    10.105.1.37    ci-node10   <none>           <none>
[2024-08-15T21:07:54.948Z] mdk-836-n-milvus-indexnode-6d6bf7d587-kqw4q                1/1     Running       7 (51m ago)    3h8m    10.105.1.31    ci-node10   <none>           <none>
[2024-08-15T21:07:54.948Z] mdk-836-n-milvus-indexnode-6d6bf7d587-xsnrt                1/1     Running       16 (52m ago)   3h8m    10.105.7.32    ci-node12   <none>           <none>
[2024-08-15T21:07:54.948Z] mdk-836-n-milvus-proxy-84d68d4ccb-mmlmj                    1/1     Running       16 (52m ago)   3h8m    10.105.7.30    ci-node12   <none>           <none>
[2024-08-15T21:07:54.948Z] mdk-836-n-milvus-proxy-84d68d4ccb-qs2bs                    1/1     Running       8 (51m ago)    3h8m    10.105.1.29    ci-node10   <none>           <none>
[2024-08-15T21:07:54.948Z] mdk-836-n-milvus-querycoord-7ff8c85f58-x5648               1/1     Running       8 (51m ago)    3h8m    10.105.1.20    ci-node10   <none>           <none>
[2024-08-15T21:07:54.948Z] mdk-836-n-milvus-querynode-68f986bc7-2ggg6                 1/1     Running       7 (51m ago)    3h8m    10.105.1.19    ci-node10   <none>           <none>
[2024-08-15T21:07:54.948Z] mdk-836-n-milvus-querynode-68f986bc7-ngw6z                 1/1     Running       8 (51m ago)    3h8m    10.105.1.36    ci-node10   <none>           <none>
[2024-08-15T21:07:54.948Z] mdk-836-n-milvus-rootcoord-548cd959d9-z7nqx                1/1     Running       16 (52m ago)   3h8m    10.105.7.28    ci-node12   <none>           <none>
[2024-08-15T21:07:54.948Z] mdk-836-n-minio-6466b6c7c9-5jrjc                           1/1     Running       0              3h8m    10.105.1.61    ci-node10   <none>           <none>
[2024-08-15T21:07:54.948Z] mdk-836-n-zookeeper-0                                      1/1     Running       0              3h8m    10.105.1.76    ci-node10   <none>           <none>
[2024-08-15T21:07:54.948Z] mdk-836-n-zookeeper-1                                      1/1     Running       0              3h8m    10.105.1.73    ci-node10   <none>           <none>
[2024-08-15T21:07:54.948Z] mdk-836-n-zookeeper-2                                      1/1     Running       0              3h8m    10.105.1.81    ci-node10   <none>           <none>
[2024-08-15T21:07:55.509Z] mdk-836-n-etcd-0 restarts 1, last terminateed reason is "Error"
[2024-08-15T21:07:55.765Z] mdk-836-n-etcd-1 restarts 1, last terminateed reason is "Error"
[2024-08-15T21:07:55.765Z] mdk-836-n-kafka-0 restarts 0, last terminateed reason is null
[2024-08-15T21:07:56.020Z] mdk-836-n-kafka-1 restarts 0, last terminateed reason is null
[2024-08-15T21:07:56.275Z] mdk-836-n-kafka-2 restarts 0, last terminateed reason is null
[2024-08-15T21:07:56.275Z] mdk-836-n-kafka-exporter-6f984b5cbb-r5mld restarts 5, last terminateed reason is "Error"
[2024-08-15T21:07:56.531Z] mdk-836-n-milvus-datacoord-569765964-xzq97 restarts 16, last terminateed reason is "Error"
[2024-08-15T21:07:56.531Z] mdk-836-n-milvus-datanode-f8ccd8c6d-2bc2r restarts 8, last terminateed reason is "Error"
[2024-08-15T21:07:56.786Z] mdk-836-n-milvus-datanode-f8ccd8c6d-vrqkl restarts 16, last terminateed reason is "Error"
[2024-08-15T21:07:56.786Z] mdk-836-n-milvus-indexnode-6d6bf7d587-kqw4q restarts 7, last terminateed reason is "Error"
[2024-08-15T21:07:57.043Z] mdk-836-n-milvus-indexnode-6d6bf7d587-xsnrt restarts 16, last terminateed reason is "Error"
[2024-08-15T21:07:57.043Z] mdk-836-n-milvus-proxy-84d68d4ccb-mmlmj restarts 16, last terminateed reason is "Error"
[2024-08-15T21:07:57.299Z] mdk-836-n-milvus-proxy-84d68d4ccb-qs2bs restarts 8, last terminateed reason is "Error"
[2024-08-15T21:07:57.299Z] mdk-836-n-milvus-querycoord-7ff8c85f58-x5648 restarts 8, last terminateed reason is "Error"
[2024-08-15T21:07:57.555Z] mdk-836-n-milvus-querynode-68f986bc7-2ggg6 restarts 7, last terminateed reason is "Error"
[2024-08-15T21:07:57.555Z] mdk-836-n-milvus-querynode-68f986bc7-ngw6z restarts 8, last terminateed reason is "Error"
[2024-08-15T21:07:57.811Z] mdk-836-n-milvus-rootcoord-548cd959d9-z7nqx restarts 16, last terminateed reason is "Error"
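
In the listing above, every pod scheduled on ci-node12 shows 16 restarts, while the Milvus pods on ci-node10 show 7 or 8. To make that kind of comparison easier in later nightly runs, here is a minimal sketch that groups restart counts by node. It assumes the rows follow the `kubectl get pods -o wide` column layout seen above, optionally prefixed by the Jenkins timestamp. The names `ROW`, `SAMPLE`, and `restarts_by_node` are hypothetical helpers and are not part of Milvus or its CI scripts.

```python
import re
from collections import defaultdict

# One row of the pod listing pasted above. The column order is an assumption
# taken from that listing: NAME READY STATUS RESTARTS AGE IP NODE ...
ROW = re.compile(
    r"^(?:\[[^\]]+\]\s+)?"      # optional CI timestamp, e.g. [2024-08-15T21:07:54.948Z]
    r"(?P<pod>\S+)\s+"          # pod name
    r"\d+/\d+\s+"               # READY, e.g. 1/1
    r"\S+\s+"                   # STATUS, e.g. Running
    r"(?P<restarts>\d+)"        # RESTARTS count
    r"(?:\s+\([^)]*\))?\s+"     # optional "(52m ago)" suffix
    r"\S+\s+"                   # AGE
    r"\S+\s+"                   # IP
    r"(?P<node>\S+)"            # NODE
)

# A few rows copied from the listing above, plus one summary line that
# should be ignored because it does not have the table layout.
SAMPLE = [
    "[2024-08-15T21:07:54.947Z] mdk-836-n-etcd-2   1/1   Running   0   3h8m   10.105.1.77   ci-node10   <none>   <none>",
    "[2024-08-15T21:07:54.948Z] mdk-836-n-milvus-datacoord-569765964-xzq97   1/1   Running   16 (52m ago)   3h8m   10.105.7.29   ci-node12   <none>   <none>",
    "[2024-08-15T21:07:54.948Z] mdk-836-n-milvus-datanode-f8ccd8c6d-2bc2r   1/1   Running   8 (51m ago)   3h8m   10.105.1.27   ci-node10   <none>   <none>",
    '[2024-08-15T21:07:55.509Z] mdk-836-n-etcd-0 restarts 1, last terminateed reason is "Error"',
]


def restarts_by_node(lines):
    """Return {node: {pod: restart_count}} for every line that looks like a pod row."""
    result = defaultdict(dict)
    for line in lines:
        match = ROW.match(line.strip())
        if match:  # lines that are not pod rows are skipped silently
            result[match["node"]][match["pod"]] = int(match["restarts"])
    return dict(result)


if __name__ == "__main__":
    for node, pods in sorted(restarts_by_node(SAMPLE).items()):
        print(node, pods)
```

Feeding the full listing to `restarts_by_node` only regroups the data already shown in this report. It does not by itself say why the restart counts differ between nodes.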

Expected Behavior

The nightly run passes, with no Milvus pod restarts or panics.

Steps To Reproduce

No response

Milvus Log

  1. link: https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20Nightly%20CI/detail/master/836/pipeline/207/
  2. log: artifacts-milvus-distributed-kafka-nightly-836-pymilvus-e2e-logs.tar.gz
  3. panic log:
2024-08-16T02:01:01.263565755+08:00 stderr F panic: context deadline exceeded
2024-08-16T02:01:01.263604535+08:00 stderr F
2024-08-16T02:01:01.263615515+08:00 stderr F goroutine 220 [running]:
2024-08-16T02:01:01.263624502+08:00 stderr F panic({0x55c09a0?, 0x895f5a0?})
2024-08-16T02:01:01.263655335+08:00 stderr F    /usr/local/go/src/runtime/panic.go:1017 +0x3ac fp=0xc000a4bf70 sp=0xc000a4bec0 pc=0x1ec1bac
2024-08-16T02:01:01.263665885+08:00 stderr F github.com/milvus-io/milvus/cmd/roles.runComponent[...].func1()
2024-08-16T02:01:01.263697376+08:00 stderr F    /go/src/github.com/milvus-io/milvus/cmd/roles/roles.go:122 +0x108 fp=0xc000a4bfe0 sp=0xc000a4bf70 pc=0x4f32788
2024-08-16T02:01:01.263724189+08:00 stderr F runtime.goexit()
2024-08-16T02:01:01.263742299+08:00 stderr F    /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000a4bfe8 sp=0xc000a4bfe0 pc=0x1efb401
2024-08-16T02:01:01.263758963+08:00 stderr F created by github.com/milvus-io/milvus/cmd/roles.runComponent[...] in goroutine 1
2024-08-16T02:01:01.263768563+08:00 stderr F    /go/src/github.com/milvus-io/milvus/cmd/roles/roles.go:113 +0x138
2024-08-16T02:01:01.263776306+08:00 stderr F
2024-08-16T02:01:01.263783683+08:00 stderr F goroutine 1 [semacquire]:
2024-08-16T02:01:01.263790956+08:00 stderr F runtime.gopark(0x110?, 0x56c8b20?, 0x20?, 0x30?, 0x7f38f147de78?)
2024-08-16T02:01:01.263814317+08:00 stderr F    /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc0013df588 sp=0xc0013df568 pc=0x1ec5bce

Anything else?

No response

NicoYuan1986, Aug 16 '24 06:08