[Bug]: [benchmark] milvus panic reported an error `query coordinator id allocator initialize failed`
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version:2.4-20240424-e1b4ef74
- Deployment mode(standalone or cluster):standalone
- MQ type(rocksmq, pulsar or kafka):
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
argo task: new-stable-master-1713981600, id: 2
With a SCANN index, 1 million rows were inserted and the collection loaded, then concurrent reads and writes continued; Milvus panicked and disconnected from etcd, causing Milvus to restart.
server
new-stable-mast81600-2-66-2426-etcd-0 1/1 Running 0 5m40s 10.104.27.44 4am-node31 <none> <none>
new-stable-mast81600-2-66-2426-milvus-standalone-7d6f45c9fhtmtn 1/1 Running 1 (117s ago) 5m40s 10.104.20.17 4am-node22 <none> <none>
new-stable-mast81600-2-66-2426-minio-58c486f9c8-ksxv7 1/1 Running 0 5m40s 10.104.27.52 4am-node31 <none> <none> (base.py:257)
[2024-04-24 23:12:51,184 - INFO - fouram]: [Cmd Exe] kubectl get pods -n qa-milvus -o wide | grep -E 'NAME|new-stable-mast81600-2-66-2426-milvus|new-stable-mast81600-2-66-2426-minio|new-stable-mast81600-2-66-2426-etcd|new-stable-mast81600-2-66-2426-pulsar|new-stable-mast81600-2-66-2426-zookeeper|new-stable-mast81600-2-66-2426-kafka|new-stable-mast81600-2-66-2426-log|new-stable-mast81600-2-66-2426-tikv' (util_cmd.py:14)
[2024-04-24 23:13:01,232 - INFO - fouram]: [CliClient] pod details of release(new-stable-mast81600-2-66-2426):
I0424 23:12:52.480523 451 request.go:665] Waited for 1.154773944s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.default.svc.cluster.local/apis/storage.k8s.io/v1beta1?timeout=32s
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
new-stable-mast81600-2-66-2426-etcd-0 1/1 Running 0 5h10m 10.104.27.44 4am-node31 <none> <none>
new-stable-mast81600-2-66-2426-milvus-standalone-7d6f45c9fhtmtn 1/1 Running 3 (3h32m ago) 5h10m 10.104.20.17 4am-node22 <none> <none>
new-stable-mast81600-2-66-2426-minio-58c486f9c8-ksxv7 1/1 Running 0 5h10m 10.104.27.52 4am-node31 <none> <none>
client pod: new-stable-master-1713981600-1576406165
client log:
panic:
disconnected from etcd
Expected Behavior
No response
Steps To Reproduce
1. create a collection
2. build a SCANN index on the vector column
3. insert 1M vectors
4. flush collection
5. build an index on the vector column again with the same parameters
6. count the total number of rows
7. load collection
8. execute concurrent search, query, flush, insert, delete
9. run step 8 for 5 hours (a pymilvus sketch of the full flow is shown below)
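A minimal pymilvus sketch of the steps above, assuming a 128-dimensional float-vector collection; the collection name, schema, and SCANN parameters (nlist/nprobe) are illustrative and not the original benchmark configuration:

```python
import random
from pymilvus import (
    connections, Collection, CollectionSchema, FieldSchema, DataType,
)

connections.connect(host="127.0.0.1", port="19530")

dim = 128
schema = CollectionSchema([
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=False),
    FieldSchema(name="vector", dtype=DataType.FLOAT_VECTOR, dim=dim),
])
collection = Collection("scann_bench", schema)                        # step 1

index_params = {"index_type": "SCANN", "metric_type": "L2", "params": {"nlist": 1024}}
collection.create_index("vector", index_params)                       # step 2

batch = 10_000
for start in range(0, 1_000_000, batch):                              # step 3: insert 1M vectors
    ids = list(range(start, start + batch))
    vectors = [[random.random() for _ in range(dim)] for _ in ids]
    collection.insert([ids, vectors])

collection.flush()                                                    # step 4
collection.create_index("vector", index_params)                       # step 5: same parameters
print(collection.num_entities)                                        # step 6: row count
collection.load()                                                     # step 7

# step 8: one iteration of the mixed workload; the benchmark runs these
# operations concurrently from multiple threads for 5 hours (step 9)
collection.search([[random.random() for _ in range(dim)]], "vector",
                  {"metric_type": "L2", "params": {"nprobe": 16}}, limit=10)
collection.query(expr="id < 100", output_fields=["id"])
collection.insert([[1_000_000], [[random.random() for _ in range(dim)]]])
collection.delete(expr="id in [0]")
collection.flush()
```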
Milvus Log
No response
Anything else?
test env: 4am cluster
/assign @congqixia
/unassign
From the log above, some components failed to connect to etcd. Most likely it's an environment problem.
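As a hypothetical triage step (not part of the original report), one could first confirm that the etcd endpoint is reachable at the TCP level; the IP below is the etcd pod address from the kubectl listing above and 2379 is the default etcd client port:

```python
import socket

def etcd_reachable(host: str, port: int = 2379, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the etcd endpoint can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# etcd pod IP from the kubectl listing above; substitute your own endpoint.
print(etcd_reachable("10.104.27.44"))
```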
The issue is fixed, so I'll close it.