[Bug]: [benchmark] After inserting 1b data, concurrent load, query, and search fail with error: "role querycoord[nodeID: 16] is not serving, reason: Initializing"
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version: 2.2.0-20230410-d845175f
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka):
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
Insert 1b data, then run concurrent load, query, and search stability tests. Two errors were observed:
load_collection failed: "role querycoord[nodeID: 16] is not serving, reason: Initializing"
search request check failed: "fail to search on all shard leaders, err=All attempts results:"
case: test_concurrent_locust_1b_ivf_sq8_ddl_dql_cluster
argo task: fouramf-m9jfp
server: querycoord restarted 3 times (most recently 5h26m ago)
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
fouramf-m9jfp-84-3240-etcd-0 1/1 Running 0 23h 10.104.1.109 4am-node10 <none> <none>
fouramf-m9jfp-84-3240-etcd-1 1/1 Running 0 23h 10.104.4.34 4am-node11 <none> <none>
fouramf-m9jfp-84-3240-etcd-2 1/1 Running 0 23h 10.104.6.162 4am-node13 <none> <none>
fouramf-m9jfp-84-3240-milvus-datacoord-69bdbd8cb5-vdqd2 1/1 Running 0 23h 10.104.6.152 4am-node13 <none> <none>
fouramf-m9jfp-84-3240-milvus-datanode-86b4bb98d8-2t4p4 1/1 Running 0 23h 10.104.1.93 4am-node10 <none> <none>
fouramf-m9jfp-84-3240-milvus-indexcoord-765d8d94cd-llpsd 1/1 Running 0 23h 10.104.1.98 4am-node10 <none> <none>
fouramf-m9jfp-84-3240-milvus-indexnode-c9fc96df-rsxsg 1/1 Running 0 23h 10.104.9.97 4am-node14 <none> <none>
fouramf-m9jfp-84-3240-milvus-proxy-5d6f68c64b-7s879 1/1 Running 0 23h 10.104.1.96 4am-node10 <none> <none>
fouramf-m9jfp-84-3240-milvus-querycoord-5589c994c7-smt5w 1/1 Running 3 (5h26m ago) 23h 10.104.1.91 4am-node10 <none> <none>
fouramf-m9jfp-84-3240-milvus-querynode-577f9b8cd5-7f5xc 1/1 Running 0 23h 10.104.4.32 4am-node11 <none> <none>
fouramf-m9jfp-84-3240-milvus-querynode-577f9b8cd5-887cn 1/1 Running 0 23h 10.104.1.100 4am-node10 <none> <none>
fouramf-m9jfp-84-3240-milvus-querynode-577f9b8cd5-9vv67 1/1 Running 0 23h 10.104.9.98 4am-node14 <none> <none>
fouramf-m9jfp-84-3240-milvus-querynode-577f9b8cd5-c7zs9 1/1 Running 0 23h 10.104.5.162 4am-node12 <none> <none>
fouramf-m9jfp-84-3240-milvus-querynode-577f9b8cd5-jskkz 1/1 Running 0 23h 10.104.6.153 4am-node13 <none> <none>
fouramf-m9jfp-84-3240-milvus-querynode-577f9b8cd5-trqlp 1/1 Running 0 23h 10.104.5.161 4am-node12 <none> <none>
fouramf-m9jfp-84-3240-milvus-rootcoord-88669cc45-k4mf6 1/1 Running 0 23h 10.104.1.94 4am-node10 <none> <none>
fouramf-m9jfp-84-3240-minio-0 1/1 Running 0 23h 10.104.1.110 4am-node10 <none> <none>
fouramf-m9jfp-84-3240-minio-1 1/1 Running 0 23h 10.104.5.164 4am-node12 <none> <none>
fouramf-m9jfp-84-3240-minio-2 1/1 Running 0 23h 10.104.9.100 4am-node14 <none> <none>
fouramf-m9jfp-84-3240-minio-3 1/1 Running 0 23h 10.104.6.165 4am-node13 <none> <none>
fouramf-m9jfp-84-3240-pulsar-bookie-0 1/1 Running 0 23h 10.104.6.163 4am-node13 <none> <none>
fouramf-m9jfp-84-3240-pulsar-bookie-1 1/1 Running 0 23h 10.104.1.113 4am-node10 <none> <none>
fouramf-m9jfp-84-3240-pulsar-bookie-2 1/1 Running 0 23h 10.104.9.103 4am-node14 <none> <none>
fouramf-m9jfp-84-3240-pulsar-bookie-init-9wq4v 0/1 Completed 0 23h 10.104.1.95 4am-node10 <none> <none>
fouramf-m9jfp-84-3240-pulsar-broker-0 1/1 Running 0 23h 10.104.1.103 4am-node10 <none> <none>
fouramf-m9jfp-84-3240-pulsar-proxy-0 1/1 Running 0 23h 10.104.1.92 4am-node10 <none> <none>
fouramf-m9jfp-84-3240-pulsar-pulsar-init-549q9 0/1 Completed 0 23h 10.104.1.99 4am-node10 <none> <none>
fouramf-m9jfp-84-3240-pulsar-recovery-0 1/1 Running 0 23h 10.104.1.102 4am-node10 <none> <none>
fouramf-m9jfp-84-3240-pulsar-zookeeper-0 1/1 Running 0 23h 10.104.1.108 4am-node10 <none> <none>
fouramf-m9jfp-84-3240-pulsar-zookeeper-1 1/1 Running 0 23h 10.104.6.167 4am-node13 <none> <none>
fouramf-m9jfp-84-3240-pulsar-zookeeper-2 1/1 Running 0 23h 10.104.4.36 4am-node11 <none> <none>
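For a long listing like the one above, restarted pods can be picked out with a short script. The following is a minimal sketch, assuming the text is whitespace-separated `kubectl get pods -o wide` style output with RESTARTS as the fourth column; the function name `restarted_pods` is hypothetical.

```python
from typing import Dict


def restarted_pods(listing: str) -> Dict[str, int]:
    """Return {pod_name: restart_count} for pods with at least one restart."""
    result = {}
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank lines, short lines, and the header row.
        if len(fields) < 4 or fields[0] == "NAME":
            continue
        # Columns: NAME READY STATUS RESTARTS ...
        # A value such as "3 (5h26m ago)" splits so that fields[3] is "3".
        restarts = int(fields[3])
        if restarts > 0:
            result[fields[0]] = restarts
    return result


sample = """\
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
fouramf-m9jfp-84-3240-milvus-proxy-5d6f68c64b-7s879 1/1 Running 0 23h 10.104.1.96 4am-node10 <none> <none>
fouramf-m9jfp-84-3240-milvus-querycoord-5589c994c7-smt5w 1/1 Running 3 (5h26m ago) 23h 10.104.1.91 4am-node10 <none> <none>
"""

print(restarted_pods(sample))
# → {'fouramf-m9jfp-84-3240-milvus-querycoord-5589c994c7-smt5w': 3}
```

Applied to the full listing, only the querycoord pod is reported, which matches the restart noted above.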
client log: fouramf-m9jfp_59709.zip
client error log:

querycoord grafana:

Expected Behavior
No response
Steps To Reproduce
1. create a collection or use an existing collection
2. build index on vector column
3. insert a certain number of vectors
4. flush collection
5. build index on vector column with the same parameters
6. optionally build index on scalar columns
7. count the total number of rows
8. load collection
9. perform concurrent operations (query, load, search)
10. optionally clean up all collections
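The steps above can be sketched with pymilvus roughly as follows. This is a scaled-down illustration, not the benchmark's actual locust test: the collection name, field names, dimension, row count, index and search parameters, and the `localhost:19530` endpoint are all assumptions, and the optional steps 6 and 10 are left out. The concurrent part (step 9) only relies on the collection object exposing `load`, `query`, and `search`.

```python
import random
from concurrent.futures import ThreadPoolExecutor

DIM = 128                      # assumed vector dimension
VECTOR_FIELD = "float_vector"  # hypothetical field name


def one_round(collection, op):
    """Run a single load, query, or search call against the collection."""
    if op == "load":
        return collection.load()
    if op == "query":
        return collection.query(expr="id in [1, 2, 3]", output_fields=["id"])
    if op == "search":
        vec = [[random.random() for _ in range(DIM)]]
        params = {"metric_type": "L2", "params": {"nprobe": 16}}
        return collection.search(data=vec, anns_field=VECTOR_FIELD,
                                 param=params, limit=10)
    raise ValueError(op)


def run_concurrent(collection, rounds=30, workers=8):
    """Step 9: mix load/query/search across threads and collect error messages."""
    ops = [("load", "query", "search")[i % 3] for i in range(rounds)]
    errors = []

    def safe(op):
        try:
            one_round(collection, op)
        except Exception as exc:  # record the failure instead of aborting the run
            errors.append(f"{op}: {exc}")

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(safe, ops))
    return errors


def main():
    # Imported here so the helpers above can be used without pymilvus installed.
    from pymilvus import (Collection, CollectionSchema, DataType,
                          FieldSchema, connections)

    connections.connect(host="localhost", port="19530")  # assumed endpoint
    schema = CollectionSchema([
        FieldSchema("id", DataType.INT64, is_primary=True),
        FieldSchema(VECTOR_FIELD, DataType.FLOAT_VECTOR, dim=DIM),
    ])
    coll = Collection("repro_ivf_sq8", schema)           # step 1
    index = {"index_type": "IVF_SQ8", "metric_type": "L2",
             "params": {"nlist": 1024}}
    coll.create_index(VECTOR_FIELD, index)               # step 2
    ids = list(range(10_000))                            # far fewer rows than 1b
    vecs = [[random.random() for _ in range(DIM)] for _ in ids]
    coll.insert([ids, vecs])                             # step 3
    coll.flush()                                         # step 4
    coll.create_index(VECTOR_FIELD, index)               # step 5
    print(coll.num_entities)                             # step 7
    coll.load()                                          # step 8
    print(run_concurrent(coll))                          # step 9


# main()  # uncomment to run against a live Milvus cluster
```

An empty list from `run_concurrent` means every call succeeded; on the affected build, entries such as the "is not serving, reason: Initializing" message would be expected to show up there while querycoord is restarting.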
Milvus Log
No response
Anything else?
No response
This is the panic issue from the recent balance algorithm modification. @weiliu1031, is it fixed?
/assign @weiliu1031
/assign @elstic
This has been fixed by #23334.
/unassign
After verification, this issue has been fixed, and querynode memory usage is balanced. Verification version: 2.2.0-20230418-e1122c2a.