[Bug]: [benchmark] insert 1b data, and concurrent load, query, search, error: "role querycoord[nodeID: 16] is not serving, reason: Initializing"

Open elstic opened this issue 3 years ago • 3 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Environment

- Milvus version:2.2.0-20230410-d845175f
- Deployment mode(standalone or cluster):cluster
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

Insert 1b data, then run concurrent load, query, and search stability tests.

load_collection failed with: "role querycoord[nodeID: 16] is not serving, reason: Initializing". The search request check also failed with: "fail to search on all shard leaders, err=All attempts results:"

case: test_concurrent_locust_1b_ivf_sq8_ddl_dql_cluster
argo task: fouramf-m9jfp

server: querycoord restarted (3 restarts, see the RESTARTS column in the pod list):

NAME                                                              READY   STATUS      RESTARTS        AGE     IP              NODE         NOMINATED NODE   READINESS GATES
fouramf-m9jfp-84-3240-etcd-0                                      1/1     Running     0               23h     10.104.1.109    4am-node10   <none>           <none>
fouramf-m9jfp-84-3240-etcd-1                                      1/1     Running     0               23h     10.104.4.34     4am-node11   <none>           <none>
fouramf-m9jfp-84-3240-etcd-2                                      1/1     Running     0               23h     10.104.6.162    4am-node13   <none>           <none>
fouramf-m9jfp-84-3240-milvus-datacoord-69bdbd8cb5-vdqd2           1/1     Running     0               23h     10.104.6.152    4am-node13   <none>           <none>
fouramf-m9jfp-84-3240-milvus-datanode-86b4bb98d8-2t4p4            1/1     Running     0               23h     10.104.1.93     4am-node10   <none>           <none>
fouramf-m9jfp-84-3240-milvus-indexcoord-765d8d94cd-llpsd          1/1     Running     0               23h     10.104.1.98     4am-node10   <none>           <none>
fouramf-m9jfp-84-3240-milvus-indexnode-c9fc96df-rsxsg             1/1     Running     0               23h     10.104.9.97     4am-node14   <none>           <none>
fouramf-m9jfp-84-3240-milvus-proxy-5d6f68c64b-7s879               1/1     Running     0               23h     10.104.1.96     4am-node10   <none>           <none>
fouramf-m9jfp-84-3240-milvus-querycoord-5589c994c7-smt5w          1/1     Running     3 (5h26m ago)   23h     10.104.1.91     4am-node10   <none>           <none>
fouramf-m9jfp-84-3240-milvus-querynode-577f9b8cd5-7f5xc           1/1     Running     0               23h     10.104.4.32     4am-node11   <none>           <none>
fouramf-m9jfp-84-3240-milvus-querynode-577f9b8cd5-887cn           1/1     Running     0               23h     10.104.1.100    4am-node10   <none>           <none>
fouramf-m9jfp-84-3240-milvus-querynode-577f9b8cd5-9vv67           1/1     Running     0               23h     10.104.9.98     4am-node14   <none>           <none>
fouramf-m9jfp-84-3240-milvus-querynode-577f9b8cd5-c7zs9           1/1     Running     0               23h     10.104.5.162    4am-node12   <none>           <none>
fouramf-m9jfp-84-3240-milvus-querynode-577f9b8cd5-jskkz           1/1     Running     0               23h     10.104.6.153    4am-node13   <none>           <none>
fouramf-m9jfp-84-3240-milvus-querynode-577f9b8cd5-trqlp           1/1     Running     0               23h     10.104.5.161    4am-node12   <none>           <none>
fouramf-m9jfp-84-3240-milvus-rootcoord-88669cc45-k4mf6            1/1     Running     0               23h     10.104.1.94     4am-node10   <none>           <none>
fouramf-m9jfp-84-3240-minio-0                                     1/1     Running     0               23h     10.104.1.110    4am-node10   <none>           <none>
fouramf-m9jfp-84-3240-minio-1                                     1/1     Running     0               23h     10.104.5.164    4am-node12   <none>           <none>
fouramf-m9jfp-84-3240-minio-2                                     1/1     Running     0               23h     10.104.9.100    4am-node14   <none>           <none>
fouramf-m9jfp-84-3240-minio-3                                     1/1     Running     0               23h     10.104.6.165    4am-node13   <none>           <none>
fouramf-m9jfp-84-3240-pulsar-bookie-0                             1/1     Running     0               23h     10.104.6.163    4am-node13   <none>           <none>
fouramf-m9jfp-84-3240-pulsar-bookie-1                             1/1     Running     0               23h     10.104.1.113    4am-node10   <none>           <none>
fouramf-m9jfp-84-3240-pulsar-bookie-2                             1/1     Running     0               23h     10.104.9.103    4am-node14   <none>           <none>
fouramf-m9jfp-84-3240-pulsar-bookie-init-9wq4v                    0/1     Completed   0               23h     10.104.1.95     4am-node10   <none>           <none>
fouramf-m9jfp-84-3240-pulsar-broker-0                             1/1     Running     0               23h     10.104.1.103    4am-node10   <none>           <none>
fouramf-m9jfp-84-3240-pulsar-proxy-0                              1/1     Running     0               23h     10.104.1.92     4am-node10   <none>           <none>
fouramf-m9jfp-84-3240-pulsar-pulsar-init-549q9                    0/1     Completed   0               23h     10.104.1.99     4am-node10   <none>           <none>
fouramf-m9jfp-84-3240-pulsar-recovery-0                           1/1     Running     0               23h     10.104.1.102    4am-node10   <none>           <none>
fouramf-m9jfp-84-3240-pulsar-zookeeper-0                          1/1     Running     0               23h     10.104.1.108    4am-node10   <none>           <none>
fouramf-m9jfp-84-3240-pulsar-zookeeper-1                          1/1     Running     0               23h     10.104.6.167    4am-node13   <none>           <none>
fouramf-m9jfp-84-3240-pulsar-zookeeper-2                          1/1     Running     0               23h     10.104.4.36     4am-node11   <none>           <none>
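To spot restarted components in a listing like the one above without reading every row, a small parser can help. The following is a minimal sketch, assuming the text has the same column order as the listing in this report (NAME, READY, STATUS, RESTARTS, ...); the function name `restarted_pods` is hypothetical and not part of any Milvus or Kubernetes tooling.

```python
def restarted_pods(listing):
    """Return {pod_name: restart_count} for pods whose RESTARTS value is > 0."""
    result = {}
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank lines, short lines, and the header row.
        if len(fields) < 4 or fields[0] == "NAME":
            continue
        # Whitespace splitting turns "3 (5h26m ago)" into "3", "(5h26m", "ago)",
        # so the restart count is always the fourth field.
        name, restarts = fields[0], fields[3]
        if restarts.isdigit() and int(restarts) > 0:
            result[name] = int(restarts)
    return result


# Three rows copied from the listing in this report.
sample = """\
NAME                                                              READY   STATUS      RESTARTS        AGE     IP              NODE         NOMINATED NODE   READINESS GATES
fouramf-m9jfp-84-3240-milvus-proxy-5d6f68c64b-7s879               1/1     Running     0               23h     10.104.1.96     4am-node10   <none>           <none>
fouramf-m9jfp-84-3240-milvus-querycoord-5589c994c7-smt5w          1/1     Running     3 (5h26m ago)   23h     10.104.1.91     4am-node10   <none>           <none>
fouramf-m9jfp-84-3240-milvus-querynode-577f9b8cd5-7f5xc           1/1     Running     0               23h     10.104.4.32     4am-node11   <none>           <none>
"""

found = restarted_pods(sample)
for pod, count in found.items():
    print(pod, count)
```

On the sample rows, only the querycoord pod is reported, which matches the "querycoord restart" observation.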

client log: fouramf-m9jfp_59709.zip

client error log: image

querycoord grafana: image

Expected Behavior

No response

Steps To Reproduce

1. create a collection or use an existing collection
2. build index on vector column
3. insert a certain number of vectors
4. flush collection
5. build index on vector column with the same parameters
6. optionally build index on scalar columns
7. count the total number of rows
8. load collection
9. perform concurrent operations (query, load, search)
10. optionally clean all collections
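Step 9 can be sketched as a small thread-pool harness. This is only an illustration, not the locust-based benchmark used in the test case: it assumes `collection` behaves like a pymilvus `Collection` (`load()`, `query(expr=...)`, `search(data, anns_field, param, limit)`), and the field names `id` and `float_vector`, the `nprobe` value, and the stub class are hypothetical placeholders.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def run_concurrent_ops(collection, dim, n_tasks=30, workers=10, seed=0):
    """Run a random mix of load/query/search calls concurrently.

    Returns (counts, errors): successful calls per operation, and a list of
    (operation, error message) pairs for the calls that raised.
    """
    rng = random.Random(seed)
    # Build the task list up front so a given seed always yields the same mix.
    plan = [(rng.choice(["load", "query", "search"]),
             [rng.random() for _ in range(dim)]) for _ in range(n_tasks)]

    def run_one(task):
        op, vector = task
        try:
            if op == "load":
                collection.load()
            elif op == "query":
                # "id" is an assumed primary-key field name.
                collection.query(expr="id in [0, 1, 2]")
            else:
                # "float_vector" and nprobe=16 are assumed values.
                collection.search(
                    data=[vector],
                    anns_field="float_vector",
                    param={"metric_type": "L2", "params": {"nprobe": 16}},
                    limit=10,
                )
            return op, None
        except Exception as exc:  # record the failure instead of aborting the run
            return op, str(exc)

    counts, errors = Counter(), []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for op, err in pool.map(run_one, plan):
            if err is None:
                counts[op] += 1
            else:
                errors.append((op, err))
    return counts, errors


class _StubCollection:
    """Stand-in that mimics the load failure reported in this issue."""

    def load(self):
        raise RuntimeError(
            "role querycoord[nodeID: 16] is not serving, reason: Initializing")

    def query(self, expr):
        return []

    def search(self, data, anns_field, param, limit):
        return []


counts, errors = run_concurrent_ops(_StubCollection(), dim=8)
print(dict(counts), len(errors))
```

Against a real deployment, the stub would be replaced by a loaded collection object. Recording errors per operation, rather than stopping at the first one, makes it easier to see whether failures cluster around `load` while querycoord is restarting.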

Milvus Log

No response

Anything else?

No response

elstic avatar Apr 12 '23 03:04 elstic

This looks like the panic introduced by the recent balance algorithm modification. @weiliu1031, is it fixed? /assign @weiliu1031

yah01 avatar Apr 12 '23 03:04 yah01

/assign @elstic this has been fixed with #23334

yah01 avatar Apr 12 '23 04:04 yah01

/unassign

weiliu1031 avatar Apr 17 '23 02:04 weiliu1031

/assign @elstic this has been fixed with #23334

Verified: this issue has been fixed, and querynode memory usage is now balanced. Verification version: 2.2.0-20230418-e1122c2a.

elstic avatar Apr 21 '23 02:04 elstic