[Bug]: [benchmark] Milvus search failed and report error:"fail to search on all shard leaders"

Open elstic opened this issue 1 year ago • 7 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Environment

- Milvus version:2.2.0-20230308-69f4afe4
- Deployment mode(standalone or cluster):cluster
- MQ type(rocksmq, pulsar or kafka):    pulsar
- SDK version(e.g. pymilvus v2.0.0rc2): 2.3.0.dev45
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

argo task : fouramf-stable-1678302000 , id : 10

case_name: test_concurrent_locust_diskann_compaction_cluster

The querynode restarted multiple times. server:

fouramf-stable-1678302000-10-etcd-0                               1/1     Running            0               5h7m   10.104.5.94     4am-node12   <none>           <none>
fouramf-stable-1678302000-10-etcd-1                               1/1     Running            0               5h7m   10.104.9.172    4am-node14   <none>           <none>
fouramf-stable-1678302000-10-etcd-2                               1/1     Running            0               5h7m   10.104.4.209    4am-node11   <none>           <none>
fouramf-stable-1678302000-10-milvus-datacoord-6955d76b7d-hqfx6    1/1     Running            1 (5h3m ago)    5h7m   10.104.4.193    4am-node11   <none>           <none>
fouramf-stable-1678302000-10-milvus-datanode-d5fcc6fd5-hlnlz      1/1     Running            1 (5h3m ago)    5h7m   10.104.1.175    4am-node10   <none>           <none>
fouramf-stable-1678302000-10-milvus-indexcoord-7c45c746b9-m7zj8   1/1     Running            1 (5h3m ago)    5h7m   10.104.9.147    4am-node14   <none>           <none>
fouramf-stable-1678302000-10-milvus-indexnode-89ff9dd98-p8fwf     1/1     Running            0               5h7m   10.104.5.83     4am-node12   <none>           <none>
fouramf-stable-1678302000-10-milvus-proxy-6958fdc6cc-k6gbz        1/1     Running            1 (5h3m ago)    5h7m   10.104.9.148    4am-node14   <none>           <none>
fouramf-stable-1678302000-10-milvus-querycoord-74489bf886-ncnph   1/1     Running            1 (5h3m ago)    5h7m   10.104.1.174    4am-node10   <none>           <none>
fouramf-stable-1678302000-10-milvus-querynode-669c768656-27sn4    1/1     Running            3 (4h58m ago)   5h7m   10.104.9.149    4am-node14   <none>           <none>
fouramf-stable-1678302000-10-milvus-rootcoord-695dd989f4-9z6x9    1/1     Running            1 (5h3m ago)    5h7m   10.104.9.151    4am-node14   <none>           <none>
fouramf-stable-1678302000-10-minio-0                              1/1     Running            0               5h7m   10.104.9.171    4am-node14   <none>           <none>
fouramf-stable-1678302000-10-minio-1                              1/1     Running            0               5h7m   10.104.4.198    4am-node11   <none>           <none>
fouramf-stable-1678302000-10-minio-2                              1/1     Running            0               5h7m   10.104.1.183    4am-node10   <none>           <none>
fouramf-stable-1678302000-10-minio-3                              1/1     Running            0               5h7m   10.104.5.96     4am-node12   <none>           <none>
fouramf-stable-1678302000-10-pulsar-bookie-0                      1/1     Running            0               5h7m   10.104.4.208    4am-node11   <none>           <none>
fouramf-stable-1678302000-10-pulsar-bookie-1                      1/1     Running            0               5h7m   10.104.1.186    4am-node10   <none>           <none>
fouramf-stable-1678302000-10-pulsar-bookie-2                      1/1     Running            0               5h7m   10.104.9.177    4am-node14   <none>           <none>
fouramf-stable-1678302000-10-pulsar-bookie-init-j6jkq             0/1     Completed          0               5h7m   10.104.9.150    4am-node14   <none>           <none>
fouramf-stable-1678302000-10-pulsar-broker-0                      1/1     Running            0               5h7m   10.104.1.177    4am-node10   <none>           <none>
fouramf-stable-1678302000-10-pulsar-proxy-0                       1/1     Running            0               5h7m   10.104.4.191    4am-node11   <none>           <none>
fouramf-stable-1678302000-10-pulsar-pulsar-init-4fzfj             0/1     Completed          0               5h7m   10.104.4.192    4am-node11   <none>           <none>
fouramf-stable-1678302000-10-pulsar-recovery-0                    1/1     Running            0               5h7m   10.104.9.152    4am-node14   <none>           <none>
fouramf-stable-1678302000-10-pulsar-zookeeper-0                   1/1     Running            0               5h7m   10.104.9.170    4am-node14   <none>           <none>
fouramf-stable-1678302000-10-pulsar-zookeeper-1                   1/1     Running            0               5h6m   10.104.1.203    4am-node10   <none>           <none>
fouramf-stable-1678302000-10-pulsar-zookeeper-2                   1/1     Running            0               5h5m   10.104.4.223    4am-node11   <none>           <none>
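To spot restarted pods in a listing like the one above, a small parser can help. This is only a sketch: it assumes the text is `kubectl get pods -o wide` output with the column order NAME, READY, STATUS, RESTARTS, and the function name `restart_counts` is made up for illustration.

```python
import re

# Matches: NAME  READY(n/m)  STATUS  RESTARTS  ...
# The optional "(5h3m ago)" suffix after RESTARTS is simply ignored.
ROW = re.compile(r"\s*(\S+)\s+(\d+/\d+)\s+(\S+)\s+(\d+)\b")


def restart_counts(kubectl_output: str) -> dict:
    """Map pod name -> restart count for every row that looks like a pod."""
    counts = {}
    for line in kubectl_output.splitlines():
        match = ROW.match(line)
        if match:  # header and blank lines do not match and are skipped
            counts[match.group(1)] = int(match.group(4))
    return counts


sample = """\
fouramf-stable-1678302000-10-etcd-0                               1/1     Running            0               5h7m   10.104.5.94     4am-node12   <none>           <none>
fouramf-stable-1678302000-10-milvus-proxy-6958fdc6cc-k6gbz        1/1     Running            1 (5h3m ago)    5h7m   10.104.9.148    4am-node14   <none>           <none>
fouramf-stable-1678302000-10-milvus-querynode-669c768656-27sn4    1/1     Running            3 (4h58m ago)   5h7m   10.104.9.149    4am-node14   <none>           <none>
"""

counts = restart_counts(sample)
# Keep only pods that restarted more than once.
suspicious = {name: n for name, n in counts.items() if n > 1}
print(suspicious)
```

In the listing above, only the querynode pod has more than one restart.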

client pod: fouramf-stable-1678302000-2776414840

client error: image

Expected Behavior

No response

Steps To Reproduce

1. Create a collection or use an existing collection
2. Build an index on the vector column
3. Insert a certain number of vectors
4. Flush the collection
5. Build an index on the vector column with the same parameters
6. Optionally build an index on a scalar column
7. Count the total number of rows
8. Load the collection
9. Perform concurrent operations
10. Optionally clean up all collections

Milvus Log

No response

Anything else?

No response

elstic avatar Mar 09 '23 04:03 elstic

argo task: fouramf-stable-1678302000 , id : 6

case name: test_concurrent_locust_diskann_dml_dql_filter_cluster

server:

fouramf-stable-1678302000-6-etcd-0                                1/1     Running            0               5h7m   10.104.5.109    4am-node12   <none>           <none>
fouramf-stable-1678302000-6-etcd-1                                1/1     Running            0               5h7m   10.104.6.225    4am-node13   <none>           <none>
fouramf-stable-1678302000-6-etcd-2                                1/1     Running            0               5h7m   10.104.4.217    4am-node11   <none>           <none>
fouramf-stable-1678302000-6-milvus-datacoord-76bc98b585-whwg2     1/1     Running            1 (5h3m ago)    5h7m   10.104.5.87     4am-node12   <none>           <none>
fouramf-stable-1678302000-6-milvus-datanode-748bc4489c-bz7wk      1/1     Running            1 (5h3m ago)    5h7m   10.104.4.195    4am-node11   <none>           <none>
fouramf-stable-1678302000-6-milvus-indexcoord-7f59d66b7f-6chkt    1/1     Running            1 (5h3m ago)    5h7m   10.104.6.212    4am-node13   <none>           <none>
fouramf-stable-1678302000-6-milvus-indexnode-8645d599b-6q87p      1/1     Running            0               5h7m   10.104.9.161    4am-node14   <none>           <none>
fouramf-stable-1678302000-6-milvus-proxy-5f45674595-4w8p2         1/1     Running            1 (5h3m ago)    5h7m   10.104.6.213    4am-node13   <none>           <none>
fouramf-stable-1678302000-6-milvus-querycoord-79556d7988-7r2zz    1/1     Running            1 (5h3m ago)    5h7m   10.104.5.88     4am-node12   <none>           <none>
fouramf-stable-1678302000-6-milvus-querynode-58c996d994-f8f4n     1/1     Running            1 (5h ago)      5h7m   10.104.6.214    4am-node13   <none>           <none>
fouramf-stable-1678302000-6-milvus-rootcoord-65449775d6-q9zpj     1/1     Running            1 (5h3m ago)    5h7m   10.104.5.84     4am-node12   <none>           <none>
fouramf-stable-1678302000-6-minio-0                               1/1     Running            0               5h7m   10.104.5.104    4am-node12   <none>           <none>
fouramf-stable-1678302000-6-minio-1                               1/1     Running            0               5h7m   10.104.1.193    4am-node10   <none>           <none>
fouramf-stable-1678302000-6-minio-2                               1/1     Running            0               5h7m   10.104.6.223    4am-node13   <none>           <none>
fouramf-stable-1678302000-6-minio-3                               1/1     Running            0               5h7m   10.104.4.216    4am-node11   <none>           <none>
fouramf-stable-1678302000-6-pulsar-bookie-0                       1/1     Running            0               5h7m   10.104.1.200    4am-node10   <none>           <none>
fouramf-stable-1678302000-6-pulsar-bookie-1                       1/1     Running            0               5h7m   10.104.5.113    4am-node12   <none>           <none>
fouramf-stable-1678302000-6-pulsar-bookie-2                       1/1     Running            0               5h7m   10.104.6.228    4am-node13   <none>           <none>
fouramf-stable-1678302000-6-pulsar-bookie-init-cb6c8              0/1     Completed          0               5h7m   10.104.1.178    4am-node10   <none>           <none>
fouramf-stable-1678302000-6-pulsar-broker-0                       1/1     Running            0               5h7m   10.104.1.181    4am-node10   <none>           <none>
fouramf-stable-1678302000-6-pulsar-proxy-0                        1/1     Running            0               5h7m   10.104.5.89     4am-node12   <none>           <none>
fouramf-stable-1678302000-6-pulsar-pulsar-init-lhxlx              0/1     Completed          0               5h7m   10.104.9.162    4am-node14   <none>           <none>
fouramf-stable-1678302000-6-pulsar-recovery-0                     1/1     Running            0               5h7m   10.104.5.90     4am-node12   <none>           <none>
fouramf-stable-1678302000-6-pulsar-zookeeper-0                    1/1     Running            0               5h7m   10.104.5.108    4am-node12   <none>           <none>
fouramf-stable-1678302000-6-pulsar-zookeeper-1                    1/1     Running            0               5h5m   10.104.4.221    4am-node11   <none>           <none>
fouramf-stable-1678302000-6-pulsar-zookeeper-2                    1/1     Running            0               5h4m   10.104.1.207    4am-node10   <none>           <none>

client error : image

Steps To Reproduce

1. Create a collection or use an existing collection
2. Build an index on the vector column
3. Insert a certain number of vectors
4. Flush the collection
5. Build an index on the vector column with the same parameters
6. Optionally build an index on a scalar column
7. Count the total number of rows
8. Load the collection
9. Perform concurrent operations
10. Optionally clean up all collections

elstic avatar Mar 09 '23 05:03 elstic

/assign @jiaoew1991
/unassign

yanliang567 avatar Mar 10 '23 03:03 yanliang567

/assign @aoiasd
/unassign

jiaoew1991 avatar Mar 16 '23 07:03 jiaoew1991

Case 1: QueryCoord updated the current target, and two segments were due to be dropped in the next target. Before the next target was reached, QueryNode restarted and did not reload the segments that were to be dropped. As a result, QueryCoord failed to search because it could not find these two segments until the current target was updated to the next target.

Case 2: QueryNode simply had not finished reloading after its restart.

The only open question is why QueryNode crashed: we could not find any panic or C++ error log.
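The Case 1 sequence can be illustrated with a toy model. This is a sketch of one reading of the comment above, not Milvus code: the class `ToyCluster`, its methods, and the segment IDs are all hypothetical, and it assumes that a restarted QueryNode reloads only the segments listed in the next target.

```python
class ToyCluster:
    """Toy model of current/next target and one QueryNode (hypothetical)."""

    def __init__(self, current_target, next_target):
        self.current_target = set(current_target)
        self.next_target = set(next_target)
        # Before any restart the QueryNode serves the whole current target.
        self.loaded = set(current_target)

    def restart_query_node(self):
        # Assumption: after a restart only next-target segments are reloaded,
        # so segments scheduled to be dropped are gone from memory.
        self.loaded = set(self.next_target)

    def advance_target(self):
        # The next target becomes the current target.
        self.current_target = set(self.next_target)

    def search(self):
        # A search must cover every segment in the current target.
        missing = self.current_target - self.loaded
        if missing:
            raise RuntimeError(
                "fail to search on all shard leaders: "
                f"missing segments {sorted(missing)}"
            )
        return True


# Segments 3 and 4 are dropped in the next target (replaced by segment 5).
cluster = ToyCluster(current_target={1, 2, 3, 4}, next_target={1, 2, 5})
cluster.search()              # succeeds: everything is loaded
cluster.restart_query_node()  # segments 3 and 4 are not reloaded
try:
    cluster.search()
except RuntimeError as err:
    print(err)                # fails until the target advances
cluster.advance_target()
cluster.search()              # succeeds again
```

In this model the search failure lasts only for the window between the restart and the target update, which matches the behavior described for Case 1.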

aoiasd avatar Mar 16 '23 10:03 aoiasd

Case 1: QueryCoord updated the current target, and two segments were due to be dropped in the next target, but QueryNode restarted and did not reload the segments that were to be dropped. As a result, QueryCoord failed to search because it could not find these two segments until the current target was updated to the next target.

Case 2: QueryNode simply had not finished reloading after its restart.

The only open question is why QueryNode crashed: we could not find any panic or C++ error log.

@aoiasd @xige-16 Maybe you can look into this together.

elstic avatar Mar 16 '23 11:03 elstic

This issue still exists.

fouramf-stable-1680548400-6-etcd-0                                1/1     Running            0                 5h8m    10.104.9.131    4am-node14   <none>           <none>
fouramf-stable-1680548400-6-etcd-1                                1/1     Running            0                 5h8m    10.104.4.233    4am-node11   <none>           <none>
fouramf-stable-1680548400-6-etcd-2                                1/1     Running            0                 5h8m    10.104.1.124    4am-node10   <none>           <none>
fouramf-stable-1680548400-6-milvus-datacoord-7d8685984c-vlbb8     1/1     Running            1 (5h4m ago)      5h8m    10.104.1.104    4am-node10   <none>           <none>
fouramf-stable-1680548400-6-milvus-datanode-8b754c95b-sxztr       1/1     Running            1 (5h4m ago)      5h8m    10.104.1.105    4am-node10   <none>           <none>
fouramf-stable-1680548400-6-milvus-indexcoord-855777b5c4-xkvwr    1/1     Running            1 (5h4m ago)      5h8m    10.104.6.83     4am-node13   <none>           <none>
fouramf-stable-1680548400-6-milvus-indexnode-874d4684c-gljpx      1/1     Running            0                 5h8m    10.104.5.36     4am-node12   <none>           <none>
fouramf-stable-1680548400-6-milvus-proxy-54fdf6d8c6-8z6m2         1/1     Running            1 (5h4m ago)      5h8m    10.104.9.126    4am-node14   <none>           <none>
fouramf-stable-1680548400-6-milvus-querycoord-669899bd89-4nzhv    1/1     Running            1 (5h4m ago)      5h8m    10.104.9.124    4am-node14   <none>           <none>
fouramf-stable-1680548400-6-milvus-querynode-7f5f8548c7-mxz5b     1/1     Running            1 (5h ago)        5h8m    10.104.6.85     4am-node13   <none>           <none>
fouramf-stable-1680548400-6-milvus-rootcoord-5d75c4fb55-nsxdb     1/1     Running            1 (5h4m ago)      5h8m    10.104.6.84     4am-node13   <none>           <none>
fouramf-stable-1680548400-6-minio-0                               1/1     Running            0                 5h8m    10.104.4.235    4am-node11   <none>           <none>
fouramf-stable-1680548400-6-minio-1                               1/1     Running            0                 5h8m    10.104.9.136    4am-node14   <none>           <none>
fouramf-stable-1680548400-6-minio-2                               1/1     Running            0                 5h8m    10.104.1.128    4am-node10   <none>           <none>
fouramf-stable-1680548400-6-minio-3                               1/1     Running            0                 5h8m    10.104.5.50     4am-node12   <none>           <none>
fouramf-stable-1680548400-6-pulsar-bookie-0                       1/1     Running            0                 5h8m    10.104.4.231    4am-node11   <none>           <none>
fouramf-stable-1680548400-6-pulsar-bookie-1                       1/1     Running            0                 5h8m    10.104.9.134    4am-node14   <none>           <none>
fouramf-stable-1680548400-6-pulsar-bookie-2                       1/1     Running            0                 5h8m    10.104.1.127    4am-node10   <none>           <none>
fouramf-stable-1680548400-6-pulsar-bookie-init-q86vw              0/1     Completed          0                 5h8m    10.104.4.208    4am-node11   <none>           <none>
fouramf-stable-1680548400-6-pulsar-broker-0                       1/1     Running            0                 5h8m    10.104.4.206    4am-node11   <none>           <none>
fouramf-stable-1680548400-6-pulsar-proxy-0                        1/1     Running            0                 5h8m    10.104.4.207    4am-node11   <none>           <none>
fouramf-stable-1680548400-6-pulsar-pulsar-init-g2s8v              0/1     Completed          0                 5h8m    10.104.4.205    4am-node11   <none>           <none>
fouramf-stable-1680548400-6-pulsar-recovery-0                     1/1     Running            0                 5h8m    10.104.9.123    4am-node14   <none>           <none>
fouramf-stable-1680548400-6-pulsar-zookeeper-0                    1/1     Running            0                 5h8m    10.104.4.229    4am-node11   <none>           <none>
fouramf-stable-1680548400-6-pulsar-zookeeper-1                    1/1     Running            0                 5h6m    10.104.9.154    4am-node14   <none>           <none>
fouramf-stable-1680548400-6-pulsar-zookeeper-2                    1/1     Running            0                 5h4m    10.104.4.3      4am-node11   <none>           <none>

full client log: fouram_log (1).log.zip

fouram_log (1).err.zip

elstic avatar Apr 04 '23 03:04 elstic

This issue still exists.

image: 2.2.6-20230413-d0e87113 (expected to be the 2.2.6 release version)

case_name: test_concurrent_locust_diskann_compaction_cluster

argo task: fouramf-stable-2gdbz, id: 10

server:

fouramf-stable-2gdbz-10-etcd-0                                    1/1     Running       0               5h8m    10.104.6.73    4am-node13   <none>           <none>
fouramf-stable-2gdbz-10-etcd-1                                    1/1     Running       0               5h7m    10.104.9.15    4am-node14   <none>           <none>
fouramf-stable-2gdbz-10-etcd-2                                    1/1     Running       0               5h7m    10.104.4.158   4am-node11   <none>           <none>
fouramf-stable-2gdbz-10-milvus-datacoord-5f6c7db7db-gw228         1/1     Running       1 (5h4m ago)    5h8m    10.104.5.235   4am-node12   <none>           <none>
fouramf-stable-2gdbz-10-milvus-datanode-57db6fc569-bzlcc          1/1     Running       1 (5h4m ago)    5h8m    10.104.4.130   4am-node11   <none>           <none>
fouramf-stable-2gdbz-10-milvus-indexcoord-6df4586695-jgsj6        1/1     Running       1 (5h3m ago)    5h8m    10.104.9.245   4am-node14   <none>           <none>
fouramf-stable-2gdbz-10-milvus-indexnode-6fbd8bd696-z9vc7         1/1     Running       0               5h8m    10.104.5.236   4am-node12   <none>           <none>
fouramf-stable-2gdbz-10-milvus-proxy-bd6d746c5-hg7hr              1/1     Running       1 (5h3m ago)    5h8m    10.104.9.250   4am-node14   <none>           <none>
fouramf-stable-2gdbz-10-milvus-querycoord-76cdb79456-mwnqg        1/1     Running       1 (5h4m ago)    5h8m    10.104.4.131   4am-node11   <none>           <none>
fouramf-stable-2gdbz-10-milvus-querynode-85b54bdf5d-2694r         1/1     Running       1 (5h ago)      5h8m    10.104.1.87    4am-node10   <none>           <none>
fouramf-stable-2gdbz-10-milvus-rootcoord-6986664fbb-mm72v         1/1     Running       1 (5h3m ago)    5h8m    10.104.5.237   4am-node12   <none>           <none>
fouramf-stable-2gdbz-10-minio-0                                   1/1     Running       0               5h8m    10.104.4.154   4am-node11   <none>           <none>
fouramf-stable-2gdbz-10-minio-1                                   1/1     Running       0               5h8m    10.104.1.113   4am-node10   <none>           <none>
fouramf-stable-2gdbz-10-minio-2                                   1/1     Running       0               5h8m    10.104.5.8     4am-node12   <none>           <none>
fouramf-stable-2gdbz-10-minio-3                                   1/1     Running       0               5h7m    10.104.6.76    4am-node13   <none>           <none>
fouramf-stable-2gdbz-10-pulsar-bookie-0                           1/1     Running       0               5h8m    10.104.9.13    4am-node14   <none>           <none>
fouramf-stable-2gdbz-10-pulsar-bookie-1                           1/1     Running       0               5h7m    10.104.6.75    4am-node13   <none>           <none>
fouramf-stable-2gdbz-10-pulsar-bookie-2                           1/1     Running       0               5h7m    10.104.4.159   4am-node11   <none>           <none>
fouramf-stable-2gdbz-10-pulsar-bookie-init-qwtj4                  0/1     Completed     0               5h8m    10.104.9.249   4am-node14   <none>           <none>
fouramf-stable-2gdbz-10-pulsar-broker-0                           1/1     Running       0               5h8m    10.104.6.45    4am-node13   <none>           <none>
fouramf-stable-2gdbz-10-pulsar-proxy-0                            1/1     Running       0               5h8m    10.104.4.133   4am-node11   <none>           <none>
fouramf-stable-2gdbz-10-pulsar-pulsar-init-cmxz4                  0/1     Completed     0               5h8m    10.104.9.247   4am-node14   <none>           <none>
fouramf-stable-2gdbz-10-pulsar-recovery-0                         1/1     Running       0               5h8m    10.104.9.248   4am-node14   <none>           <none>
fouramf-stable-2gdbz-10-pulsar-zookeeper-0                        1/1     Running       0               5h8m    10.104.9.12    4am-node14   <none>           <none>
fouramf-stable-2gdbz-10-pulsar-zookeeper-1                        1/1     Running       0               5h5m    10.104.1.122   4am-node10   <none>           <none>
fouramf-stable-2gdbz-10-pulsar-zookeeper-2                        1/1     Running       0               5h4m    10.104.4.178   4am-node11   <none>           <none>

client error log: image

elstic avatar Apr 14 '23 02:04 elstic

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale[bot] avatar Aug 02 '23 15:08 stale[bot]

The issue has not come up again. Verified with image: 2.2.0-20230803-6a20862c

elstic avatar Aug 03 '23 07:08 elstic