
[Bug]: [benchmark] Some load timeout failures during concurrent `DML` testing

Open elstic opened this issue 9 months ago • 1 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Environment

- Milvus version: master-20240516-5b27a0cd
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): pulsar
- SDK version(e.g. pymilvus v2.0.0rc2): 
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

argo task: fouram-disk-stab-1715882400, id: 3, case: test_concurrent_locust_diskann_compaction_cluster

After inserting 100,000 entities into Milvus and then running concurrent load, search, query, insert, delete, and flush requests for 5 hours, 179 of the load requests failed.

   'load': {'Requests': 53374,
            'Fails': 179,
            'RPS': 2.97,
            'fail_s': 0.0,
            'RT_max': 30219.15,
            'RT_avg': 1293.95,
            'TP50': 220.0,
            'TP99': 22000.0},
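
To put the 179 failures in proportion, here is a minimal sketch that computes the load failure ratio from the numbers above. It assumes `fail_s` is the failure ratio rounded to two decimals and that the `RT_*`/`TP*` values are in milliseconds; the report states neither, so treat both as guesses.

```python
# Stats copied from the 'load' entry of the report above.
load_stats = {
    "Requests": 53374,
    "Fails": 179,
    "RPS": 2.97,
    "fail_s": 0.0,
    "RT_max": 30219.15,
    "RT_avg": 1293.95,
    "TP50": 220.0,
    "TP99": 22000.0,
}


def failure_ratio(stats):
    """Fraction of requests that failed."""
    return stats["Fails"] / stats["Requests"]


ratio = failure_ratio(load_stats)
print(f"load failure ratio: {ratio:.4%}")

# Assumption: fail_s is this ratio rounded to two decimals, which would
# explain why it shows 0.0 even though there are 179 failures.
print(f"ratio rounded to 2 decimals: {round(ratio, 2)}")

# Assumption: RT values are milliseconds, so RT_max is roughly 30 s.
print(f"RT_max in seconds (if ms): {load_stats['RT_max'] / 1000:.1f}")
```

If the millisecond assumption holds, `RT_max` sits just above 30 s, which would be consistent with the failures being client-side load timeouts as the title says.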

client error log: image

server:

fouram-disk-sta82400-3-87-9477-etcd-0                             1/1     Running       0               5m25s   10.104.18.119   4am-node25   <none>           <none>
fouram-disk-sta82400-3-87-9477-etcd-1                             1/1     Running       0               5m25s   10.104.34.50    4am-node37   <none>           <none>
fouram-disk-sta82400-3-87-9477-etcd-2                             1/1     Running       0               5m24s   10.104.25.235   4am-node30   <none>           <none>
fouram-disk-sta82400-3-87-9477-milvus-datacoord-86b579c78cjkmdt   1/1     Running       3 (4m29s ago)   5m25s   10.104.25.226   4am-node30   <none>           <none>
fouram-disk-sta82400-3-87-9477-milvus-datanode-66f87d6754-npzh5   1/1     Running       3 (4m29s ago)   5m25s   10.104.33.154   4am-node36   <none>           <none>
fouram-disk-sta82400-3-87-9477-milvus-indexcoord-6586cfc7cmsz4x   1/1     Running       0               5m25s   10.104.25.224   4am-node30   <none>           <none>
fouram-disk-sta82400-3-87-9477-milvus-indexnode-fb6f9cd59-v9w2c   1/1     Running       3 (4m33s ago)   5m25s   10.104.32.142   4am-node39   <none>           <none>
fouram-disk-sta82400-3-87-9477-milvus-proxy-6767596f66-2rlt7      1/1     Running       3 (4m27s ago)   5m25s   10.104.25.225   4am-node30   <none>           <none>
fouram-disk-sta82400-3-87-9477-milvus-querycoord-78cbb4b67lngm8   1/1     Running       3 (4m31s ago)   5m25s   10.104.25.223   4am-node30   <none>           <none>
fouram-disk-sta82400-3-87-9477-milvus-querynode-746c5fcf9ck7l7m   1/1     Running       3 (4m31s ago)   5m25s   10.104.19.95    4am-node28   <none>           <none>
fouram-disk-sta82400-3-87-9477-milvus-rootcoord-59d559d75-48nb5   1/1     Running       3 (4m27s ago)   5m24s   10.104.25.227   4am-node30   <none>           <none>
fouram-disk-sta82400-3-87-9477-minio-0                            1/1     Running       0               5m25s   10.104.18.111   4am-node25   <none>           <none>
fouram-disk-sta82400-3-87-9477-minio-1                            1/1     Running       0               5m25s   10.104.34.52    4am-node37   <none>           <none>
fouram-disk-sta82400-3-87-9477-minio-2                            1/1     Running       0               5m24s   10.104.25.239   4am-node30   <none>           <none>
fouram-disk-sta82400-3-87-9477-minio-3                            1/1     Running       0               5m24s   10.104.33.160   4am-node36   <none>           <none>
fouram-disk-sta82400-3-87-9477-pulsar-bookie-0                    1/1     Running       0               5m25s   10.104.25.233   4am-node30   <none>           <none>
fouram-disk-sta82400-3-87-9477-pulsar-bookie-1                    1/1     Running       0               5m24s   10.104.34.53    4am-node37   <none>           <none>
fouram-disk-sta82400-3-87-9477-pulsar-bookie-2                    1/1     Running       0               5m24s   10.104.18.124   4am-node25   <none>           <none>
fouram-disk-sta82400-3-87-9477-pulsar-bookie-init-vv9jp           0/1     Completed     0               5m25s   10.104.5.186    4am-node12   <none>           <none>
fouram-disk-sta82400-3-87-9477-pulsar-broker-0                    1/1     Running       0               5m25s   10.104.4.20     4am-node11   <none>           <none>
fouram-disk-sta82400-3-87-9477-pulsar-proxy-0                     1/1     Running       0               5m25s   10.104.5.185    4am-node12   <none>           <none>
fouram-disk-sta82400-3-87-9477-pulsar-pulsar-init-7d897           0/1     Completed     0               5m25s   10.104.5.184    4am-node12   <none>           <none>
fouram-disk-sta82400-3-87-9477-pulsar-recovery-0                  1/1     Running       0               5m24s   10.104.5.187    4am-node12   <none>           <none>
fouram-disk-sta82400-3-87-9477-pulsar-zookeeper-0                 1/1     Running       0               5m25s   10.104.34.47    4am-node37   <none>           <none>
fouram-disk-sta82400-3-87-9477-pulsar-zookeeper-1                 1/1     Running       0               4m35s   10.104.23.61    4am-node27   <none>           <none>
fouram-disk-sta82400-3-87-9477-pulsar-zookeeper-2                 1/1     Running       0               3m19s   10.104.19.111   4am-node28   <none>           <none> (base.py:257)
[2024-05-16 23:13:10,730 -  INFO - fouram]: [Cmd Exe]  kubectl get pods  -n qa-milvus  -o wide | grep -E 'NAME|fouram-disk-sta82400-3-87-9477-milvus|fouram-disk-sta82400-3-87-9477-minio|fouram-disk-sta82400-3-87-9477-etcd|fouram-disk-sta82400-3-87-9477-pulsar|fouram-disk-sta82400-3-87-9477-zookeeper|fouram-disk-sta82400-3-87-9477-kafka|fouram-disk-sta82400-3-87-9477-log|fouram-disk-sta82400-3-87-9477-tikv'  (util_cmd.py:14)
[2024-05-16 23:13:21,029 -  INFO - fouram]: [CliClient] pod details of release(fouram-disk-sta82400-3-87-9477): 
 I0516 23:13:12.374287    3548 request.go:665] Waited for 1.19762423s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.default.svc.cluster.local/apis/discovery.k8s.io/v1?timeout=32s
NAME                                                              READY   STATUS             RESTARTS        AGE     IP              NODE         NOMINATED NODE   READINESS GATES
fouram-disk-sta82400-3-87-9477-etcd-0                             1/1     Running            0               5h7m    10.104.18.119   4am-node25   <none>           <none>
fouram-disk-sta82400-3-87-9477-etcd-1                             1/1     Running            0               5h7m    10.104.34.50    4am-node37   <none>           <none>
fouram-disk-sta82400-3-87-9477-etcd-2                             1/1     Running            0               5h7m    10.104.25.235   4am-node30   <none>           <none>
fouram-disk-sta82400-3-87-9477-milvus-datacoord-86b579c78cjkmdt   1/1     Running            3 (5h6m ago)    5h7m    10.104.25.226   4am-node30   <none>           <none>
fouram-disk-sta82400-3-87-9477-milvus-datanode-66f87d6754-npzh5   1/1     Running            3 (5h6m ago)    5h7m    10.104.33.154   4am-node36   <none>           <none>
fouram-disk-sta82400-3-87-9477-milvus-indexcoord-6586cfc7cmsz4x   1/1     Running            0               5h7m    10.104.25.224   4am-node30   <none>           <none>
fouram-disk-sta82400-3-87-9477-milvus-indexnode-fb6f9cd59-v9w2c   1/1     Running            3 (5h6m ago)    5h7m    10.104.32.142   4am-node39   <none>           <none>
fouram-disk-sta82400-3-87-9477-milvus-proxy-6767596f66-2rlt7      1/1     Running            3 (5h6m ago)    5h7m    10.104.25.225   4am-node30   <none>           <none>
fouram-disk-sta82400-3-87-9477-milvus-querycoord-78cbb4b67lngm8   1/1     Running            3 (5h6m ago)    5h7m    10.104.25.223   4am-node30   <none>           <none>
fouram-disk-sta82400-3-87-9477-milvus-querynode-746c5fcf9ck7l7m   1/1     Running            3 (5h6m ago)    5h7m    10.104.19.95    4am-node28   <none>           <none>
fouram-disk-sta82400-3-87-9477-milvus-rootcoord-59d559d75-48nb5   1/1     Running            3 (5h6m ago)    5h7m    10.104.25.227   4am-node30   <none>           <none>
fouram-disk-sta82400-3-87-9477-minio-0                            1/1     Running            0               5h7m    10.104.18.111   4am-node25   <none>           <none>
fouram-disk-sta82400-3-87-9477-minio-1                            1/1     Running            0               5h7m    10.104.34.52    4am-node37   <none>           <none>
fouram-disk-sta82400-3-87-9477-minio-2                            1/1     Running            0               5h7m    10.104.25.239   4am-node30   <none>           <none>
fouram-disk-sta82400-3-87-9477-minio-3                            1/1     Running            0               5h7m    10.104.33.160   4am-node36   <none>           <none>
fouram-disk-sta82400-3-87-9477-pulsar-bookie-0                    1/1     Running            0               5h7m    10.104.25.233   4am-node30   <none>           <none>
fouram-disk-sta82400-3-87-9477-pulsar-bookie-1                    1/1     Running            0               5h7m    10.104.34.53    4am-node37   <none>           <none>
fouram-disk-sta82400-3-87-9477-pulsar-bookie-2                    1/1     Running            0               5h7m    10.104.18.124   4am-node25   <none>           <none>
fouram-disk-sta82400-3-87-9477-pulsar-bookie-init-vv9jp           0/1     Completed          0               5h7m    10.104.5.186    4am-node12   <none>           <none>
fouram-disk-sta82400-3-87-9477-pulsar-broker-0                    1/1     Running            0               5h7m    10.104.4.20     4am-node11   <none>           <none>
fouram-disk-sta82400-3-87-9477-pulsar-proxy-0                     1/1     Running            0               5h7m    10.104.5.185    4am-node12   <none>           <none>
fouram-disk-sta82400-3-87-9477-pulsar-pulsar-init-7d897           0/1     Completed          0               5h7m    10.104.5.184    4am-node12   <none>           <none>
fouram-disk-sta82400-3-87-9477-pulsar-recovery-0                  1/1     Running            0               5h7m    10.104.5.187    4am-node12   <none>           <none>
fouram-disk-sta82400-3-87-9477-pulsar-zookeeper-0                 1/1     Running            0               5h7m    10.104.34.47    4am-node37   <none>           <none>
fouram-disk-sta82400-3-87-9477-pulsar-zookeeper-1                 1/1     Running            0               5h6m    10.104.23.61    4am-node27   <none>           <none>
fouram-disk-sta82400-3-87-9477-pulsar-zookeeper-2                 1/1     Running            0               5h4m    10.104.19.111   4am-node28   <none>           <none>

Expected Behavior

No load failures.

Steps To Reproduce

1. Create a collection
2. Build a DiskANN index on the vector column
3. Insert 100k vectors
4. Flush the collection
5. Build the index on the vector column again with the same parameters
6. Count the total number of rows
7. Load the collection
8. Execute concurrent search, query, flush, insert, delete, and load requests
9. Keep step 8 running for 5h
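
Steps 1 to 7 could look roughly like the following with the pymilvus ORM API. This is a sketch, not the fouram test case itself: the collection name, field names, dimension, metric type, batch size, host/port, and the 30 s load timeout are all hypothetical values chosen for illustration. The concurrent phase (steps 8 and 9) is driven by locust in the benchmark and is not reproduced here.

```python
import random

DIM = 128  # hypothetical vector dimension; the issue does not state it


def diskann_index_params():
    # Metric type is an assumption; the issue only says "DiskANN".
    return {"index_type": "DISKANN", "metric_type": "L2", "params": {}}


def make_vectors(n, dim=DIM, seed=0):
    """Generate n random float vectors of the given dimension."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(dim)] for _ in range(n)]


def prepare_collection(name="repro_load_timeout", host="localhost",
                       port="19530", rows=100_000, batch=10_000):
    # Imported lazily so the helpers above work without pymilvus installed.
    from pymilvus import (Collection, CollectionSchema, DataType,
                          FieldSchema, connections)

    connections.connect(host=host, port=port)

    # Step 1: create a collection (field names are hypothetical).
    schema = CollectionSchema([
        FieldSchema("id", DataType.INT64, is_primary=True),
        FieldSchema("vec", DataType.FLOAT_VECTOR, dim=DIM),
    ])
    coll = Collection(name, schema)

    # Step 2: build a DiskANN index on the vector column.
    coll.create_index("vec", diskann_index_params())

    # Step 3: insert 100k vectors in batches.
    for start in range(0, rows, batch):
        ids = list(range(start, start + batch))
        coll.insert([ids, make_vectors(batch, seed=start)])

    # Step 4: flush; step 5: build the index again with the same parameters.
    coll.flush()
    coll.create_index("vec", diskann_index_params())

    # Step 6: count rows; step 7: load (timeout value is an assumption).
    print("rows:", coll.num_entities)
    coll.load(timeout=30)
    return coll
```

Calling `prepare_collection()` against a running cluster leaves a loaded collection on which the concurrent load, search, query, insert, delete, and flush requests can then be issued.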

Milvus Log

No response

Anything else?

No response

elstic, May 17 '24 06:05