[Bug]: [benchmark] Some load timeout failures during concurrent `DML` testing
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version: master-20240516-5b27a0cd
- Deployment mode (standalone or cluster): cluster
- MQ type (rocksmq, pulsar or kafka): pulsar
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
argo task : fouram-disk-stab-1715882400, id : 3 case: test_concurrent_locust_diskann_compaction_cluster
After inserting 100,000 entities into Milvus and then running concurrent load, search, query, insert, delete,
and flush operations for 5 hours, there were 179 load failures:
'load': {'Requests': 53374,
'Fails': 179,
'RPS': 2.97,
'fail_s': 0.0,
'RT_max': 30219.15,
'RT_avg': 1293.95,
'TP50': 220.0,
'TP99': 22000.0},
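For scale, the failure ratio implied by the numbers above is small but non-zero (the `fail_s` field rounds it to 0.0):

```python
# Load statistics reported by the benchmark above.
requests, fails = 53374, 179
print(f"load failure rate: {fails / requests:.3%}")  # prints: load failure rate: 0.335%
```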
client error log:
server:
fouram-disk-sta82400-3-87-9477-etcd-0 1/1 Running 0 5m25s 10.104.18.119 4am-node25 <none> <none>
fouram-disk-sta82400-3-87-9477-etcd-1 1/1 Running 0 5m25s 10.104.34.50 4am-node37 <none> <none>
fouram-disk-sta82400-3-87-9477-etcd-2 1/1 Running 0 5m24s 10.104.25.235 4am-node30 <none> <none>
fouram-disk-sta82400-3-87-9477-milvus-datacoord-86b579c78cjkmdt 1/1 Running 3 (4m29s ago) 5m25s 10.104.25.226 4am-node30 <none> <none>
fouram-disk-sta82400-3-87-9477-milvus-datanode-66f87d6754-npzh5 1/1 Running 3 (4m29s ago) 5m25s 10.104.33.154 4am-node36 <none> <none>
fouram-disk-sta82400-3-87-9477-milvus-indexcoord-6586cfc7cmsz4x 1/1 Running 0 5m25s 10.104.25.224 4am-node30 <none> <none>
fouram-disk-sta82400-3-87-9477-milvus-indexnode-fb6f9cd59-v9w2c 1/1 Running 3 (4m33s ago) 5m25s 10.104.32.142 4am-node39 <none> <none>
fouram-disk-sta82400-3-87-9477-milvus-proxy-6767596f66-2rlt7 1/1 Running 3 (4m27s ago) 5m25s 10.104.25.225 4am-node30 <none> <none>
fouram-disk-sta82400-3-87-9477-milvus-querycoord-78cbb4b67lngm8 1/1 Running 3 (4m31s ago) 5m25s 10.104.25.223 4am-node30 <none> <none>
fouram-disk-sta82400-3-87-9477-milvus-querynode-746c5fcf9ck7l7m 1/1 Running 3 (4m31s ago) 5m25s 10.104.19.95 4am-node28 <none> <none>
fouram-disk-sta82400-3-87-9477-milvus-rootcoord-59d559d75-48nb5 1/1 Running 3 (4m27s ago) 5m24s 10.104.25.227 4am-node30 <none> <none>
fouram-disk-sta82400-3-87-9477-minio-0 1/1 Running 0 5m25s 10.104.18.111 4am-node25 <none> <none>
fouram-disk-sta82400-3-87-9477-minio-1 1/1 Running 0 5m25s 10.104.34.52 4am-node37 <none> <none>
fouram-disk-sta82400-3-87-9477-minio-2 1/1 Running 0 5m24s 10.104.25.239 4am-node30 <none> <none>
fouram-disk-sta82400-3-87-9477-minio-3 1/1 Running 0 5m24s 10.104.33.160 4am-node36 <none> <none>
fouram-disk-sta82400-3-87-9477-pulsar-bookie-0 1/1 Running 0 5m25s 10.104.25.233 4am-node30 <none> <none>
fouram-disk-sta82400-3-87-9477-pulsar-bookie-1 1/1 Running 0 5m24s 10.104.34.53 4am-node37 <none> <none>
fouram-disk-sta82400-3-87-9477-pulsar-bookie-2 1/1 Running 0 5m24s 10.104.18.124 4am-node25 <none> <none>
fouram-disk-sta82400-3-87-9477-pulsar-bookie-init-vv9jp 0/1 Completed 0 5m25s 10.104.5.186 4am-node12 <none> <none>
fouram-disk-sta82400-3-87-9477-pulsar-broker-0 1/1 Running 0 5m25s 10.104.4.20 4am-node11 <none> <none>
fouram-disk-sta82400-3-87-9477-pulsar-proxy-0 1/1 Running 0 5m25s 10.104.5.185 4am-node12 <none> <none>
fouram-disk-sta82400-3-87-9477-pulsar-pulsar-init-7d897 0/1 Completed 0 5m25s 10.104.5.184 4am-node12 <none> <none>
fouram-disk-sta82400-3-87-9477-pulsar-recovery-0 1/1 Running 0 5m24s 10.104.5.187 4am-node12 <none> <none>
fouram-disk-sta82400-3-87-9477-pulsar-zookeeper-0 1/1 Running 0 5m25s 10.104.34.47 4am-node37 <none> <none>
fouram-disk-sta82400-3-87-9477-pulsar-zookeeper-1 1/1 Running 0 4m35s 10.104.23.61 4am-node27 <none> <none>
fouram-disk-sta82400-3-87-9477-pulsar-zookeeper-2 1/1 Running 0 3m19s 10.104.19.111 4am-node28 <none> <none> (base.py:257)
[2024-05-16 23:13:10,730 - INFO - fouram]: [Cmd Exe] kubectl get pods -n qa-milvus -o wide | grep -E 'NAME|fouram-disk-sta82400-3-87-9477-milvus|fouram-disk-sta82400-3-87-9477-minio|fouram-disk-sta82400-3-87-9477-etcd|fouram-disk-sta82400-3-87-9477-pulsar|fouram-disk-sta82400-3-87-9477-zookeeper|fouram-disk-sta82400-3-87-9477-kafka|fouram-disk-sta82400-3-87-9477-log|fouram-disk-sta82400-3-87-9477-tikv' (util_cmd.py:14)
[2024-05-16 23:13:21,029 - INFO - fouram]: [CliClient] pod details of release(fouram-disk-sta82400-3-87-9477):
I0516 23:13:12.374287 3548 request.go:665] Waited for 1.19762423s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.default.svc.cluster.local/apis/discovery.k8s.io/v1?timeout=32s
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
fouram-disk-sta82400-3-87-9477-etcd-0 1/1 Running 0 5h7m 10.104.18.119 4am-node25 <none> <none>
fouram-disk-sta82400-3-87-9477-etcd-1 1/1 Running 0 5h7m 10.104.34.50 4am-node37 <none> <none>
fouram-disk-sta82400-3-87-9477-etcd-2 1/1 Running 0 5h7m 10.104.25.235 4am-node30 <none> <none>
fouram-disk-sta82400-3-87-9477-milvus-datacoord-86b579c78cjkmdt 1/1 Running 3 (5h6m ago) 5h7m 10.104.25.226 4am-node30 <none> <none>
fouram-disk-sta82400-3-87-9477-milvus-datanode-66f87d6754-npzh5 1/1 Running 3 (5h6m ago) 5h7m 10.104.33.154 4am-node36 <none> <none>
fouram-disk-sta82400-3-87-9477-milvus-indexcoord-6586cfc7cmsz4x 1/1 Running 0 5h7m 10.104.25.224 4am-node30 <none> <none>
fouram-disk-sta82400-3-87-9477-milvus-indexnode-fb6f9cd59-v9w2c 1/1 Running 3 (5h6m ago) 5h7m 10.104.32.142 4am-node39 <none> <none>
fouram-disk-sta82400-3-87-9477-milvus-proxy-6767596f66-2rlt7 1/1 Running 3 (5h6m ago) 5h7m 10.104.25.225 4am-node30 <none> <none>
fouram-disk-sta82400-3-87-9477-milvus-querycoord-78cbb4b67lngm8 1/1 Running 3 (5h6m ago) 5h7m 10.104.25.223 4am-node30 <none> <none>
fouram-disk-sta82400-3-87-9477-milvus-querynode-746c5fcf9ck7l7m 1/1 Running 3 (5h6m ago) 5h7m 10.104.19.95 4am-node28 <none> <none>
fouram-disk-sta82400-3-87-9477-milvus-rootcoord-59d559d75-48nb5 1/1 Running 3 (5h6m ago) 5h7m 10.104.25.227 4am-node30 <none> <none>
fouram-disk-sta82400-3-87-9477-minio-0 1/1 Running 0 5h7m 10.104.18.111 4am-node25 <none> <none>
fouram-disk-sta82400-3-87-9477-minio-1 1/1 Running 0 5h7m 10.104.34.52 4am-node37 <none> <none>
fouram-disk-sta82400-3-87-9477-minio-2 1/1 Running 0 5h7m 10.104.25.239 4am-node30 <none> <none>
fouram-disk-sta82400-3-87-9477-minio-3 1/1 Running 0 5h7m 10.104.33.160 4am-node36 <none> <none>
fouram-disk-sta82400-3-87-9477-pulsar-bookie-0 1/1 Running 0 5h7m 10.104.25.233 4am-node30 <none> <none>
fouram-disk-sta82400-3-87-9477-pulsar-bookie-1 1/1 Running 0 5h7m 10.104.34.53 4am-node37 <none> <none>
fouram-disk-sta82400-3-87-9477-pulsar-bookie-2 1/1 Running 0 5h7m 10.104.18.124 4am-node25 <none> <none>
fouram-disk-sta82400-3-87-9477-pulsar-bookie-init-vv9jp 0/1 Completed 0 5h7m 10.104.5.186 4am-node12 <none> <none>
fouram-disk-sta82400-3-87-9477-pulsar-broker-0 1/1 Running 0 5h7m 10.104.4.20 4am-node11 <none> <none>
fouram-disk-sta82400-3-87-9477-pulsar-proxy-0 1/1 Running 0 5h7m 10.104.5.185 4am-node12 <none> <none>
fouram-disk-sta82400-3-87-9477-pulsar-pulsar-init-7d897 0/1 Completed 0 5h7m 10.104.5.184 4am-node12 <none> <none>
fouram-disk-sta82400-3-87-9477-pulsar-recovery-0 1/1 Running 0 5h7m 10.104.5.187 4am-node12 <none> <none>
fouram-disk-sta82400-3-87-9477-pulsar-zookeeper-0 1/1 Running 0 5h7m 10.104.34.47 4am-node37 <none> <none>
fouram-disk-sta82400-3-87-9477-pulsar-zookeeper-1 1/1 Running 0 5h6m 10.104.23.61 4am-node27 <none> <none>
fouram-disk-sta82400-3-87-9477-pulsar-zookeeper-2 1/1 Running 0 5h4m 10.104.19.111 4am-node28 <none> <none>
Expected Behavior
No load failures.
Steps To Reproduce
1. create a collection
2. build a DiskANN index on the vector column
3. insert 100k vectors
4. flush the collection
5. build the index on the vector column again with the same parameters
6. count the total number of rows
7. load the collection
8. execute concurrent search, query, flush, insert, delete, and load operations
9. run step 8 for 5 hours
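As a rough sketch of the concurrent phase (step 8), the Locust workers can be modeled as threads that each pick a random operation in a loop. Here `coll` stands in for a pymilvus `Collection` handle; the argument-free operation calls, worker count, and duration are illustrative assumptions, not the benchmark's actual Locust configuration:

```python
import random
import threading
import time

def dml_worker(coll, stop_event, results):
    """One worker: repeatedly issue a random operation until told to stop."""
    ops = ["load", "search", "query", "insert", "delete", "flush"]
    while not stop_event.is_set():
        op = random.choice(ops)
        try:
            # On a real Collection these calls need arguments; argument-free
            # wrappers are assumed here for the sake of the sketch.
            getattr(coll, op)()
            results.append((op, "ok"))
        except Exception as exc:
            results.append((op, f"fail: {exc}"))

def run_concurrent_phase(coll, n_workers=10, duration_s=5 * 3600):
    """Run n_workers workers concurrently against `coll` for duration_s seconds."""
    stop_event = threading.Event()
    results = []  # list.append is atomic in CPython, so workers can share it
    workers = [
        threading.Thread(target=dml_worker, args=(coll, stop_event, results))
        for _ in range(n_workers)
    ]
    for w in workers:
        w.start()
    time.sleep(duration_s)
    stop_event.set()
    for w in workers:
        w.join()
    return results
```

Failed operations land in the results list as `("load", "fail: ...")` tuples, mirroring how the benchmark tallies the 179 load failures out of 53,374 requests.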
Milvus Log
No response
Anything else?
No response