milvus
[Bug]: [benchmark][cluster] Query Node disconnected from etcd and restarted multiple times
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version:master-20240228-095cdbed-amd64
- Deployment mode(standalone or cluster):cluster
- MQ type(rocksmq, pulsar or kafka): pulsar
- SDK version(e.g. pymilvus v2.0.0rc2):2.4.0rc36
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
argo task: inverted-corn-1709136000
test case name: test_inverted_locust_hnsw_diskann_dml_dql_cluster
server:
[2024-02-28 20:09:25,648 - INFO - fouram]: [Base] Deploy initial state:
I0228 16:10:08.885065 428 request.go:665] Waited for 1.162395988s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.default.svc.cluster.local/apis/batch/v1beta1?timeout=32s
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
inverted-corn-136000-8-75-4542-etcd-0 1/1 Running 0 7m3s 10.104.25.55 4am-node30 <none> <none>
inverted-corn-136000-8-75-4542-etcd-1 1/1 Running 0 7m3s 10.104.27.163 4am-node31 <none> <none>
inverted-corn-136000-8-75-4542-etcd-2 1/1 Running 0 7m2s 10.104.31.130 4am-node34 <none> <none>
inverted-corn-136000-8-75-4542-milvus-datacoord-5d8d86cc588bnq8 1/1 Running 0 7m3s 10.104.12.59 4am-node17 <none> <none>
inverted-corn-136000-8-75-4542-milvus-datanode-7df6ff895-ds2h8 1/1 Running 1 (2m2s ago) 7m3s 10.104.12.60 4am-node17 <none> <none>
inverted-corn-136000-8-75-4542-milvus-indexcoord-64b67b845j6sgz 1/1 Running 0 7m3s 10.104.23.237 4am-node27 <none> <none>
inverted-corn-136000-8-75-4542-milvus-indexnode-b558f58d6-5v67d 1/1 Running 0 7m3s 10.104.14.197 4am-node18 <none> <none>
inverted-corn-136000-8-75-4542-milvus-indexnode-b558f58d6-jltrr 1/1 Running 0 7m3s 10.104.34.37 4am-node37 <none> <none>
inverted-corn-136000-8-75-4542-milvus-indexnode-b558f58d6-svmnb 1/1 Running 0 7m3s 10.104.24.101 4am-node29 <none> <none>
inverted-corn-136000-8-75-4542-milvus-indexnode-b558f58d6-vrvl8 1/1 Running 0 7m3s 10.104.4.153 4am-node11 <none> <none>
inverted-corn-136000-8-75-4542-milvus-proxy-79c7cbf7c7-zdsn8 1/1 Running 1 (2m32s ago) 7m3s 10.104.5.219 4am-node12 <none> <none>
inverted-corn-136000-8-75-4542-milvus-querycoord-6c7f585db2t9qm 1/1 Running 1 (2m2s ago) 7m3s 10.104.14.196 4am-node18 <none> <none>
inverted-corn-136000-8-75-4542-milvus-querynode-89bc557c6-22x74 1/1 Running 0 7m2s 10.104.25.51 4am-node30 <none> <none>
inverted-corn-136000-8-75-4542-milvus-querynode-89bc557c6-g979z 1/1 Running 0 7m3s 10.104.5.220 4am-node12 <none> <none>
inverted-corn-136000-8-75-4542-milvus-rootcoord-766dcd65f4nkzcw 1/1 Running 1 (2m2s ago) 7m3s 10.104.9.82 4am-node14 <none> <none>
inverted-corn-136000-8-75-4542-minio-0 1/1 Running 0 7m3s 10.104.34.52 4am-node37 <none> <none>
inverted-corn-136000-8-75-4542-minio-1 1/1 Running 0 7m3s 10.104.29.226 4am-node35 <none> <none>
inverted-corn-136000-8-75-4542-minio-2 1/1 Running 0 7m3s 10.104.26.33 4am-node32 <none> <none>
inverted-corn-136000-8-75-4542-minio-3 1/1 Running 0 7m2s 10.104.31.131 4am-node34 <none> <none>
inverted-corn-136000-8-75-4542-pulsar-bookie-0 1/1 Running 0 7m3s 10.104.28.173 4am-node33 <none> <none>
inverted-corn-136000-8-75-4542-pulsar-bookie-1 1/1 Running 0 7m2s 10.104.26.34 4am-node32 <none> <none>
inverted-corn-136000-8-75-4542-pulsar-bookie-2 1/1 Running 0 7m2s 10.104.23.9 4am-node27 <none> <none>
inverted-corn-136000-8-75-4542-pulsar-bookie-init-x5qrg 0/1 Completed 0 7m3s 10.104.34.35 4am-node37 <none> <none>
inverted-corn-136000-8-75-4542-pulsar-broker-0 1/1 Running 0 7m3s 10.104.30.110 4am-node38 <none> <none>
inverted-corn-136000-8-75-4542-pulsar-proxy-0 1/1 Running 0 7m3s 10.104.23.241 4am-node27 <none> <none>
inverted-corn-136000-8-75-4542-pulsar-pulsar-init-bfggc 0/1 Completed 0 7m3s 10.104.34.34 4am-node37 <none> <none>
inverted-corn-136000-8-75-4542-pulsar-recovery-0 1/1 Running 0 7m3s 10.104.9.83 4am-node14 <none> <none>
inverted-corn-136000-8-75-4542-pulsar-zookeeper-0 1/1 Running 0 7m3s 10.104.34.53 4am-node37 <none> <none>
inverted-corn-136000-8-75-4542-pulsar-zookeeper-1 1/1 Running 0 4m55s 10.104.19.176 4am-node28 <none> <none>
inverted-corn-136000-8-75-4542-pulsar-zookeeper-2 1/1 Running 0 4m18s 10.104.29.228 4am-node35 <none> <none> (base.py:257)
[2024-02-28 20:09:25,648 - INFO - fouram]: [Cmd Exe] kubectl get pods -n qa-milvus -o wide | grep -E 'STATUS|inverted-corn-136000-8-75-4542-milvus|inverted-corn-136000-8-75-4542-minio|inverted-corn-136000-8-75-4542-etcd|inverted-corn-136000-8-75-4542-pulsar|inverted-corn-136000-8-75-4542-kafka|inverted-corn-136000-8-75-4542-log|inverted-corn-136000-8-75-4542-tikv' (util_cmd.py:14)
[2024-02-28 20:09:35,668 - INFO - fouram]: [CliClient] pod details of release(inverted-corn-136000-8-75-4542):
I0228 20:09:26.900864 539 request.go:665] Waited for 1.165144189s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.default.svc.cluster.local/apis/autoscaling/v2beta1?timeout=32s
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
inverted-corn-136000-8-75-4542-etcd-0 1/1 Running 0 4h6m 10.104.25.55 4am-node30 <none> <none>
inverted-corn-136000-8-75-4542-etcd-1 1/1 Running 0 4h6m 10.104.27.163 4am-node31 <none> <none>
inverted-corn-136000-8-75-4542-etcd-2 1/1 Running 0 4h6m 10.104.31.130 4am-node34 <none> <none>
inverted-corn-136000-8-75-4542-milvus-datacoord-5d8d86cc588bnq8 1/1 Running 0 4h6m 10.104.12.59 4am-node17 <none> <none>
inverted-corn-136000-8-75-4542-milvus-datanode-7df6ff895-ds2h8 1/1 Running 1 (4h1m ago) 4h6m 10.104.12.60 4am-node17 <none> <none>
inverted-corn-136000-8-75-4542-milvus-indexcoord-64b67b845j6sgz 1/1 Running 0 4h6m 10.104.23.237 4am-node27 <none> <none>
inverted-corn-136000-8-75-4542-milvus-indexnode-b558f58d6-5v67d 1/1 Running 0 4h6m 10.104.14.197 4am-node18 <none> <none>
inverted-corn-136000-8-75-4542-milvus-indexnode-b558f58d6-jltrr 1/1 Running 0 4h6m 10.104.34.37 4am-node37 <none> <none>
inverted-corn-136000-8-75-4542-milvus-indexnode-b558f58d6-svmnb 1/1 Running 0 4h6m 10.104.24.101 4am-node29 <none> <none>
inverted-corn-136000-8-75-4542-milvus-indexnode-b558f58d6-vrvl8 1/1 Running 0 4h6m 10.104.4.153 4am-node11 <none> <none>
inverted-corn-136000-8-75-4542-milvus-proxy-79c7cbf7c7-zdsn8 1/1 Running 1 (4h1m ago) 4h6m 10.104.5.219 4am-node12 <none> <none>
inverted-corn-136000-8-75-4542-milvus-querycoord-6c7f585db2t9qm 1/1 Running 1 (4h1m ago) 4h6m 10.104.14.196 4am-node18 <none> <none>
inverted-corn-136000-8-75-4542-milvus-querynode-89bc557c6-22x74 1/1 Running 2 (100m ago) 4h6m 10.104.25.51 4am-node30 <none> <none>
inverted-corn-136000-8-75-4542-milvus-querynode-89bc557c6-g979z 1/1 Running 0 4h6m 10.104.5.220 4am-node12 <none> <none>
inverted-corn-136000-8-75-4542-milvus-rootcoord-766dcd65f4nkzcw 1/1 Running 1 (4h1m ago) 4h6m 10.104.9.82 4am-node14 <none> <none>
inverted-corn-136000-8-75-4542-minio-0 1/1 Running 0 4h6m 10.104.34.52 4am-node37 <none> <none>
inverted-corn-136000-8-75-4542-minio-1 1/1 Running 0 4h6m 10.104.29.226 4am-node35 <none> <none>
inverted-corn-136000-8-75-4542-minio-2 1/1 Running 0 4h6m 10.104.26.33 4am-node32 <none> <none>
inverted-corn-136000-8-75-4542-minio-3 1/1 Running 0 4h6m 10.104.31.131 4am-node34 <none> <none>
inverted-corn-136000-8-75-4542-pulsar-bookie-0 1/1 Running 0 4h6m 10.104.28.173 4am-node33 <none> <none>
inverted-corn-136000-8-75-4542-pulsar-bookie-1 1/1 Running 0 4h6m 10.104.26.34 4am-node32 <none> <none>
inverted-corn-136000-8-75-4542-pulsar-bookie-2 1/1 Running 0 4h6m 10.104.23.9 4am-node27 <none> <none>
inverted-corn-136000-8-75-4542-pulsar-bookie-init-x5qrg 0/1 Completed 0 4h6m 10.104.34.35 4am-node37 <none> <none>
inverted-corn-136000-8-75-4542-pulsar-broker-0 1/1 Running 0 4h6m 10.104.30.110 4am-node38 <none> <none>
inverted-corn-136000-8-75-4542-pulsar-proxy-0 1/1 Running 0 4h6m 10.104.23.241 4am-node27 <none> <none>
inverted-corn-136000-8-75-4542-pulsar-pulsar-init-bfggc 0/1 Completed 0 4h6m 10.104.34.34 4am-node37 <none> <none>
inverted-corn-136000-8-75-4542-pulsar-recovery-0 1/1 Running 0 4h6m 10.104.9.83 4am-node14 <none> <none>
inverted-corn-136000-8-75-4542-pulsar-zookeeper-0 1/1 Running 0 4h6m 10.104.34.53 4am-node37 <none> <none>
inverted-corn-136000-8-75-4542-pulsar-zookeeper-1 1/1 Running 0 4h4m 10.104.19.176 4am-node28 <none> <none>
inverted-corn-136000-8-75-4542-pulsar-zookeeper-2 1/1 Running 0 4h3m 10.104.29.228 4am-node35 <none> <none>
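The two listings above differ only in a few RESTARTS values, which is easy to miss by eye. A minimal Python sketch for diffing two `kubectl get pods -o wide` outputs is shown here; the helper names (`restart_counts`, `restarted_between`) are hypothetical, and the sample rows are a small subset copied from the listings above.

```python
def restart_counts(listing: str) -> dict[str, int]:
    """Map pod name -> RESTARTS value parsed from `kubectl get pods -o wide` text."""
    counts = {}
    for line in listing.splitlines():
        cols = line.split()
        # Data rows are: NAME READY STATUS RESTARTS ...; READY looks like "1/1".
        # This skips the header row and any interleaved client log lines.
        if len(cols) < 4 or cols[0] == "NAME" or "/" not in cols[1]:
            continue
        if not cols[3].isdigit():
            continue
        counts[cols[0]] = int(cols[3])
    return counts


def restarted_between(before: str, after: str) -> dict[str, int]:
    """Return pods whose restart count grew, with the number of new restarts."""
    old, new = restart_counts(before), restart_counts(after)
    return {pod: n - old.get(pod, 0) for pod, n in new.items() if n > old.get(pod, 0)}


# Rows copied from the initial and final listings in this report (subset only).
before = """\
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
inverted-corn-136000-8-75-4542-milvus-proxy-79c7cbf7c7-zdsn8 1/1 Running 1 (2m32s ago) 7m3s 10.104.5.219 4am-node12 <none> <none>
inverted-corn-136000-8-75-4542-milvus-querynode-89bc557c6-22x74 1/1 Running 0 7m2s 10.104.25.51 4am-node30 <none> <none>
inverted-corn-136000-8-75-4542-milvus-querynode-89bc557c6-g979z 1/1 Running 0 7m3s 10.104.5.220 4am-node12 <none> <none>
"""
after = """\
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
inverted-corn-136000-8-75-4542-milvus-proxy-79c7cbf7c7-zdsn8 1/1 Running 1 (4h1m ago) 4h6m 10.104.5.219 4am-node12 <none> <none>
inverted-corn-136000-8-75-4542-milvus-querynode-89bc557c6-22x74 1/1 Running 2 (100m ago) 4h6m 10.104.25.51 4am-node30 <none> <none>
inverted-corn-136000-8-75-4542-milvus-querynode-89bc557c6-g979z 1/1 Running 0 4h6m 10.104.5.220 4am-node12 <none> <none>
"""

changed = restarted_between(before, after)
print(changed)
```

On these rows only querynode `22x74` is reported, with two new restarts during the test window; the proxy restart happened before the initial listing and is therefore not counted.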
{pod=~"inverted-corn-136000-8-75-4542-milvus-querynode-89bc557c6-22x74"}
GC"=292] ["new GOGC"=200] [gc-pause=69.319µs] [gc-pause-end=1709141449374069875]
2024-02-29 01:30:49.379 [2024/02/28 17:30:49.379 +00:00] [DEBUG] [segments/collection.go:188] ["collection ref decrement"] [collectionID=448039877626298924] [refCount=234]
2024-02-29 01:30:49.380 [2024/02/28 17:30:49.380 +00:00] [WARN] [grpclog/grpclog.go:46] ["[core][Server #5] grpc: Server.processUnaryRPC failed to write status: connection error: desc = \"transport is closing\""]
2024-02-29 01:30:49.636 [2024/02/28 17:30:49.636 +00:00] [WARN] [sessionutil/session_util.go:550] ["fail to retry keepAliveOnce"] [serverName=querynode] [LeaseID=2417463163909405775] [error="etcdserver: requested lease not found"]
2024-02-29 01:30:49.636 [2024/02/28 17:30:49.636 +00:00] [WARN] [sessionutil/session_util.go:878] ["connection lost detected, shuting down"]
2024-02-29 01:30:49.636 [2024/02/28 17:30:49.636 +00:00] [ERROR] [querynodev2/server.go:171] ["Query Node disconnected from etcd, process will exit"] ["Server Id"=8] [stack="github.com/milvus-io/milvus/internal/querynodev2.(*QueryNode).Register.func1\n\t/go/src/github.com/milvus-io/milvus/internal/querynodev2/server.go:171"]
2024-02-29 01:30:49.636 [2024/02/28 17:30:49.636 +00:00] [INFO] [querynodev2/server.go:415] ["Query node stop..."]
2024-02-29 01:30:49.636 [2024/02/28 17:30:49.636 +00:00] [WARN] [querynodev2/server.go:418] ["session fail to go stopping state"] [error="this session has disconnected"] [errorVerbose="this session has disconnected\n(1) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/util/sessionutil.(*Session).GoingStop\n | \t/go/src/github.com/milvus-io/milvus/internal/util/sessionutil/session_util.go:661\n | github.com/milvus-io/milvus/internal/querynodev2.(*QueryNode).Stop.func1\n | \t/go/src/github.com/milvus-io/milvus/internal/querynodev2/server.go:416\n | sync.(*Once).doSlow\n | \t/usr/local/go/src/sync/once.go:74\n | sync.(*Once).Do\n | \t/usr/local/go/src/sync/once.go:65\n | github.com/milvus-io/milvus/internal/querynodev2.(*QueryNode).Stop\n | \t/go/src/github.com/milvus-io/milvus/internal/querynodev2/server.go:414\n | github.com/milvus-io/milvus/internal/querynodev2.(*QueryNode).Register.func1\n | \t/go/src/github.com/milvus-io/milvus/internal/querynodev2/server.go:172\n | runtime.goexit\n | \t/usr/local/go/src/runtime/asm_amd64.s:1598\nWraps: (2) this session has disconnected\nError types: (1) *withstack.withStack (2) *errutil.leafError"]
2024-02-29 01:30:49.636 [2024/02/28 17:30:49.636 +00:00] [INFO] [tasks/concurrent_safe_scheduler.go:122] ["receiveChan closed, processing remaining request"]
2024-02-29 01:30:49.636 [2024/02/28 17:30:49.636 +00:00] [INFO] [tasks/concurrent_safe_scheduler.go:129] ["all task put into exeChan, schedule worker exit"]
2024-02-29 01:30:49.636 [2024/02/28 17:30:49.636 +00:00] [INFO] [tasks/concurrent_safe_scheduler.go:217] ["scheduler execChan closed, worker exit"]
2024-02-29 01:30:49.636 [2024/02/28 17:30:49.636 +00:00] [DEBUG] [pipeline/stream_pipeline.go:56] ["stream pipeline input closed"]
2024-02-29 01:30:49.636 [2024/02/28 17:30:49.636 +00:00] [INFO] [msgdispatcher/dispatcher.go:177] ["get signal"] [pchannel=by-dev-rootcoord-dml_0] [signal=pause] [isMain=true]
2024-02-29 01:30:49.636 [2024/02/28 17:30:49.636 +00:00] [INFO] [msgdispatcher/dispatcher.go:177] ["get signal"] [pchannel=by-dev-rootcoord-dml_0] [signal=pause] [isMain=true]
2024-02-29 01:30:49.636 [2024/02/28 17:30:49.636 +00:00] [INFO] [msgdispatcher/dispatcher.go:210] ["stop working"] [pchannel=by-dev-rootcoord-dml_0] [isMain=true]
2024-02-29 01:30:49.636 [2024/02/28 17:30:49.636 +00:00] [INFO] [msgdispatcher/dispatcher.go:200] ["handle signal done"] [pchannel=by-dev-rootcoord-dml_0] [signal=pause] [isMain=true]
2024-02-29 01:30:49.636 [2024/02/28 17:30:49.636 +00:00] [INFO] [msgdispatcher/dispatcher.go:200] ["handle signal done"] [pchannel=by-dev-rootcoord-dml_0] [signal=pause] [isMain=true]
2024-02-29 01:30:49.636 [2024/02/28 17:30:49.636 +00:00] [INFO] [msgdispatcher/dispatcher.go:164] ["closed target"] [vchannel=by-dev-rootcoord-dml_0_448039877626298924v0] [isMain=true]
2024-02-29 01:30:49.636 [2024/02/28 17:30:49.636 +00:00] [INFO] [msgdispatcher/dispatcher.go:177] ["get signal"] [pchannel=by-dev-rootcoord-dml_0] [signal=terminate] [isMain=true]
2024-02-29 01:30:49.636 [2024/02/28 17:30:49.636 +00:00] [INFO] [msgdispatcher/dispatcher.go:177] ["get signal"] [pchannel=by-dev-rootcoord-dml_0] [signal=terminate] [isMain=true]
2024-02-29 01:30:49.636 [2024/02/28 17:30:49.636 +00:00] [INFO] [msgstream/mq_msgstream.go:216] ["start to close mq msg stream"] ["producer num"=0] ["consumer num"=1]
2024-02-29 01:30:49.659 [2024/02/28 17:30:49.659 +00:00] [INFO] [msgdispatcher/dispatcher.go:200] ["handle signal done"] [pchannel=by-dev-rootcoord-dml_0] [signal=terminate] [isMain=true]
2024-02-29 01:30:49.659 [2024/02/28 17:30:49.659 +00:00] [INFO]
inverted-corn-136000-8-75-4542-milvus-querynode-89bc557c6-22x74.txt
inverted-corn-136000-8-75-4542-milvus-datacoord-5d8d86cc588bnq8.txt
inverted-corn-136000-8-75-4542-milvus-querynode-89bc557c6-g979z.txt
inverted-corn-136000-8-75-4542-etcd-.*.txt
inverted-corn-136000-8-75-4542-milvus-rootcoord-766dcd65f4nkzcw.txt
inverted-corn-136000-8-75-4542-milvus-datanode-7df6ff895-ds2h8.txt
inverted-corn-136000-8-75-4542-milvus-querycoord-6c7f585db2t9qm.txt
inverted-corn-136000-8-75-4542-milvus-proxy-79c7cbf7c7-zdsn8.txt
client pod name: inverted-corn-1709136000-127259320
client log: client.log.zip
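The querynode log above shows a short sequence: the lease keep-alive fails with "requested lease not found", the session reports the connection as lost, and the node exits. To find the same sequence in the attached log files, a small filter such as the following may help. It is a sketch only: the regular expression is an assumption based on the line layout visible in this report, `session_loss_events` is a hypothetical helper, and the `stack=` field of the ERROR line is left out of the sample for brevity.

```python
import re

# Bracketed log prefix as it appears in this report:
# [timestamp] [LEVEL] [file:line] ["message"] ...
LOG_RE = re.compile(
    r'\[(?P<ts>\d{4}/\d{2}/\d{2} [\d:.]+ [+-]\d{2}:\d{2})\] '
    r'\[(?P<level>[A-Z]+)\] \[(?P<src>[^\]]+)\] \["(?P<msg>[^"]*)"\]'
)

# Message fragments taken from the log excerpt above.
MARKERS = (
    "fail to retry keepAliveOnce",
    "connection lost detected",
    "disconnected from etcd",
)


def session_loss_events(lines):
    """Return (timestamp, level, source, message) for lines matching a marker."""
    events = []
    for line in lines:
        m = LOG_RE.search(line)
        if m and any(marker in m["msg"] for marker in MARKERS):
            events.append((m["ts"], m["level"], m["src"], m["msg"]))
    return events


# Lines copied from the excerpt above (stack field of the ERROR line omitted).
sample = r'''
2024-02-29 01:30:49.636 [2024/02/28 17:30:49.636 +00:00] [WARN] [sessionutil/session_util.go:550] ["fail to retry keepAliveOnce"] [serverName=querynode] [LeaseID=2417463163909405775] [error="etcdserver: requested lease not found"]
2024-02-29 01:30:49.636 [2024/02/28 17:30:49.636 +00:00] [WARN] [sessionutil/session_util.go:878] ["connection lost detected, shuting down"]
2024-02-29 01:30:49.636 [2024/02/28 17:30:49.636 +00:00] [ERROR] [querynodev2/server.go:171] ["Query Node disconnected from etcd, process will exit"] ["Server Id"=8]
2024-02-29 01:30:49.636 [2024/02/28 17:30:49.636 +00:00] [INFO] [querynodev2/server.go:415] ["Query node stop..."]
'''.strip().splitlines()

events = session_loss_events(sample)
for ts, level, src, msg in events:
    print(ts, level, src, msg)
```

Running the same filter over the etcd and querynode attachments around 17:30:49 UTC should show whether the lease expired on the etcd side before the keep-alive was retried.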
Expected Behavior
No response
Steps To Reproduce
Concurrent test that measures RT and QPS
:purpose: `vector: memory and disk index`
verify a concurrent DML & DQL scenario on a collection with 4 float_vector fields and 16 scalar fields
:test steps:
1. create collection with fields:
'float_vector': 128dim,
'float_vector_1': 128dim,
'float_vector_2': 200dim,
'float_vector_3': 200dim,
'int8_1', 'int16_1', 'int32_1', 'int64_1', 'double_1', 'float_1', 'varchar_1', 'bool_1',
'int8_2', 'int16_2', 'int32_2', 'int64_2', 'double_2', 'float_2', 'varchar_2', 'bool_2'
2. build indexes:
HNSW: 'float_vector'
DISKANN_IP: 'float_vector_1'
HNSW: 'float_vector_2'
DISKANN_L2: 'float_vector_3'
scalar_default_index: 'int8_1', 'int16_1', 'int32_1', 'int64_1', 'double_1', 'float_1', 'varchar_1'
scalar_INVERTED_index: 'int8_2', 'int16_2', 'int32_2', 'int64_2', 'double_2', 'float_2', 'varchar_2', 'bool_2'
3. insert 5 million rows
4. flush collection
5. build indexes again using the same params
6. load collection
7. concurrent request:
- insert
- delete
- flush
- load
- search
- hybrid_search
- query
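The field and index layout from steps 1 and 2 can be written down as plain Python data to sanity-check the counts quoted in the purpose line. This is a bookkeeping sketch only, not pymilvus code; all variable names and the index labels are made up for illustration.

```python
# Vector fields and their dimensions, as listed in step 1.
vector_fields = {
    "float_vector": 128,
    "float_vector_1": 128,
    "float_vector_2": 200,
    "float_vector_3": 200,
}

# Step 1 lists the same eight scalar types twice, with suffixes _1 and _2.
scalar_types = ["int8", "int16", "int32", "int64", "double", "float", "varchar", "bool"]
scalar_fields = [f"{t}_{i}" for i in (1, 2) for t in scalar_types]

# Index plan from step 2.
index_plan = {
    "float_vector": "HNSW",
    "float_vector_1": "DISKANN (IP)",
    "float_vector_2": "HNSW",
    "float_vector_3": "DISKANN (L2)",
}
# Default scalar index: every *_1 field that step 2 names (bool_1 is not in that list).
index_plan.update(
    {f: "scalar default" for f in scalar_fields if f.endswith("_1") and f != "bool_1"}
)
# INVERTED index: every *_2 field.
index_plan.update({f: "INVERTED" for f in scalar_fields if f.endswith("_2")})

all_fields = list(vector_fields) + scalar_fields
unindexed = [f for f in all_fields if f not in index_plan]
print(len(vector_fields), len(scalar_fields), unindexed)
```

This prints `4 16 ['bool_1']`: the counts match the purpose line, and `bool_1` is the only field for which step 2 lists no index.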
Milvus Log
No response
Anything else?
test result:
[2024-02-28 20:08:38,030 - INFO - fouram]: Print locust final stats. (locust_runner.py:56)
[2024-02-28 20:08:38,031 - INFO - fouram]: Type Name # reqs # fails | Avg Min Max Med | req/s failures/s (stats.py:789)
[2024-02-28 20:08:38,032 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-02-28 20:08:38,032 - INFO - fouram]: grpc delete 8653 0(0.00%) | 154 4 20575 21 | 0.80 0.00 (stats.py:789)
[2024-02-28 20:08:38,032 - INFO - fouram]: grpc flush 8590 0(0.00%) | 7883 169 70211 6300 | 0.80 0.00 (stats.py:789)
[2024-02-28 20:08:38,032 - INFO - fouram]: grpc hybrid_search 8608 6950(80.74%) | 298 3 33631 9 | 0.80 0.64 (stats.py:789)
[2024-02-28 20:08:38,032 - INFO - fouram]: grpc insert 8565 0(0.00%) | 478 51 24871 340 | 0.79 0.00 (stats.py:789)
[2024-02-28 20:08:38,032 - INFO - fouram]: grpc load 8660 27(0.31%) | 15331 10 300005 2900 | 0.80 0.00 (stats.py:789)
[2024-02-28 20:08:38,032 - INFO - fouram]: grpc query 8562 6991(81.65%) | 331 1 32205 7 | 0.79 0.65 (stats.py:789)
[2024-02-28 20:08:38,032 - INFO - fouram]: grpc search 8580 6925(80.71%) | 449 81 34062 110 | 0.79 0.64 (stats.py:789)
[2024-02-28 20:08:38,032 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-02-28 20:08:38,032 - INFO - fouram]: Aggregated 60218 20893(34.70%) | 3573 1 300005 210 | 5.58 1.93 (stats.py:789)
[2024-02-28 20:08:38,032 - INFO - fouram]: (stats.py:790)
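To compare runs, the per-request failure rates can be recomputed from the raw counts in the locust summary above. The following is a minimal sketch; the regular expression assumes the row layout shown in this report, and `parse_rows` is a hypothetical helper name.

```python
import re

# Row layout as printed above: "grpc <name> <reqs> <fails>(<pct>%) | ..."
ROW_RE = re.compile(
    r"grpc\s+(?P<name>\w+)\s+(?P<reqs>\d+)\s+(?P<fails>\d+)\((?P<pct>[\d.]+)%\)"
)


def parse_rows(text):
    """Return {request name: counts plus reported and recomputed failure percentage}."""
    rows = {}
    for m in ROW_RE.finditer(text):
        reqs, fails = int(m["reqs"]), int(m["fails"])
        rows[m["name"]] = {
            "reqs": reqs,
            "fails": fails,
            "reported_pct": float(m["pct"]),
            "computed_pct": 100 * fails / reqs,
        }
    return rows


# Rows copied from the locust summary in this report.
stats = """
[2024-02-28 20:08:38,032 - INFO - fouram]: grpc delete 8653 0(0.00%) | 154 4 20575 21 | 0.80 0.00 (stats.py:789)
[2024-02-28 20:08:38,032 - INFO - fouram]: grpc flush 8590 0(0.00%) | 7883 169 70211 6300 | 0.80 0.00 (stats.py:789)
[2024-02-28 20:08:38,032 - INFO - fouram]: grpc hybrid_search 8608 6950(80.74%) | 298 3 33631 9 | 0.80 0.64 (stats.py:789)
[2024-02-28 20:08:38,032 - INFO - fouram]: grpc insert 8565 0(0.00%) | 478 51 24871 340 | 0.79 0.00 (stats.py:789)
[2024-02-28 20:08:38,032 - INFO - fouram]: grpc load 8660 27(0.31%) | 15331 10 300005 2900 | 0.80 0.00 (stats.py:789)
[2024-02-28 20:08:38,032 - INFO - fouram]: grpc query 8562 6991(81.65%) | 331 1 32205 7 | 0.79 0.65 (stats.py:789)
[2024-02-28 20:08:38,032 - INFO - fouram]: grpc search 8580 6925(80.71%) | 449 81 34062 110 | 0.79 0.64 (stats.py:789)
"""

rows = parse_rows(stats)
# Request types where more than half of the calls failed.
mostly_failing = sorted(name for name, r in rows.items() if r["computed_pct"] > 50)
print(mostly_failing)
```

On this run the three read paths (`hybrid_search`, `query`, `search`) each fail on roughly 80% of calls, while `insert`, `delete` and `flush` report no failures.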
[2024-02-28 20:08:38,036 - INFO - fouram]: [PerfTemplate] Report data:
{'server': {'deploy_tool': 'helm',
'deploy_mode': 'cluster',
'config_name': 'cluster_8c16m',
'config': {'queryNode': {'resources': {'limits': {'cpu': '16.0',
'memory': '64Gi'},
'requests': {'cpu': '9.0',
'memory': '33Gi'}},
'replicas': 2},
'indexNode': {'resources': {'limits': {'cpu': '8.0',
'memory': '16Gi'},
'requests': {'cpu': '5.0',
'memory': '9Gi'}},
'replicas': 4},
'dataNode': {'resources': {'limits': {'cpu': '8.0',
'memory': '16Gi'},
'requests': {'cpu': '5.0',
'memory': '9Gi'}}},
'cluster': {'enabled': True},
'pulsar': {},
'kafka': {},
'minio': {'metrics': {'podMonitor': {'enabled': True}}},
'etcd': {'metrics': {'enabled': True,
'podMonitor': {'enabled': True}}},
'metrics': {'serviceMonitor': {'enabled': True}},
'log': {'level': 'debug'},
'image': {'all': {'repository': 'harbor.milvus.io/milvus/milvus',
'tag': 'master-20240228-095cdbed-amd64'}}},
'host': 'inverted-corn-136000-8-75-4542-milvus.qa-milvus.svc.cluster.local',
'port': '19530',
'uri': ''},
'client': {'test_case_type': 'ConcurrentClientBase',
'test_case_name': 'test_inverted_locust_hnsw_diskann_dml_dql_cluster',
'test_case_params': {'dataset_params': {'metric_type': 'L2',
'dim': 128,
'scalars_index': {'int8_1': {},
'int16_1': {},
'int32_1': {},
'int64_1': {},
'double_1': {},
'float_1': {},
'varchar_1': {},
'int8_2': {'index_type': 'INVERTED'},
'int16_2': {'index_type': 'INVERTED'},
'int32_2': {'index_type': 'INVERTED'},
'int64_2': {'index_type': 'INVERTED'},
'double_2': {'index_type': 'INVERTED'},
'float_2': {'index_type': 'INVERTED'},
'varchar_2': {'index_type': 'INVERTED'},
'bool_2': {'index_type': 'INVERTED'}},
'vectors_index': {'float_vector_1': {'index_type': 'DISKANN',
'index_param': {},
'metric_type': 'IP'},
'float_vector_2': {'index_type': 'HNSW',
'index_param': {'M': 8,
'efConstruction': 200},
'metric_type': 'L2'},
'float_vector_3': {'index_type': 'DISKANN',
'index_param': {},
'metric_type': 'L2'}},
'scalars_params': {'float_vector_1': {'params': {'dim': 128},
'other_params': {'dataset': 'sift',
'dim': 128}},
'float_vector_2': {'params': {'dim': 200},
'other_params': {'dataset': 'text2img',
'dim': 200}},
'float_vector_3': {'params': {'dim': 200},
'other_params': {'dataset': 'text2img',
'dim': 200}}},
'dataset_name': 'sift',
'dataset_size': 5000000,
'ni_per': 5000},
'collection_params': {'other_fields': ['float_vector_1',
'float_vector_2',
'float_vector_3',
'int8_1',
'int16_1',
'int32_1',
'int64_1',
'double_1',
'float_1',
'varchar_1',
'bool_1',
'int8_2',
'int16_2',
'int32_2',
'int64_2',
'double_2',
'float_2',
'varchar_2',
'bool_2'],
'shards_num': 2},
'resource_groups_params': {'reset': False},
'database_user_params': {'reset_rbac': False,
'reset_db': False},
'index_params': {'index_type': 'HNSW',
'index_param': {'M': 8,
'efConstruction': 200}},
'concurrent_params': {'concurrent_number': 20,
'during_time': '3h',
'interval': 20,
'spawn_rate': None},
'concurrent_tasks': [{'type': 'insert',
'weight': 1,
'params': {'nb': 10,
'timeout': 30,
'random_id': True,
'random_vector': True,
'varchar_filled': False,
'start_id': 5000000}},
{'type': 'delete',
'weight': 1,
'params': {'expr': '',
'delete_length': 9,
'timeout': 30}},
{'type': 'flush',
'weight': 1,
'params': {'timeout': 180}},
{'type': 'load',
'weight': 1,
'params': {'replica_number': 1,
'timeout': 300}},
{'type': 'search',
'weight': 1,
'params': {'nq': 1000,
'top_k': 1,
'search_param': {'ef': 64},
'expr': 'int64_1 '
'> '
'-1 '
'&& '
'id '
'> '
'-1',
'guarantee_timestamp': None,
'partition_names': None,
'output_fields': ['*'],
'ignore_growing': False,
'group_by_field': None,
'timeout': 180,
'random_data': True}},
{'type': 'hybrid_search',
'weight': 1,
'params': {'nq': 1,
'top_k': 10,
'reqs': [{'search_param': {'ef': 1280},
'anns_field': 'float_vector',
'expr': 'int64_1 '
'< '
'100000 '
'&& '
'float_2 '
'> '
'10.0',
'top_k': 1000},
{'search_param': {'search_list': 30},
'anns_field': 'float_vector_1',
'expr': 'varchar_1 '
'like '
'"0%" '
'&& '
'bool_2 '
'== '
'True'},
{'search_param': {'ef': 1024},
'anns_field': 'float_vector_2',
'expr': 'int8_1 '
'< '
'64 '
'&& '
'bool_1 '
'== '
'False',
'top_k': 1009},
{'search_param': {'search_list': 40},
'anns_field': 'float_vector_3',
'expr': 'int8_2 '
'> '
'64 '
'|| '
'double_2 '
'> '
'1000000.0'}],
'rerank': {'RRFRanker': []},
'output_fields': ['*'],
'ignore_growing': False,
'guarantee_timestamp': None,
'partition_names': None,
'timeout': 60,
'random_data': True}},
{'type': 'query',
'weight': 1,
'params': {'ids': None,
'expr': 'int64_1 '
'> '
'-1 '
'&& '
'int64_2 '
'> '
'-1 '
'&& ',
'output_fields': ['*'],
'offset': None,
'limit': None,
'ignore_growing': False,
'partition_names': None,
'timeout': 180,
'random_data': True,
'random_count': 20,
'random_range': [2500000.0,
5000000],
'field_name': 'id',
'field_type': 'int64'}}]},
'run_id': 2024022861921862,
'datetime': '2024-02-28 16:03:12.352658',
'client_version': '2.4.0'},
'result': {'test_result': {'index': {'RT': 936.4863,
'float_vector_1': {'RT': 828.8714},
'float_vector_2': {'RT': 214.7156},
'float_vector_3': {'RT': 135.2635},
'int8_1': {'RT': 0.5383},
'int16_1': {'RT': 0.5459},
'int32_1': {'RT': 0.54},
'int64_1': {'RT': 0.5391},
'double_1': {'RT': 0.5368},
'float_1': {'RT': 0.7412},
'varchar_1': {'RT': 0.7463},
'int8_2': {'RT': 0.639},
'int16_2': {'RT': 0.6818},
'int32_2': {'RT': 0.5278},
'int64_2': {'RT': 0.525},
'double_2': {'RT': 0.5441},
'float_2': {'RT': 0.5439},
'varchar_2': {'RT': 0.547},
'bool_2': {'RT': 0.631}},
'insert': {'total_time': 901.6265,
'VPS': 5545.5335,
'batch_time': 0.9016,
'batch': 5000},
'flush': {'RT': 3.5434},
'load': {'RT': 32.7634},
'Locust': {'Aggregated': {'Requests': 60218,
'Fails': 20893,
'RPS': 5.58,
'fail_s': 0.35,
'RT_max': 300005.24,
'RT_avg': 3573.59,
'TP50': 210.0,
'TP99': 50000.0},
'delete': {'Requests': 8653,
'Fails': 0,
'RPS': 0.8,
'fail_s': 0.0,
'RT_max': 20575.91,
'RT_avg': 154.5,
'TP50': 21,
'TP99': 1700.0},
'flush': {'Requests': 8590,
'Fails': 0,
'RPS': 0.8,
'fail_s': 0.0,
'RT_max': 70211.92,
'RT_avg': 7883.45,
'TP50': 6300.0,
'TP99': 44000.0},
'hybrid_search': {'Requests': 8608,
'Fails': 6950,
'RPS': 0.8,
'fail_s': 0.81,
'RT_max': 33631.02,
'RT_avg': 298.54,
'TP50': 9,
'TP99': 2500.0},
'insert': {'Requests': 8565,
'Fails': 0,
'RPS': 0.79,
'fail_s': 0.0,
'RT_max': 24871.13,
'RT_avg': 478.59,
'TP50': 340.0,
'TP99': 2300.0},
'load': {'Requests': 8660,
'Fails': 27,
'RPS': 0.8,
'fail_s': 0.0,
'RT_max': 300005.24,
'RT_avg': 15331.16,
'TP50': 2900.0,
'TP99': 242000.0},
'query': {'Requests': 8562,
'Fails': 6991,
'RPS': 0.79,
'fail_s': 0.82,
'RT_max': 32205.63,
'RT_avg': 331.84,
'TP50': 8,
'TP99': 3700.0},
'search': {'Requests': 8580,
'Fails': 6925,
'RPS': 0.79,
'fail_s': 0.81,
'RT_max': 34062.22,
'RT_avg': 449.97,
'TP50': 110.0,
'TP99': 3400.0}}}}}
Issue #30915 may have the same root cause.
dataNode panic: fail to allocate ID
argo task: inverted-corn-c79tm
test case name: test_inverted_locust_varchar_dml_dql_cluster
server:
[2024-02-29 10:05:24,154 - INFO - fouram]: [Base] Deploy initial state:
I0229 03:25:11.753714 401 request.go:665] Waited for 1.174198326s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.default.svc.cluster.local/apis/node.k8s.io/v1?timeout=32s
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
inverted-corn-c79tm-5-49-7343-etcd-0 1/1 Running 0 8m27s 10.104.24.237 4am-node29 <none> <none>
inverted-corn-c79tm-5-49-7343-etcd-1 1/1 Running 0 8m27s 10.104.16.15 4am-node21 <none> <none>
inverted-corn-c79tm-5-49-7343-etcd-2 1/1 Running 0 8m26s 10.104.30.216 4am-node38 <none> <none>
inverted-corn-c79tm-5-49-7343-milvus-datacoord-5c45566ddd-f295q 1/1 Running 0 8m27s 10.104.20.29 4am-node22 <none> <none>
inverted-corn-c79tm-5-49-7343-milvus-datanode-54f56c4cfb-4bwbv 1/1 Running 1 (3m56s ago) 8m27s 10.104.26.144 4am-node32 <none> <none>
inverted-corn-c79tm-5-49-7343-milvus-indexcoord-5455756f77nfj5t 1/1 Running 0 8m27s 10.104.23.168 4am-node27 <none> <none>
inverted-corn-c79tm-5-49-7343-milvus-indexnode-7c6cf7486b-dbnfs 1/1 Running 0 8m27s 10.104.20.31 4am-node22 <none> <none>
inverted-corn-c79tm-5-49-7343-milvus-proxy-7d854b565-pz4mz 1/1 Running 1 (3m56s ago) 8m27s 10.104.12.237 4am-node17 <none> <none>
inverted-corn-c79tm-5-49-7343-milvus-querycoord-c88bdbb74-b8tmt 1/1 Running 1 (3m57s ago) 8m27s 10.104.20.30 4am-node22 <none> <none>
inverted-corn-c79tm-5-49-7343-milvus-querynode-6bb9b6d87b-fhsz9 1/1 Running 0 8m27s 10.104.23.166 4am-node27 <none> <none>
inverted-corn-c79tm-5-49-7343-milvus-rootcoord-8445bc65d8-hbhpr 1/1 Running 1 (3m57s ago) 8m27s 10.104.23.167 4am-node27 <none> <none>
inverted-corn-c79tm-5-49-7343-minio-0 1/1 Running 0 8m27s 10.104.26.146 4am-node32 <none> <none>
inverted-corn-c79tm-5-49-7343-minio-1 1/1 Running 0 8m27s 10.104.24.238 4am-node29 <none> <none>
inverted-corn-c79tm-5-49-7343-minio-2 1/1 Running 0 8m27s 10.104.29.75 4am-node35 <none> <none>
inverted-corn-c79tm-5-49-7343-minio-3 1/1 Running 0 8m27s 10.104.21.137 4am-node24 <none> <none>
inverted-corn-c79tm-5-49-7343-pulsar-bookie-0 1/1 Running 0 8m27s 10.104.34.176 4am-node37 <none> <none>
inverted-corn-c79tm-5-49-7343-pulsar-bookie-1 1/1 Running 0 8m27s 10.104.25.242 4am-node30 <none> <none>
inverted-corn-c79tm-5-49-7343-pulsar-bookie-2 1/1 Running 0 8m26s 10.104.24.244 4am-node29 <none> <none>
inverted-corn-c79tm-5-49-7343-pulsar-bookie-init-zhxgz 0/1 Completed 0 8m27s 10.104.12.239 4am-node17 <none> <none>
inverted-corn-c79tm-5-49-7343-pulsar-broker-0 1/1 Running 0 8m27s 10.104.15.211 4am-node20 <none> <none>
inverted-corn-c79tm-5-49-7343-pulsar-proxy-0 1/1 Running 0 8m27s 10.104.5.162 4am-node12 <none> <none>
inverted-corn-c79tm-5-49-7343-pulsar-pulsar-init-sv6xd 0/1 Completed 0 8m27s 10.104.12.238 4am-node17 <none> <none>
inverted-corn-c79tm-5-49-7343-pulsar-recovery-0 1/1 Running 0 8m27s 10.104.12.236 4am-node17 <none> <none>
inverted-corn-c79tm-5-49-7343-pulsar-zookeeper-0 1/1 Running 0 8m27s 10.104.24.236 4am-node29 <none> <none>
inverted-corn-c79tm-5-49-7343-pulsar-zookeeper-1 1/1 Running 0 7m47s 10.104.34.203 4am-node37 <none> <none>
inverted-corn-c79tm-5-49-7343-pulsar-zookeeper-2 1/1 Running 0 6m6s 10.104.16.55 4am-node21 <none> <none> (base.py:257)
[2024-02-29 10:05:24,155 - INFO - fouram]: [Cmd Exe] kubectl get pods -n qa-milvus -o wide | grep -E 'STATUS|inverted-corn-c79tm-5-49-7343-milvus|inverted-corn-c79tm-5-49-7343-minio|inverted-corn-c79tm-5-49-7343-etcd|inverted-corn-c79tm-5-49-7343-pulsar|inverted-corn-c79tm-5-49-7343-kafka|inverted-corn-c79tm-5-49-7343-log|inverted-corn-c79tm-5-49-7343-tikv' (util_cmd.py:14)
[2024-02-29 10:05:33,788 - INFO - fouram]: [CliClient] pod details of release(inverted-corn-c79tm-5-49-7343):
I0229 10:05:25.402578 511 request.go:665] Waited for 1.164400108s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.default.svc.cluster.local/apis/metrics.k8s.io/v1beta1?timeout=32s
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
inverted-corn-c79tm-5-49-7343-etcd-0 1/1 Running 0 6h48m 10.104.24.237 4am-node29 <none> <none>
inverted-corn-c79tm-5-49-7343-etcd-1 1/1 Running 0 6h48m 10.104.16.15 4am-node21 <none> <none>
inverted-corn-c79tm-5-49-7343-etcd-2 1/1 Running 0 6h48m 10.104.30.216 4am-node38 <none> <none>
inverted-corn-c79tm-5-49-7343-milvus-datacoord-5c45566ddd-f295q 1/1 Running 0 6h48m 10.104.20.29 4am-node22 <none> <none>
inverted-corn-c79tm-5-49-7343-milvus-datanode-54f56c4cfb-4bwbv 1/1 Running 2 (58m ago) 6h48m 10.104.26.144 4am-node32 <none> <none>
inverted-corn-c79tm-5-49-7343-milvus-indexcoord-5455756f77nfj5t 1/1 Running 0 6h48m 10.104.23.168 4am-node27 <none> <none>
inverted-corn-c79tm-5-49-7343-milvus-indexnode-7c6cf7486b-dbnfs 1/1 Running 0 6h48m 10.104.20.31 4am-node22 <none> <none>
inverted-corn-c79tm-5-49-7343-milvus-proxy-7d854b565-pz4mz 1/1 Running 1 (6h44m ago) 6h48m 10.104.12.237 4am-node17 <none> <none>
inverted-corn-c79tm-5-49-7343-milvus-querycoord-c88bdbb74-b8tmt 1/1 Running 1 (6h44m ago) 6h48m 10.104.20.30 4am-node22 <none> <none>
inverted-corn-c79tm-5-49-7343-milvus-querynode-6bb9b6d87b-fhsz9 1/1 Running 0 6h48m 10.104.23.166 4am-node27 <none> <none>
inverted-corn-c79tm-5-49-7343-milvus-rootcoord-8445bc65d8-hbhpr 1/1 Running 1 (6h44m ago) 6h48m 10.104.23.167 4am-node27 <none> <none>
inverted-corn-c79tm-5-49-7343-minio-0 1/1 Running 0 6h48m 10.104.26.146 4am-node32 <none> <none>
inverted-corn-c79tm-5-49-7343-minio-1 1/1 Running 0 6h48m 10.104.24.238 4am-node29 <none> <none>
inverted-corn-c79tm-5-49-7343-minio-2 1/1 Running 0 6h48m 10.104.29.75 4am-node35 <none> <none>
inverted-corn-c79tm-5-49-7343-minio-3 1/1 Running 0 6h48m 10.104.21.137 4am-node24 <none> <none>
inverted-corn-c79tm-5-49-7343-pulsar-bookie-0 1/1 Running 0 6h48m 10.104.34.176 4am-node37 <none> <none>
inverted-corn-c79tm-5-49-7343-pulsar-bookie-1 1/1 Running 0 6h48m 10.104.25.242 4am-node30 <none> <none>
inverted-corn-c79tm-5-49-7343-pulsar-bookie-2 1/1 Running 0 6h48m 10.104.24.244 4am-node29 <none> <none>
inverted-corn-c79tm-5-49-7343-pulsar-bookie-init-zhxgz 0/1 Completed 0 6h48m 10.104.12.239 4am-node17 <none> <none>
inverted-corn-c79tm-5-49-7343-pulsar-broker-0 1/1 Running 0 6h48m 10.104.15.211 4am-node20 <none> <none>
inverted-corn-c79tm-5-49-7343-pulsar-proxy-0 1/1 Running 0 6h48m 10.104.5.162 4am-node12 <none> <none>
inverted-corn-c79tm-5-49-7343-pulsar-pulsar-init-sv6xd 0/1 Completed 0 6h48m 10.104.12.238 4am-node17 <none> <none>
inverted-corn-c79tm-5-49-7343-pulsar-recovery-0 1/1 Running 0 6h48m 10.104.12.236 4am-node17 <none> <none>
inverted-corn-c79tm-5-49-7343-pulsar-zookeeper-0 1/1 Running 0 6h48m 10.104.24.236 4am-node29 <none> <none>
inverted-corn-c79tm-5-49-7343-pulsar-zookeeper-1 1/1 Running 0 6h48m 10.104.34.203 4am-node37 <none> <none>
inverted-corn-c79tm-5-49-7343-pulsar-zookeeper-2 1/1 Running 0 6h46m 10.104.16.55 4am-node21 <none> <none>
inverted-corn-c79tm-5-49-7343-milvus-datanode-54f56c4cfb-4bwbv.panic.txt
client pod name: inverted-corn-c79tm-505297513
client log: client.log
test steps:
Concurrent test that measures RT and QPS
:purpose: `varchar: different max_length`
verify a concurrent DML & DQL scenario on a collection with 3 VARCHAR scalar fields, each with an INVERTED index
:test steps:
1. create collection with fields:
'float_vector': 3dim,
'varchar_1': max_length=256, varchar_filled=True
'varchar_2': max_length=32768, varchar_filled=True
'varchar_3': max_length=65535, varchar_filled=True
2. build indexes:
IVF_FLAT: 'float_vector'
INVERTED: 'varchar_1', 'varchar_2', 'varchar_3'
3. insert 5 million rows
4. flush collection
5. build indexes again using the same params
6. load collection
7. concurrent request:
- insert
- delete
- flush
- load
- search
- hybrid_search
- query
test result:
[2024-02-29 10:05:15,361 - INFO - fouram]: Print locust final stats. (locust_runner.py:56)
[2024-02-29 10:05:15,361 - INFO - fouram]: Type Name # reqs # fails | Avg Min Max Med | req/s failures/s (stats.py:789)
[2024-02-29 10:05:15,361 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-02-29 10:05:15,361 - INFO - fouram]: grpc delete 2619 1199(45.78%) | 6356 6 18595 6100 | 0.73 0.33 (stats.py:789)
[2024-02-29 10:05:15,361 - INFO - fouram]: grpc flush 2654 0(0.00%) | 20717 3729 332757 15000 | 0.74 0.00 (stats.py:789)
[2024-02-29 10:05:15,361 - INFO - fouram]: grpc hybrid_search 2615 0(0.00%) | 6815 2152 12416 6600 | 0.73 0.00 (stats.py:789)
[2024-02-29 10:05:15,361 - INFO - fouram]: grpc insert 2619 1240(47.35%) | 6752 14 24199 6400 | 0.73 0.34 (stats.py:789)
[2024-02-29 10:05:15,361 - INFO - fouram]: grpc load 2643 2(0.08%) | 13294 8 38653 13000 | 0.73 0.00 (stats.py:789)
[2024-02-29 10:05:15,361 - INFO - fouram]: grpc query 2678 0(0.00%) | 8988 5 27468 8900 | 0.74 0.00 (stats.py:789)
[2024-02-29 10:05:15,362 - INFO - fouram]: grpc search 2642 0(0.00%) | 4989 2169 9200 5100 | 0.73 0.00 (stats.py:789)
[2024-02-29 10:05:15,362 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-02-29 10:05:15,362 - INFO - fouram]: Aggregated 18470 2441(13.22%) | 9720 5 332757 7600 | 5.13 0.68 (stats.py:789)
[2024-02-29 10:05:15,362 - INFO - fouram]: (stats.py:790)
[2024-02-29 10:05:15,364 - INFO - fouram]: [PerfTemplate] Report data:
{'server': {'deploy_tool': 'helm',
'deploy_mode': 'cluster',
'config_name': 'cluster_2c4m',
'config': {'queryNode': {'resources': {'limits': {'cpu': '8',
'memory': '32Gi'},
'requests': {'cpu': '8',
'memory': '32Gi'}},
'replicas': 1},
'indexNode': {'resources': {'limits': {'cpu': '4.0',
'memory': '16Gi'},
'requests': {'cpu': '3.0',
'memory': '9Gi'}},
'replicas': 1},
'dataNode': {'resources': {'limits': {'cpu': '2.0',
'memory': '4Gi'},
'requests': {'cpu': '2.0',
'memory': '3Gi'}}},
'cluster': {'enabled': True},
'pulsar': {},
'kafka': {},
'minio': {'metrics': {'podMonitor': {'enabled': True}}},
'etcd': {'metrics': {'enabled': True,
'podMonitor': {'enabled': True}}},
'metrics': {'serviceMonitor': {'enabled': True}},
'log': {'level': 'debug'},
'image': {'all': {'repository': 'harbor.milvus.io/milvus/milvus',
'tag': 'master-20240229-50a78b68-amd64'}}},
'host': 'inverted-corn-c79tm-5-49-7343-milvus.qa-milvus.svc.cluster.local',
'port': '19530',
'uri': ''},
'client': {'test_case_type': 'ConcurrentClientBase',
'test_case_name': 'test_inverted_locust_varchar_dml_dql_cluster',
'test_case_params': {'dataset_params': {'metric_type': 'L2',
'dim': 3,
'scalars_index': {'varchar_1': {'index_type': 'INVERTED'},
'varchar_2': {'index_type': 'INVERTED'},
'varchar_3': {'index_type': 'INVERTED'}},
'scalars_params': {'varchar_1': {'params': {'max_length': 256},
'other_params': {'varchar_filled': True}},
'varchar_2': {'params': {'max_length': 32768},
'other_params': {'varchar_filled': True}},
'varchar_3': {'params': {'max_length': 65535},
'other_params': {'varchar_filled': True}}},
'dataset_name': 'local',
'dataset_size': 300000,
'ni_per': 50},
'collection_params': {'other_fields': ['varchar_1',
'varchar_2',
'varchar_3'],
'shards_num': 2},
'resource_groups_params': {'reset': False},
'database_user_params': {'reset_rbac': False,
'reset_db': False},
'index_params': {'index_type': 'IVF_FLAT',
'index_param': {'nlist': 1024}},
'concurrent_params': {'concurrent_number': 50,
'during_time': '1h',
'interval': 20,
'spawn_rate': None},
'concurrent_tasks': [{'type': 'insert',
'weight': 1,
'params': {'nb': 10,
'timeout': 30,
'random_id': True,
'random_vector': True,
'varchar_filled': False,
'start_id': 300000}},
{'type': 'delete',
'weight': 1,
'params': {'expr': '',
'delete_length': 10,
'timeout': 30}},
{'type': 'flush',
'weight': 1,
'params': {'timeout': 600}},
{'type': 'load',
'weight': 1,
'params': {'replica_number': 1,
'timeout': 30}},
{'type': 'search',
'weight': 1,
'params': {'nq': 1000,
'top_k': 1,
'search_param': {'nprobe': 32},
'expr': 'varchar_1 '
'like '
'"a%" '
'&& '
'varchar_2 '
'like '
'"A%" '
'&& '
'varchar_3 '
'like '
'"0%" '
'&& '
'id '
'> 0',
'guarantee_timestamp': None,
'partition_names': None,
'output_fields': None,
'ignore_growing': False,
'group_by_field': None,
'timeout': 60,
'random_data': True}},
{'type': 'hybrid_search',
'weight': 1,
'params': {'nq': 1,
'top_k': 10,
'reqs': [{'search_param': {'nprobe': 16},
'anns_field': 'float_vector',
'expr': 'varchar_1 '
'like '
'"0%"',
'top_k': 2000},
{'search_param': {'nprobe': 128},
'anns_field': 'float_vector',
'expr': 'varchar_2 '
'like '
'"9%"'}],
'rerank': {'WeightedRanker': [0.5,
0.5]},
'output_fields': ['*'],
'ignore_growing': False,
'guarantee_timestamp': None,
'partition_names': None,
'timeout': 60,
'random_data': True}},
{'type': 'query',
'weight': 1,
'params': {'ids': None,
'expr': 'varchar_3 '
'like '
'"a%" '
'&& ',
'output_fields': ['*'],
'offset': None,
'limit': None,
'ignore_growing': False,
'partition_names': None,
'timeout': 60,
'random_data': True,
'random_count': 20,
'random_range': [0,
150000.0],
'field_name': 'id',
'field_type': 'int64'}}]},
'run_id': 2024022966095562,
'datetime': '2024-02-29 03:16:49.427461',
'client_version': '2.4.0'},
'result': {'test_result': {'index': {'RT': 2674.4206,
'varchar_1': {'RT': 2423.3369},
'varchar_2': {'RT': 2770.9254},
'varchar_3': {'RT': 2398.6752}},
'insert': {'total_time': 802.7125,
'VPS': 373.7328,
'batch_time': 0.1338,
'batch': 50},
'flush': {'RT': 3.0556},
'load': {'RT': 67.3992},
'Locust': {'Aggregated': {'Requests': 18470,
'Fails': 2441,
'RPS': 5.13,
'fail_s': 0.13,
'RT_max': 332757.81,
'RT_avg': 9720.22,
'TP50': 7600.0,
'TP99': 27000.0},
'delete': {'Requests': 2619,
'Fails': 1199,
'RPS': 0.73,
'fail_s': 0.46,
'RT_max': 18595.15,
'RT_avg': 6356.37,
'TP50': 6100.0,
'TP99': 15000.0},
'flush': {'Requests': 2654,
'Fails': 0,
'RPS': 0.74,
'fail_s': 0.0,
'RT_max': 332757.81,
'RT_avg': 20717.97,
'TP50': 15000.0,
'TP99': 308000.0},
'hybrid_search': {'Requests': 2615,
'Fails': 0,
'RPS': 0.73,
'fail_s': 0.0,
'RT_max': 12416.05,
'RT_avg': 6815.72,
'TP50': 6600.0,
'TP99': 11000.0},
'insert': {'Requests': 2619,
'Fails': 1240,
'RPS': 0.73,
'fail_s': 0.47,
'RT_max': 24199.24,
'RT_avg': 6752.85,
'TP50': 6400.0,
'TP99': 15000.0},
'load': {'Requests': 2643,
'Fails': 2,
'RPS': 0.73,
'fail_s': 0.0,
'RT_max': 38653.31,
'RT_avg': 13294.36,
'TP50': 13000.0,
'TP99': 27000.0},
'query': {'Requests': 2678,
'Fails': 0,
'RPS': 0.74,
'fail_s': 0.0,
'RT_max': 27468.3,
'RT_avg': 8988.8,
'TP50': 8900.0,
'TP99': 18000.0},
'search': {'Requests': 2642,
'Fails': 0,
'RPS': 0.73,
'fail_s': 0.0,
'RT_max': 9200.37,
'RT_avg': 4989.36,
'TP50': 5100.0,
'TP99': 7800.0}}}}}
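A note on reading the report above: the `fail_s` values do not match the `failures/s` column of the locust table (for example 0.13 versus 0.68 for `Aggregated`). They appear to be failure ratios, i.e. `Fails / Requests`. This is inferred from the numbers and is not stated by the test framework. A minimal Python sketch that checks this reading against figures copied from the report (`fail_ratio` is a hypothetical helper, not part of fouram):

```python
# Figures copied from the 'Locust' section of the report above.
locust = {
    "Aggregated": {"Requests": 18470, "Fails": 2441, "fail_s": 0.13},
    "delete": {"Requests": 2619, "Fails": 1199, "fail_s": 0.46},
    "insert": {"Requests": 2619, "Fails": 1240, "fail_s": 0.47},
    "load": {"Requests": 2643, "Fails": 2, "fail_s": 0.0},
}


def fail_ratio(stats):
    """Fraction of failed requests, rounded to two decimals as in the report."""
    return round(stats["Fails"] / stats["Requests"], 2)


# Collect every entry whose recomputed ratio differs from the reported fail_s.
mismatches = {
    name: fail_ratio(stats)
    for name, stats in locust.items()
    if fail_ratio(stats) != stats["fail_s"]
}

for name, stats in locust.items():
    print(f"{name:<12} {stats['Fails']:>5}/{stats['Requests']:<6} -> {fail_ratio(stats):.2f}")
print("mismatches:", mismatches)
```

Under this reading, nearly half of the `insert` and `delete` requests failed during the run, while `search`, `hybrid_search`, `query` and `flush` had no failures.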
Recurrent
argo task: inverted-corn-1709395200 test case name: test_inverted_locust_partition_key_dml_standalone
server:
[2024-03-02 19:30:39,854 - INFO - fouram]: [Base] Deploy initial state:
I0302 16:07:20.142749 421 request.go:665] Waited for 1.169485874s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.default.svc.cluster.local/apis/authorization.k8s.io/v1?timeout=32s
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
inverted-corn-195200-2-25-7555-etcd-0 1/1 Running 0 5m6s 10.104.27.102 4am-node31 <none> <none>
inverted-corn-195200-2-25-7555-milvus-standalone-799d9c86cnnk77 1/1 Running 1 (113s ago) 5m6s 10.104.25.22 4am-node30 <none> <none>
inverted-corn-195200-2-25-7555-minio-65dc6bf765-k258b 1/1 Running 0 5m6s 10.104.27.101 4am-node31 <none> <none> (base.py:257)
[2024-03-02 19:30:39,854 - INFO - fouram]: [Cmd Exe] kubectl get pods -n qa-milvus -o wide | grep -E 'STATUS|inverted-corn-195200-2-25-7555-milvus|inverted-corn-195200-2-25-7555-minio|inverted-corn-195200-2-25-7555-etcd|inverted-corn-195200-2-25-7555-pulsar|inverted-corn-195200-2-25-7555-kafka|inverted-corn-195200-2-25-7555-log|inverted-corn-195200-2-25-7555-tikv' (util_cmd.py:14)
[2024-03-02 19:30:50,072 - INFO - fouram]: [CliClient] pod details of release(inverted-corn-195200-2-25-7555):
I0302 19:30:41.119839 538 request.go:665] Waited for 1.151439776s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.default.svc.cluster.local/apis/apiextensions.k8s.io/v1?timeout=32s
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
inverted-corn-195200-2-25-7555-etcd-0 1/1 Running 0 3h28m 10.104.27.102 4am-node31 <none> <none>
inverted-corn-195200-2-25-7555-milvus-standalone-799d9c86cnnk77 1/1 Running 4 (132m ago) 3h28m 10.104.25.22 4am-node30 <none> <none>
inverted-corn-195200-2-25-7555-minio-65dc6bf765-k258b 1/1 Running 0 3h28m 10.104.27.101 4am-node31 <none> <none>
inverted-corn-195200-2-25-7555-milvus-standalone-799d9c86cnnk77.log
client pod name: inverted-corn-1709395200-4115883251
client logs:
client.log
test steps:
concurrent test and calculation of RT and QPS
:purpose: `partition_key: scalar enable partition_key(num_partitions=128)`
verify concurrent DML scenario which
scalar `id`(pk) & `int64_1` created INVERTED index and enable partition_key on `int64_1` field
:test steps:
1. create collection with fields:
'float_vector': 128dim,
'int64_1': is_partition_key
2. build indexes:
IVF_FLAT: 'float_vector'
INVERTED: 'id', 'int64_1'
3. insert 5 million data
4. flush collection
5. build indexes again using the same params
6. load collection
7. concurrent request:
- insert
- delete
- flush
- release
test result:
[2024-03-02 19:30:16,988 - INFO - fouram]: Print locust final stats. (locust_runner.py:56)
[2024-03-02 19:30:16,990 - INFO - fouram]: Type Name # reqs # fails | Avg Min Max Med | req/s failures/s (stats.py:789)
[2024-03-02 19:30:16,990 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-03-02 19:30:16,990 - INFO - fouram]: grpc delete 8934 0(0.00%) | 16 1 357 4 | 0.83 0.00 (stats.py:789)
[2024-03-02 19:30:16,990 - INFO - fouram]: grpc flush 8795 20(0.23%) | 19229 509 180789 14000 | 0.81 0.00 (stats.py:789)
[2024-03-02 19:30:16,991 - INFO - fouram]: grpc insert 8978 0(0.00%) | 5149 23 136817 3800 | 0.83 0.00 (stats.py:789)
[2024-03-02 19:30:16,991 - INFO - fouram]: grpc release 8978 0(0.00%) | 15 1 814 3 | 0.83 0.00 (stats.py:789)
[2024-03-02 19:30:16,992 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-03-02 19:30:16,992 - INFO - fouram]: Aggregated 35685 20(0.06%) | 6042 1 180789 71 | 3.31 0.00 (stats.py:789)
[2024-03-02 19:30:16,992 - INFO - fouram]: (stats.py:790)
[2024-03-02 19:30:16,994 - INFO - fouram]: [PerfTemplate] Report data:
{'server': {'deploy_tool': 'helm',
'deploy_mode': 'standalone',
'config_name': 'standalone_8c16m',
'config': {'standalone': {'resources': {'limits': {'cpu': '8.0',
'memory': '16Gi'},
'requests': {'cpu': '5.0',
'memory': '9Gi'}}},
'cluster': {'enabled': False},
'etcd': {'replicaCount': 1,
'metrics': {'enabled': True,
'podMonitor': {'enabled': True}}},
'minio': {'mode': 'standalone',
'metrics': {'podMonitor': {'enabled': True}}},
'pulsar': {'enabled': False},
'metrics': {'serviceMonitor': {'enabled': True}},
'log': {'level': 'debug'},
'image': {'all': {'repository': 'harbor.milvus.io/milvus/milvus',
'tag': 'master-20240302-d98a5e44-amd64'}}},
'host': 'inverted-corn-195200-2-25-7555-milvus.qa-milvus.svc.cluster.local',
'port': '19530',
'uri': ''},
'client': {'test_case_type': 'ConcurrentClientBase',
'test_case_name': 'test_inverted_locust_partition_key_dml_standalone',
'test_case_params': {'dataset_params': {'metric_type': 'L2',
'dim': 128,
'scalars_index': {'id': {'index_type': 'INVERTED'},
'int64_1': {'index_type': 'INVERTED'}},
'scalars_params': {'int64_1': {'params': {'is_partition_key': True}}},
'dataset_name': 'sift',
'dataset_size': 5000000,
'ni_per': 50000},
'collection_params': {'other_fields': ['int64_1'],
'shards_num': 2,
'num_partitions': 128},
'resource_groups_params': {'reset': False},
'database_user_params': {'reset_rbac': False,
'reset_db': False},
'index_params': {'index_type': 'IVF_FLAT',
'index_param': {'nlist': 1024}},
'concurrent_params': {'concurrent_number': 20,
'during_time': '3h',
'interval': 20,
'spawn_rate': None},
'concurrent_tasks': [{'type': 'insert',
'weight': 1,
'params': {'nb': 10,
'timeout': 180,
'random_id': True,
'random_vector': True,
'varchar_filled': False,
'start_id': 0}},
{'type': 'delete',
'weight': 1,
'params': {'expr': '',
'delete_length': 9,
'timeout': 30}},
{'type': 'flush',
'weight': 1,
'params': {'timeout': 180}},
{'type': 'release',
'weight': 1,
'params': {'timeout': 30}}]},
'run_id': 2024030253418613,
'datetime': '2024-03-02 16:02:21.467025',
'client_version': '2.4.0'},
'result': {'test_result': {'index': {'RT': 920.1162,
'id': {'RT': 1.0264},
'int64_1': {'RT': 1.0201}},
'insert': {'total_time': 357.4444,
'VPS': 13988.1895,
'batch_time': 3.5744,
'batch': 50000},
'flush': {'RT': 12.9492},
'load': {'RT': 9.7292},
'Locust': {'Aggregated': {'Requests': 35685,
'Fails': 20,
'RPS': 3.31,
'fail_s': 0.0,
'RT_max': 180789.72,
'RT_avg': 6042.71,
'TP50': 71,
'TP99': 56000.0},
'delete': {'Requests': 8934,
'Fails': 0,
'RPS': 0.83,
'fail_s': 0.0,
'RT_max': 357.07,
'RT_avg': 16.47,
'TP50': 4,
'TP99': 110.0},
'flush': {'Requests': 8795,
'Fails': 20,
'RPS': 0.81,
'fail_s': 0.0,
'RT_max': 180789.72,
'RT_avg': 19229.44,
'TP50': 14000.0,
'TP99': 76000.0},
'insert': {'Requests': 8978,
'Fails': 0,
'RPS': 0.83,
'fail_s': 0.0,
'RT_max': 136817.72,
'RT_avg': 5149.17,
'TP50': 3800.0,
'TP99': 25000.0},
'release': {'Requests': 8978,
'Fails': 0,
'RPS': 0.83,
'fail_s': 0.0,
'RT_max': 814.22,
'RT_avg': 15.02,
'TP50': 3,
'TP99': 110.0}}}}}
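For reference, the `VPS` value under `insert` in these reports is consistent with `dataset_size / total_time`, with `total_time` in seconds. Reading VPS as "vectors per second" is an assumption; the report does not define it. A small Python sketch that reproduces the reported values from figures copied out of the two reports above (`vectors_per_second` is a hypothetical helper):

```python
# (dataset_size, insert total_time in seconds, reported VPS), copied from the reports above.
runs = {
    "test_inverted_locust_varchar_dml_dql_cluster": (300000, 802.7125, 373.7328),
    "test_inverted_locust_partition_key_dml_standalone": (5000000, 357.4444, 13988.1895),
}


def vectors_per_second(rows, seconds):
    """Average insert throughput over the whole initial load."""
    return rows / seconds


for name, (rows, seconds, reported) in runs.items():
    computed = vectors_per_second(rows, seconds)
    # total_time is rounded to four decimals in the report, so allow a small tolerance.
    status = "ok" if abs(computed - reported) < 0.01 else "differs"
    print(f"{name}: computed={computed:.4f} reported={reported} ({status})")
```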
Same error, different scenario
argo task: multi-vector-corn-1709560800 test case name:test_hybrid_search_locust_shard1_float_dql_hnsw_standalone image:master-20240304-52540fec-amd64
server:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
multi-vector-corn-1709560800-51-etcd-0 1/1 Running 0 3m1s 10.104.16.103 4am-node21 <none> <none>
multi-vector-corn-1709560800-51-milvus-standalone-58ff988fwwlgw 1/1 Running 0 3m1s 10.104.26.113 4am-node32 <none> <none>
multi-vector-corn-1709560800-51-minio-6d6d88568d-lfk7n 1/1 Running 0 3m1s 10.104.26.112 4am-node32 <none> <none>
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
multi-vector-corn-1709560800-51-etcd-0 1/1 Running 0 12h 10.104.16.103 4am-node21 <none> <none>
multi-vector-corn-1709560800-51-milvus-standalone-58ff988fwwlgw 1/1 Running 6 (10h ago) 12h 10.104.26.113 4am-node32 <none> <none>
multi-vector-corn-1709560800-51-minio-6d6d88568d-lfk7n 1/1 Running 0 12h 10.104.26.112 4am-node32 <none> <none>
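The restart counts can be pulled out of pod listings like the one above with a short script. This is a minimal Python sketch, assuming the whitespace-separated column layout shown here (NAME, READY, STATUS, RESTARTS with an optional "(<age> ago)" suffix); `restarted_pods` is a hypothetical helper, not part of kubectl or the test framework:

```python
import re

# Pod rows copied from the second `kubectl get pods -o wide` listing above.
POD_ROWS = """\
multi-vector-corn-1709560800-51-etcd-0 1/1 Running 0 12h 10.104.16.103 4am-node21 <none> <none>
multi-vector-corn-1709560800-51-milvus-standalone-58ff988fwwlgw 1/1 Running 6 (10h ago) 12h 10.104.26.113 4am-node32 <none> <none>
multi-vector-corn-1709560800-51-minio-6d6d88568d-lfk7n 1/1 Running 0 12h 10.104.26.112 4am-node32 <none> <none>
"""

# NAME, READY, STATUS, then the restart count and an optional "(<age> ago)" suffix.
ROW = re.compile(
    r"^(?P<name>\S+)\s+\d+/\d+\s+\S+\s+(?P<restarts>\d+)(?:\s+\((?P<last>[^)]*) ago\))?"
)


def restarted_pods(text):
    """Map pod name -> (restart count, time since last restart) for pods that restarted."""
    result = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match and int(match.group("restarts")) > 0:
            result[match.group("name")] = (int(match.group("restarts")), match.group("last"))
    return result


print(restarted_pods(POD_ROWS))
```

Here only the milvus-standalone pod restarted (6 times, last about 10h before the listing), while etcd and minio did not restart.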
client pod name:multi-vector-corn-1709560800-1714564187 client logs: client.log
test step:
concurrent test and calculation of RT and QPS
:purpose: `shard_num=1, float_vector DQL`
verify concurrent DQL scenario which has 4 float_vector fields(HNSW) and 60 scalar fields
:test steps:
1. create collection with fields:
'float_vector': 32768dim,
'float_vector_1': 32768dim,
'float_vector_2': 32768dim,
'float_vector_3': 32768dim,
all scalar fields: varchar max_length=10, array max_capacity=7
2. build indexes:
HNSW: 'float_vector', 'float_vector_1', 'float_vector_2', 'float_vector_3'
default_scalar_index: 'int64_1'
INVERTED: 'id', 'bool_3'
3. insert 100k data
4. flush collection
5. build indexes again using the same params
6. load collection
replica: 1
7. concurrent request:
- hybrid_search
test result:
'server': {'deploy_tool': 'helm',
'deploy_mode': 'standalone',
'config_name': 'standalone_16c64m',
'config': {'standalone': {'resources': {'limits': {'cpu': '16.0',
'memory': '64Gi'},
'requests': {'cpu': '9.0',
'memory': '33Gi'}}},
'cluster': {'enabled': False},
'etcd': {'replicaCount': 1,
'metrics': {'enabled': True,
'podMonitor': {'enabled': True}}},
'minio': {'mode': 'standalone',
'metrics': {'podMonitor': {'enabled': True}}},
'pulsar': {'enabled': False},
'metrics': {'serviceMonitor': {'enabled': True}},
'log': {'level': 'debug'},
'image': {'all': {'repository': 'harbor.milvus.io/milvus/milvus',
'tag': 'master-20240304-52540fec-amd64'}}},
'host': 'multi-vector-corn-1709560800-51-milvus.qa-milvus.svc.cluster.local',
'port': '19530',
'uri': ''},
'client': {'test_case_type': 'ConcurrentClientBase',
'test_case_name': 'test_hybrid_search_locust_shard1_float_dql_hnsw_standalone',
'test_case_params': {'dataset_params': {'metric_type': 'L2',
'dim': 32768,
'max_length': 10,
'scalars_index': {'int64_1': {},
'id': {'index_type': 'INVERTED'},
'bool_3': {'index_type': 'INVERTED'}},
'vectors_index': {'float_vector_1': {'index_type': 'HNSW',
'index_param': {'M': 8,
'efConstruction': 200},
'metric_type': 'L2'},
'float_vector_2': {'index_type': 'HNSW',
'index_param': {'M': 8,
'efConstruction': 200},
'metric_type': 'L2'},
'float_vector_3': {'index_type': 'HNSW',
'index_param': {'M': 8,
'efConstruction': 200},
'metric_type': 'L2'}},
'scalars_params': {'array_int8_1': {'params': {'max_capacity': 7}},
'array_int16_1': {'params': {'max_capacity': 7}},
'array_int32_1': {'params': {'max_capacity': 7}},
'array_int64_1': {'params': {'max_capacity': 7}},
'array_double_1': {'params': {'max_capacity': 7}},
'array_float_1': {'params': {'max_capacity': 7}},
'array_varchar_1': {'params': {'max_capacity': 7}},
'array_bool_1': {'params': {'max_capacity': 7}},
'array_int8_2': {'params': {'max_capacity': 7}},
'array_int16_2': {'params': {'max_capacity': 7}},
'array_int32_2': {'params': {'max_capacity': 7}},
'array_int64_2': {'params': {'max_capacity': 7}},
'array_double_2': {'params': {'max_capacity': 7}},
'array_float_2': {'params': {'max_capacity': 7}},
'array_varchar_2': {'params': {'max_capacity': 7}},
'array_bool_2': {'params': {'max_capacity': 7}},
'array_int8_3': {'params': {'max_capacity': 7}},
'array_int16_3': {'params': {'max_capacity': 7}},
'array_int32_3': {'params': {'max_capacity': 7}},
'array_int64_3': {'params': {'max_capacity': 7}},
'array_double_3': {'params': {'max_capacity': 7}},
'array_float_3': {'params': {'max_capacity': 7}},
'array_varchar_3': {'params': {'max_capacity': 7}},
'array_bool_3': {'params': {'max_capacity': 7}}},
'dataset_name': 'local',
'dataset_size': 100000,
'ni_per': 100},
'collection_params': {'other_fields': ['float_vector_1',
'float_vector_2',
'float_vector_3',
'int8_1',
'int16_1',
'int32_1',
'int64_1',
'double_1',
'float_1',
'varchar_1',
'bool_1',
'json_1',
'array_int8_1',
'array_int16_1',
'array_int32_1',
'array_int64_1',
'array_double_1',
'array_float_1',
'array_varchar_1',
'array_bool_1',
'int8_2',
'int16_2',
'int32_2',
'int64_2',
'double_2',
'float_2',
'varchar_2',
'bool_2',
'json_2',
'array_int8_2',
'array_int16_2',
'array_int32_2',
'array_int64_2',
'array_double_2',
'array_float_2',
'array_varchar_2',
'array_bool_2',
'int8_3',
'int16_3',
'int32_3',
'int64_3',
'double_3',
'float_3',
'varchar_3',
'bool_3',
'json_3',
'array_int8_3',
'array_int16_3',
'array_int32_3',
'array_int64_3',
'array_double_3',
'array_float_3',
'array_varchar_3',
'array_bool_3',
'varchar_tail_1',
'varchar_tail_2',
'varchar_tail_3',
'varchar_tail_4',
'varchar_tail_5',
'varchar_tail_6',
'varchar_tail_7',
'varchar_tail_8'],
'shards_num': 1},
'resource_groups_params': {'reset': False},
'database_user_params': {'reset_rbac': False,
'reset_db': False},
'index_params': {'index_type': 'HNSW',
'index_param': {'M': 8,
'efConstruction': 200}},
'concurrent_params': {'concurrent_number': 1,
'during_time': '1h',
'interval': 20,
'spawn_rate': None},
'concurrent_tasks': [{'type': 'hybrid_search',
'weight': 1,
'params': {'nq': 1,
'top_k': 100,
'reqs': [{'search_param': {'ef': 128},
'anns_field': 'float_vector',
'expr': 'id '
'> '
'10000',
'top_k': 10},
{'search_param': {'ef': 64},
'anns_field': 'float_vector_1',
'expr': 'int64_1 '
'<= '
'90000',
'top_k': 50},
{'search_param': {'ef': 1024},
'anns_field': 'float_vector_2',
'expr': 'array_length(array_int8_2) '
'== '
'7',
'top_k': 1000},
{'search_param': {'ef': 20000},
'anns_field': 'float_vector_3',
'expr': 'bool_3 '
'== '
'True',
'top_k': 16384}],
'rerank': {'RRFRanker': []},
'output_fields': ['float_vector'],
'ignore_growing': False,
'guarantee_timestamp': None,
'partition_names': None,
'timeout': 60,
'random_data': True}}]},
'run_id': 2024030409416393,
'datetime': '2024-03-04 14:02:21.023607',
'client_version': '2.4.0'},
'result': {'test_result': {'index': {'RT': 174.534,
'float_vector_1': {'RT': 30.9736},
'float_vector_2': {'RT': 7.6267},
'float_vector_3': {'RT': 8.1377},
'int64_1': {'RT': 1.0257},
'id': {'RT': 0.5196},
'bool_3': {'RT': 0.5178}},
'insert': {'total_time': 3698.0592,
'VPS': 27.0412,
'batch_time': 3.6981,
'batch': 100},
'flush': {'RT': 2.5311},
'load': {'RT': 66.3822},
'Locust': {'Aggregated': {'Requests': 3015,
'Fails': 3,
'RPS': 0.84,
'fail_s': 0.0,
'RT_max': 62713.68,
'RT_avg': 927.07,
'TP50': 840.0,
'TP99': 1100.0},
'hybrid_search': {'Requests': 3015,
'Fails': 3,
'RPS': 0.84,
'fail_s': 0.0,
'RT_max': 62713.68,
'RT_avg': 927.07,
'TP50': 840.0,
'TP99': 1100.0}}}}}
@wangting0128 It seems that in all of your cases some node crashed. Did you check the possible reason why the node crashed?
I have checked the reason why the node was restarted. From the log, I can see that the node restarted due to the disconnection between the node and etcd.
[2024/03/04 14:32:44.505 +00:00] [WARN] [rootcoord/root_coord.go:1595] ["failed to updateTimeTick"] [role=rootcoord] [error="skip ChannelTimeTickMsg from un-recognized session 4"]
[2024/03/04 14:32:44.505 +00:00] [WARN] [proxy/proxy.go:370] [sendChannelsTimeTickLoop.UpdateChannelTimeTick] [ErrorCode=UnexpectedError] [Reason="skip ChannelTimeTickMsg from un-recognized session 4"]
[2024/03/04 14:32:44.653 +00:00] [INFO] [gc/gc_tuner.go:90] ["GC Tune done"] ["previous GOGC"=200] ["heapuse "=351] ["total memory"=2038] ["next GC"=1001] ["new GOGC"=200] [gc-pause=91.755µs] [gc-pause-end=1709562764652807392]
[2024/03/04 14:32:44.664 +00:00] [WARN] [sessionutil/session_util.go:550] ["fail to retry keepAliveOnce"] [serverName=indexcoord] [LeaseID=218862229534671527] [error="etcdserver: requested lease not found"]
[2024/03/04 14:32:44.664 +00:00] [WARN] [sessionutil/session_util.go:550] ["fail to retry keepAliveOnce"] [serverName=querynode] [LeaseID=218862229534671595] [error="etcdserver: requested lease not found"]
[2024/03/04 14:32:44.664 +00:00] [WARN] [sessionutil/session_util.go:878] ["connection lost detected, shuting down"]
[2024/03/04 14:32:44.664 +00:00] [WARN] [sessionutil/session_util.go:550] ["fail to retry keepAliveOnce"] [serverName=proxy] [LeaseID=218862229534671579] [error="etcdserver: requested lease not found"]
[2024/03/04 14:32:44.664 +00:00] [WARN] [sessionutil/session_util.go:550] ["fail to retry keepAliveOnce"] [serverName=indexnode] [LeaseID=218862229534671464] [error="etcdserver: requested lease not found"]
[2024/03/04 14:32:44.664 +00:00] [WARN] [sessionutil/session_util.go:878] ["connection lost detected, shuting down"]
[2024/03/04 14:32:44.664 +00:00] [WARN] [sessionutil/session_util.go:550] ["fail to retry keepAliveOnce"] [serverName=datanode] [LeaseID=218862229534671554] [error="etcdserver: requested lease not found"]
[2024/03/04 14:32:44.664 +00:00] [ERROR] [proxy/proxy.go:170] ["Proxy disconnected from etcd, process will exit"] ["Server Id"=4] [stack="github.com/milvus-io/milvus/internal/proxy.(*Proxy).Register.func1\n\t/go/src/github.com/milvus-io/milvus/internal/proxy/proxy.go:170"]
[2024/03/04 14:32:44.664 +00:00] [WARN] [sessionutil/session_util.go:550] ["fail to retry keepAliveOnce"] [serverName=querycoord] [LeaseID=218862229534671561] [error="etcdserver: requested lease not found"]
[2024/03/04 14:32:44.664 +00:00] [WARN] [sessionutil/session_util.go:550] ["fail to retry keepAliveOnce"] [serverName=datacoord] [LeaseID=218862229534671530] [error="etcdserver: requested lease not found"]
[2024/03/04 14:32:44.664 +00:00] [WARN] [sessionutil/session_util.go:878] ["connection lost detected, shuting down"]
[2024/03/04 14:32:44.664 +00:00] [WARN] [sessionutil/session_util.go:878] ["connection lost detected, shuting down"]
[2024/03/04 14:32:44.664 +00:00] [ERROR] [querycoordv2/server.go:152] ["QueryCoord disconnected from etcd, process will exit"] [serverID=4] [stack="github.com/milvus-io/milvus/internal/querycoordv2.(*Server).Register.func1\n\t/go/src/github.com/milvus-io/milvus/internal/querycoordv2/server.go:152"]
[2024/03/04 14:32:44.664 +00:00] [ERROR] [querynodev2/server.go:170] ["Query Node disconnected from etcd, process will exit"] ["Server Id"=4] [stack="github.com/milvus-io/milvus/internal/querynodev2.(*QueryNode).Register.func1\n\t/go/src/github.com/milvus-io/milvus/internal/querynodev2/server.go:170"]
| | 2024-03-04 14:32:44.665 | stdout | [2024/03/04 14:32:44.664 +00:00] [ERROR] [datanode/data_node.go:200] ["Data Node disconnected from etcd, process will exit"] ["Server Id"=4] [stack="github.com/milvus-io/milvus/internal/datanode.(*DataNode).Register.func1\n\t/go/src/github.com/milvus-io/milvus/internal/datanode/data_node.go:200"] |
| | 2024-03-04 14:33:04.704 | (no unique labels) | [2024/03/04 14:33:04.704 +00:00] [INFO] [roles/roles.go:304] ["starting running Milvus components"] |
| | 2024-03-04 14:33:04.704 | (no unique labels) | [2024/03/04 14:33:04.704 +00:00] [INFO] [roles/roles.go:167] ["Enable Jemalloc"] ["Jemalloc Path"=/milvus/lib/libjemalloc.so] |
| | 2024-03-04 14:33:04.719 | (no unique labels) | [2024/03/04 14:33:04.718 +00:00] [DEBUG] [config/refresher.go:67] ["start refreshing configurations"] [source=FileSource] |
| | 2024-03-04 14:33:04.719 | (no unique labels) | [2024/03/04 14:33:04.719 +00:00] [DEBUG] [config/etcd_source.go:50] ["init etcd source"] [etcdInfo="{\"UseEmbed\":false,\"UseSSL\":false,\"Endpoints\":[\"multi-vector-corn-1709560800-51-etcd:2379\"],\"KeyPrefix\":\"by-dev\",\"CertFile\":\"/path/to/etcd-client.pem\",\"KeyFile\":\"/path/to/etcd-client-key.pem\",\"CaCertFile\":\"/path/to/ca.pem\",\"MinVersion\":\"1.3\",\"RefreshInterval\":5000000000}"] |
| | 2024-03-04 14:33:04.719 | (no unique labels) | [2024/03/04 14:33:04.719 +00:00] [INFO] [etcd/etcd_util.go:47] ["create etcd client"] [useEmbedEtcd=false] [useSSL=false] [endpoints="[multi-vector-corn-1709560800-51-etcd:2379]"] [minVersion=1.3] |
| | 2024-03-04 14:33:04.720 | (no unique labels) | [2024/03/04 14:33:04.719 +00:00] [DEBUG] [config/etcd_source.go:86] ["etcd refreshConfigurations"] [prefix=by-dev/config] [endpoints="[multi-vector-corn-1709560800-51-etcd:2379]"] |
| | 2024-03-04 14:33:04.723 | (no unique labels) | [2024/03/04 14:33:04.723 +00:00] [DEBUG] [config/refresher.go:67] ["start refreshing configurations"] [source=EtcdSource] |
| | 2024-03-04 14:33:04.724 | (no unique labels) | [2024/03/04 14:33:04.724 +00:00] [DEBUG] [config/refresher.go:67] ["start refreshing configurations"] [source=FileSource]
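
In the table above, seven component sessions (indexcoord, querynode, proxy, indexnode, datanode, querycoord, datacoord) all log `requested lease not found` at 14:32:44.664. A minimal sketch for pulling those entries out of a raw log dump is shown below. The helper name `lost_leases` and the regular expression are illustrative assumptions based only on the line format shown here, not part of Milvus.

```python
import re

# Hypothetical helper, not part of Milvus: extracts serverName, LeaseID and
# error from "fail to retry keepAliveOnce" lines in the format shown above.
KEEPALIVE_RE = re.compile(
    r'\["fail to retry keepAliveOnce"\] \[serverName=(?P<server>\w+)\] '
    r'\[LeaseID=(?P<lease>\d+)\] \[error="(?P<error>[^"]*)"\]'
)


def lost_leases(lines):
    """Return {serverName: (LeaseID, error)} for each failed keepalive retry."""
    found = {}
    for line in lines:
        match = KEEPALIVE_RE.search(line)
        if match:
            found[match.group("server")] = (
                int(match.group("lease")),
                match.group("error"),
            )
    return found


# Three lines copied from the log table above.
sample = [
    '[2024/03/04 14:32:44.664 +00:00] [WARN] [sessionutil/session_util.go:550] ["fail to retry keepAliveOnce"] [serverName=querynode] [LeaseID=218862229534671595] [error="etcdserver: requested lease not found"]',
    '[2024/03/04 14:32:44.664 +00:00] [WARN] [sessionutil/session_util.go:878] ["connection lost detected, shuting down"]',
    '[2024/03/04 14:32:44.664 +00:00] [WARN] [sessionutil/session_util.go:550] ["fail to retry keepAliveOnce"] [serverName=proxy] [LeaseID=218862229534671579] [error="etcdserver: requested lease not found"]',
]

leases = lost_leases(sample)
for server, (lease_id, error) in sorted(leases.items()):
    print(server, lease_id, error)
```

Grouping the matches by timestamp instead of by server name would show whether all sessions lost their leases in the same instant, as they appear to here.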
However, no abnormality was found in the etcd monitoring or logs.
This looks more like the CPU being saturated on the node side.
Might the pod be throttled by K8s?
According to the monitoring, the pod's CPU and memory usage were not particularly high before or after the restart.
'standalone': {'resources': {'limits': {'cpu': '16.0',
                                        'memory': '64Gi'},
                             'requests': {'cpu': '9.0',
                                          'memory': '33Gi'}}}
it's already 16 and you required 16
The node hosting the pod showed no abnormal monitoring metrics at the time of the pod restart.
The pod restarted at 14:33; CPU usage at that time was about 2.5 cores.
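
Average CPU usage can hide short bursts, so one way to test the throttling hypothesis is to look at the CFS counters (`nr_periods`, `nr_throttled`) that the kernel exposes in the container's `cpu.stat` file. The sketch below only parses the text of such a file. The function name `throttle_ratio` is hypothetical, the sample values are made up for illustration and are not from this cluster, and the file's location depends on the cgroup version in use.

```python
def throttle_ratio(cpu_stat_text):
    """Fraction of CFS enforcement periods in which the cgroup was throttled."""
    stats = {}
    for line in cpu_stat_text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0
    return stats.get("nr_throttled", 0) / periods


# Made-up sample content, NOT taken from the pod in this issue.
sample = "nr_periods 1200\nnr_throttled 300\nthrottled_time 4500000000\n"
print(f"throttled in {throttle_ratio(sample):.2%} of periods")
```

Reading the file once before and once after a load spike and comparing the two ratios would show whether throttling increased during the window in which the lease was lost.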
pod restart time:
- 2024-03-04 14:08:41.520 | stderr | Welcome to use Milvus!
- 2024-03-04 14:09:51.702 | stderr | Welcome to use Milvus!
- 2024-03-04 14:27:36.993 | stderr | Welcome to use Milvus!
- 2024-03-04 14:33:04.703 | stderr | Welcome to use Milvus!
- 2024-03-04 14:45:25.132 | stderr | Welcome to use Milvus!
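
To see how irregular the restarts are, the gaps between the startup banners listed above can be computed directly. This is a small sketch using only the five timestamps from the list; nothing else is assumed.

```python
from datetime import datetime

# Startup timestamps copied from the "Welcome to use Milvus!" lines above.
starts = [
    "2024-03-04 14:08:41.520",
    "2024-03-04 14:09:51.702",
    "2024-03-04 14:27:36.993",
    "2024-03-04 14:33:04.703",
    "2024-03-04 14:45:25.132",
]

times = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S.%f") for s in starts]
# Seconds elapsed between each start and the one before it.
gaps = [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]

for start, gap in zip(starts[1:], gaps):
    print(f"{start}: {gap:.1f}s after the previous start")
```

The gaps range from roughly one minute to nearly eighteen minutes, so the restarts do not follow a fixed period.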
Proxy disconnected from etcd
Please help check whether this is the same problem, @longjiquan.
argo task: inverted-corn-1711036800 test case name: test_inverted_locust_partition_key_dml_standalone
server:
[2024-03-21 19:24:33,265 - INFO - fouram]: [Base] Deploy initial state:
I0321 16:08:44.054900 406 request.go:665] Waited for 1.167310868s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.default.svc.cluster.local/apis/autoscaling/v1?timeout=32s
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
inverted-corn-136800-2-57-7793-etcd-0 1/1 Running 0 2m45s 10.104.30.115 4am-node38 <none> <none>
inverted-corn-136800-2-57-7793-milvus-standalone-6778dd748s2tm9 1/1 Running 0 2m45s 10.104.25.106 4am-node30 <none> <none>
inverted-corn-136800-2-57-7793-minio-cf8955d87-b75ss 1/1 Running 0 2m45s 10.104.30.114 4am-node38 <none> <none> (base.py:257)
[2024-03-21 19:24:33,265 - INFO - fouram]: [Cmd Exe] kubectl get pods -n qa-milvus -o wide | grep -E 'NAME|inverted-corn-136800-2-57-7793-milvus|inverted-corn-136800-2-57-7793-minio|inverted-corn-136800-2-57-7793-etcd|inverted-corn-136800-2-57-7793-pulsar|inverted-corn-136800-2-57-7793-zookeeper|inverted-corn-136800-2-57-7793-kafka|inverted-corn-136800-2-57-7793-log|inverted-corn-136800-2-57-7793-tikv' (util_cmd.py:14)
[2024-03-21 19:24:43,534 - INFO - fouram]: [CliClient] pod details of release(inverted-corn-136800-2-57-7793):
I0321 19:24:34.592913 506 request.go:665] Waited for 1.16829448s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.default.svc.cluster.local/apis/discovery.k8s.io/v1beta1?timeout=32s
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
inverted-corn-136800-2-57-7793-etcd-0 1/1 Running 0 3h18m 10.104.30.115 4am-node38 <none> <none>
inverted-corn-136800-2-57-7793-milvus-standalone-6778dd748s2tm9 1/1 Running 1 (94m ago) 3h18m 10.104.25.106 4am-node30 <none> <none>
inverted-corn-136800-2-57-7793-minio-cf8955d87-b75ss 1/1 Running 0 3h18m 10.104.30.114 4am-node38 <none> <none>
client pod name: inverted-corn-1711036800-2185322252 client log: client.log
Error reporting time range: 2024-03-21 17:49:49,202 ~ 2024-03-21 17:53:14,309
test steps:
concurrent test and calculation of RT and QPS
:purpose: `partition_key: scalar enable partition_key(num_partitions=128)`
verify concurrent DML scenario which
scalar `id`(pk) & `int64_1` created INVERTED index and enable partition_key on `int64_1` field
:test steps:
1. create collection with fields:
'float_vector': 128dim,
'int64_1': is_partition_key
2. build indexes:
IVF_FLAT: 'float_vector'
INVERTED: 'id', 'int64_1'
3. insert 5 million data
4. flush collection
5. build indexes again using the same params
6. load collection
7. concurrent request:
- insert
- delete
- flush
- release
test result:
[2024-03-21 19:24:08,772 - INFO - fouram]: Print locust final stats. (locust_runner.py:56)
[2024-03-21 19:24:08,772 - INFO - fouram]: Type Name # reqs # fails | Avg Min Max Med | req/s failures/s (stats.py:789)
[2024-03-21 19:24:08,772 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-03-21 19:24:08,772 - INFO - fouram]: grpc delete 28833 0(0.00%) | 53 1 513 35 | 2.67 0.00 (stats.py:789)
[2024-03-21 19:24:08,773 - INFO - fouram]: grpc flush 28469 20(0.07%) | 7183 54 273026 6400 | 2.64 0.00 (stats.py:789)
[2024-03-21 19:24:08,773 - INFO - fouram]: grpc insert 28700 0(0.00%) | 287 21 12349 110 | 2.66 0.00 (stats.py:789)
[2024-03-21 19:24:08,773 - INFO - fouram]: grpc release 28477 0(0.00%) | 52 0 648 34 | 2.64 0.00 (stats.py:789)
[2024-03-21 19:24:08,773 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-03-21 19:24:08,773 - INFO - fouram]: Aggregated 114479 20(0.02%) | 1885 0 273026 90 | 10.60 0.00 (stats.py:789)
[2024-03-21 19:24:08,773 - INFO - fouram]: (stats.py:790)
[2024-03-21 19:24:08,776 - INFO - fouram]: [PerfTemplate] Report data:
{'server': {'deploy_tool': 'helm',
'deploy_mode': 'standalone',
'config_name': 'standalone_8c16m',
'config': {'standalone': {'resources': {'limits': {'cpu': '8.0',
'memory': '16Gi'},
'requests': {'cpu': '5.0',
'memory': '9Gi'}}},
'cluster': {'enabled': False},
'etcd': {'replicaCount': 1,
'metrics': {'enabled': True,
'podMonitor': {'enabled': True}}},
'minio': {'mode': 'standalone',
'metrics': {'podMonitor': {'enabled': True}}},
'pulsar': {'enabled': False},
'metrics': {'serviceMonitor': {'enabled': True}},
'log': {'level': 'debug'},
'image': {'all': {'repository': 'harbor.milvus.io/milvus/milvus',
'tag': '2.4-20240321-47868e9d-amd64'}}},
'host': 'inverted-corn-136800-2-57-7793-milvus.qa-milvus.svc.cluster.local',
'port': '19530',
'uri': ''},
'client': {'test_case_type': 'ConcurrentClientBase',
'test_case_name': 'test_inverted_locust_partition_key_dml_standalone',
'test_case_params': {'dataset_params': {'metric_type': 'L2',
'dim': 128,
'scalars_index': {'id': {'index_type': 'INVERTED'},
'int64_1': {'index_type': 'INVERTED'}},
'scalars_params': {'int64_1': {'params': {'is_partition_key': True}}},
'dataset_name': 'sift',
'dataset_size': 5000000,
'ni_per': 50000},
'collection_params': {'other_fields': ['int64_1'],
'shards_num': 2,
'num_partitions': 128},
'resource_groups_params': {'reset': False},
'database_user_params': {'reset_rbac': False,
'reset_db': False},
'index_params': {'index_type': 'IVF_FLAT',
'index_param': {'nlist': 1024}},
'concurrent_params': {'concurrent_number': 20,
'during_time': '3h',
'interval': 20,
'spawn_rate': None},
'concurrent_tasks': [{'type': 'insert',
'weight': 1,
'params': {'nb': 10,
'timeout': 180,
'random_id': True,
'random_vector': True,
'varchar_filled': False,
'start_id': 0}},
{'type': 'delete',
'weight': 1,
'params': {'expr': '',
'delete_length': 9,
'timeout': 30}},
{'type': 'flush',
'weight': 1,
'params': {'timeout': 180}},
{'type': 'release',
'weight': 1,
'params': {'timeout': 30}}]},
'run_id': 2024032171652783,
'datetime': '2024-03-21 16:06:05.342750',
'client_version': '2.4.0'},
'result': {'test_result': {'index': {'RT': 628.9663,
'id': {'RT': 1.0116},
'int64_1': {'RT': 1.0109}},
'insert': {'total_time': 167.189,
'VPS': 29906.2737,
'batch_time': 1.6719,
'batch': 50000},
'flush': {'RT': 9.6348},
'load': {'RT': 7.5956},
'Locust': {'Aggregated': {'Requests': 114479,
'Fails': 20,
'RPS': 10.6,
'fail_s': 0.0,
'RT_max': 273026.89,
'RT_avg': 1885.18,
'TP50': 90,
'TP99': 12000.0},
'delete': {'Requests': 28833,
'Fails': 0,
'RPS': 2.67,
'fail_s': 0.0,
'RT_max': 513.1,
'RT_avg': 53.43,
'TP50': 35,
'TP99': 260.0},
'flush': {'Requests': 28469,
'Fails': 20,
'RPS': 2.64,
'fail_s': 0.0,
'RT_max': 273026.89,
'RT_avg': 7183.95,
'TP50': 6400.0,
'TP99': 20000.0},
'insert': {'Requests': 28700,
'Fails': 0,
'RPS': 2.66,
'fail_s': 0.0,
'RT_max': 12349.54,
'RT_avg': 287.56,
'TP50': 110.0,
'TP99': 4200.0},
'release': {'Requests': 28477,
'Fails': 0,
'RPS': 2.64,
'fail_s': 0.0,
'RT_max': 648.41,
'RT_avg': 52.66,
'TP50': 34,
'TP99': 250.0}}}}}
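
The per-request failure ratios can be recomputed from the `Locust` section of the report above. The sketch below uses only the `Requests` and `Fails` values copied from that report; the function name `fail_rates` is an illustrative choice.

```python
# Requests/Fails copied from the 'Locust' section of the report above.
locust = {
    "delete": {"Requests": 28833, "Fails": 0},
    "flush": {"Requests": 28469, "Fails": 20},
    "insert": {"Requests": 28700, "Fails": 0},
    "release": {"Requests": 28477, "Fails": 0},
}


def fail_rates(stats):
    """Map each request type to its failure ratio (Fails / Requests)."""
    return {
        name: entry["Fails"] / entry["Requests"]
        for name, entry in stats.items()
        if entry["Requests"]
    }


rates = fail_rates(locust)
worst = max(rates, key=rates.get)
print(worst, f"{rates[worst]:.2%}")
```

In this run every failure comes from `flush`, which is consistent with the `20(0.07%)` shown in the locust summary.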
Data Node disconnected from etcd
argo task: inverted-corn-1711123200 test case name: test_inverted_locust_partitions_dml_dql_standalone
server:
[2024-03-22 19:18:59,549 - INFO - fouram]: [Base] Deploy initial state:
I0322 16:11:02.401993 419 request.go:665] Waited for 1.16397117s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.default.svc.cluster.local/apis/apiregistration.k8s.io/v1?timeout=32s
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
inverted-corn-123200-3-30-1407-etcd-0 1/1 Running 0 2m27s 10.104.18.60 4am-node25 <none> <none>
inverted-corn-123200-3-30-1407-milvus-standalone-75548cd9frtbjr 1/1 Running 0 2m27s 10.104.26.117 4am-node32 <none> <none>
inverted-corn-123200-3-30-1407-minio-787555bd4d-cwxwr 1/1 Running 0 2m27s 10.104.15.217 4am-node20 <none> <none> (base.py:257)
[2024-03-22 19:18:59,549 - INFO - fouram]: [Cmd Exe] kubectl get pods -n qa-milvus -o wide | grep -E 'NAME|inverted-corn-123200-3-30-1407-milvus|inverted-corn-123200-3-30-1407-minio|inverted-corn-123200-3-30-1407-etcd|inverted-corn-123200-3-30-1407-pulsar|inverted-corn-123200-3-30-1407-zookeeper|inverted-corn-123200-3-30-1407-kafka|inverted-corn-123200-3-30-1407-log|inverted-corn-123200-3-30-1407-tikv' (util_cmd.py:14)
[2024-03-22 19:19:09,580 - INFO - fouram]: [CliClient] pod details of release(inverted-corn-123200-3-30-1407):
I0322 19:19:00.807332 550 request.go:665] Waited for 1.172468491s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.default.svc.cluster.local/apis/storage.k8s.io/v1beta1?timeout=32s
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
inverted-corn-123200-3-30-1407-etcd-0 1/1 Running 0 3h10m 10.104.18.60 4am-node25 <none> <none>
inverted-corn-123200-3-30-1407-milvus-standalone-75548cd9frtbjr 1/1 Running 1 (2m40s ago) 3h10m 10.104.26.117 4am-node32 <none> <none>
inverted-corn-123200-3-30-1407-minio-787555bd4d-cwxwr 1/1 Running 0 3h10m 10.104.15.217 4am-node20 <none> <none>
client pod name: inverted-corn-1711123200-249811846 client log: client.log
Error reporting time range: 2024-03-22 19:15:17,679 ~ 2024-03-22 19:18:55,522
test steps:
concurrent test and calculation of RT and QPS
:purpose: `partition: collection has many partitions`
verify concurrent DML & DQL scenario which
scalar `id`(pk) & `int64_1` created INVERTED index and collection has 10 partitions
:test steps:
1. create collection with fields:
'float_vector': 128dim,
'int64_1'
2. build indexes:
IVF_FLAT: 'float_vector'
INVERTED: 'id', 'int64_1'
3. insert 5 million data to 10 partitions
4. flush collection
5. build indexes again using the same params
6. load collection
7. concurrent request:
- insert
- delete
- flush
- load
- search
- hybrid_search
- query
test result:
[2024-03-22 19:17:18,025 - INFO - fouram]: Print locust final stats. (locust_runner.py:56)
[2024-03-22 19:17:18,025 - INFO - fouram]: Type Name # reqs # fails | Avg Min Max Med | req/s failures/s (stats.py:789)
[2024-03-22 19:17:18,026 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-03-22 19:17:18,026 - INFO - fouram]: grpc delete 8929 4(0.04%) | 635 1 54603 100 | 0.83 0.00 (stats.py:789)
[2024-03-22 19:17:18,026 - INFO - fouram]: grpc flush 8968 0(0.00%) | 6977 236 28195 6300 | 0.83 0.00 (stats.py:789)
[2024-03-22 19:17:18,026 - INFO - fouram]: grpc hybrid_search 9001 8(0.09%) | 6247 266 61033 5800 | 0.84 0.00 (stats.py:789)
[2024-03-22 19:17:18,026 - INFO - fouram]: grpc insert 9013 1(0.01%) | 771 15 58594 190 | 0.84 0.00 (stats.py:789)
[2024-03-22 19:17:18,026 - INFO - fouram]: grpc load 8850 4(0.05%) | 1273 3 60003 380 | 0.82 0.00 (stats.py:789)
[2024-03-22 19:17:18,026 - INFO - fouram]: grpc query 9138 0(0.00%) | 4086 96 59228 3600 | 0.85 0.00 (stats.py:789)
[2024-03-22 19:17:18,026 - INFO - fouram]: grpc search 9023 0(0.00%) | 3684 573 14209 3300 | 0.84 0.00 (stats.py:789)
[2024-03-22 19:17:18,026 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-03-22 19:17:18,026 - INFO - fouram]: Aggregated 62922 17(0.03%) | 3389 1 61033 2800 | 5.84 0.00 (stats.py:789)
[2024-03-22 19:17:18,026 - INFO - fouram]: (stats.py:790)
[2024-03-22 19:17:18,029 - INFO - fouram]: [PerfTemplate] Report data:
{'server': {'deploy_tool': 'helm',
'deploy_mode': 'standalone',
'config_name': 'standalone_16c16m',
'config': {'standalone': {'resources': {'limits': {'cpu': '16.0',
'memory': '16Gi'},
'requests': {'cpu': '9.0',
'memory': '9Gi'}}},
'cluster': {'enabled': False},
'etcd': {'replicaCount': 1,
'metrics': {'enabled': True,
'podMonitor': {'enabled': True}}},
'minio': {'mode': 'standalone',
'metrics': {'podMonitor': {'enabled': True}}},
'pulsar': {'enabled': False},
'metrics': {'serviceMonitor': {'enabled': True}},
'log': {'level': 'debug'},
'image': {'all': {'repository': 'harbor.milvus.io/milvus/milvus',
'tag': '2.4-20240322-99774548-amd64'}}},
'host': 'inverted-corn-123200-3-30-1407-milvus.qa-milvus.svc.cluster.local',
'port': '19530',
'uri': ''},
'client': {'test_case_type': 'ConcurrentClientBase',
'test_case_name': 'test_inverted_locust_partitions_dml_dql_standalone',
'test_case_params': {'dataset_params': {'metric_type': 'L2',
'dim': 128,
'scalars_index': {'id': {'index_type': 'INVERTED'},
'int64_1': {'index_type': 'INVERTED'}},
'extra_partitions': {'partitions': 10,
'data_repeated': False},
'dataset_name': 'sift',
'dataset_size': 5000000,
'ni_per': 50000},
'collection_params': {'other_fields': ['int64_1'],
'shards_num': 2},
'resource_groups_params': {'reset': False},
'database_user_params': {'reset_rbac': False,
'reset_db': False},
'index_params': {'index_type': 'IVF_FLAT',
'index_param': {'nlist': 1024}},
'concurrent_params': {'concurrent_number': 20,
'during_time': '3h',
'interval': 20,
'spawn_rate': None},
'concurrent_tasks': [{'type': 'insert',
'weight': 1,
'params': {'nb': 10,
'timeout': 30,
'random_id': True,
'random_vector': True,
'varchar_filled': False,
'start_id': 5000000}},
{'type': 'delete',
'weight': 1,
'params': {'expr': '',
'delete_length': 9,
'timeout': 30}},
{'type': 'flush',
'weight': 1,
'params': {'timeout': 180}},
{'type': 'load',
'weight': 1,
'params': {'replica_number': 1,
'timeout': 30}},
{'type': 'search',
'weight': 1,
'params': {'nq': 1000,
'top_k': 10,
'search_param': {'nprobe': 16},
'expr': None,
'guarantee_timestamp': None,
'partition_names': None,
'output_fields': None,
'ignore_growing': False,
'group_by_field': None,
'timeout': 180,
'random_data': True}},
{'type': 'hybrid_search',
'weight': 1,
'params': {'nq': 1,
'top_k': 10,
'reqs': [{'search_param': {'nprobe': 16},
'anns_field': 'float_vector',
'top_k': 2000},
{'search_param': {'nprobe': 32},
'anns_field': 'float_vector',
'expr': 'int64_1 '
'> '
'-1 '
'&& '
'id '
'> '
'-1'},
{'search_param': {'nprobe': 64},
'anns_field': 'float_vector',
'expr': 'id '
'> '
'10',
'top_k': 60}],
'rerank': {'WeightedRanker': [0.3,
0.4,
0.3]},
'output_fields': ['*'],
'ignore_growing': False,
'guarantee_timestamp': None,
'partition_names': None,
'timeout': 60,
'random_data': True}},
{'type': 'query',
'weight': 1,
'params': {'ids': None,
'expr': 'int64_1 '
'> '
'-1 '
'&&',
'output_fields': ['*'],
'offset': None,
'limit': None,
'ignore_growing': False,
'partition_names': None,
'timeout': 180,
'random_data': True,
'random_count': 20,
'random_range': [0,
1000000.0],
'field_name': 'id',
'field_type': 'int64'}}]},
'run_id': 2024032237213185,
'datetime': '2024-03-22 16:08:41.486026',
'client_version': '2.4.0'},
'result': {'test_result': {'index': {'RT': 99.931,
'id': {'RT': 1.0171},
'int64_1': {'RT': 1.011}},
'insert': {'total_time': 175.9617,
'VPS': 28679.9859,
'batch_time': 1.7596,
'batch': 50000.0},
'flush': {'RT': 3.1012},
'load': {'RT': 5.5419},
'Locust': {'Aggregated': {'Requests': 62922,
'Fails': 17,
'RPS': 5.84,
'fail_s': 0.0,
'RT_max': 61033.45,
'RT_avg': 3389.73,
'TP50': 2800.0,
'TP99': 14000.0},
'delete': {'Requests': 8929,
'Fails': 4,
'RPS': 0.83,
'fail_s': 0.0,
'RT_max': 54603.54,
'RT_avg': 635.08,
'TP50': 100.0,
'TP99': 4900.0},
'flush': {'Requests': 8968,
'Fails': 0,
'RPS': 0.83,
'fail_s': 0.0,
'RT_max': 28195.54,
'RT_avg': 6977.64,
'TP50': 6300.0,
'TP99': 18000.0},
'hybrid_search': {'Requests': 9001,
'Fails': 8,
'RPS': 0.84,
'fail_s': 0.0,
'RT_max': 61033.45,
'RT_avg': 6247.61,
'TP50': 5800.0,
'TP99': 14000.0},
'insert': {'Requests': 9013,
'Fails': 1,
'RPS': 0.84,
'fail_s': 0.0,
'RT_max': 58594.34,
'RT_avg': 771.37,
'TP50': 190.0,
'TP99': 5100.0},
'load': {'Requests': 8850,
'Fails': 4,
'RPS': 0.82,
'fail_s': 0.0,
'RT_max': 60003.05,
'RT_avg': 1273.14,
'TP50': 380.0,
'TP99': 8400.0},
'query': {'Requests': 9138,
'Fails': 0,
'RPS': 0.85,
'fail_s': 0.0,
'RT_max': 59228.6,
'RT_avg': 4086.94,
'TP50': 3600.0,
'TP99': 12000.0},
'search': {'Requests': 9023,
'Fails': 0,
'RPS': 0.84,
'fail_s': 0.0,
'RT_max': 14209.91,
'RT_avg': 3684.13,
'TP50': 3300.0,
'TP99': 9600.0}}}}}
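
Unlike the previous run, the failures here are spread over several request types. A quick sketch, using only the `Requests` and `Fails` values from the report above, confirms that the per-task numbers add up to the `Aggregated` row and lists which tasks failed.

```python
# (Requests, Fails) per task, copied from the 'Locust' section above.
per_task = {
    "delete": (8929, 4),
    "flush": (8968, 0),
    "hybrid_search": (9001, 8),
    "insert": (9013, 1),
    "load": (8850, 4),
    "query": (9138, 0),
    "search": (9023, 0),
}
aggregated = (62922, 17)  # 'Aggregated' Requests and Fails from the same report

total_requests = sum(requests for requests, _ in per_task.values())
total_fails = sum(fails for _, fails in per_task.values())
failing = {name: fails for name, (_, fails) in per_task.items() if fails}

print("totals match:", (total_requests, total_fails) == aggregated)
print("tasks with failures:", failing)
```

Both DML (`delete`, `insert`) and DQL/loading requests (`hybrid_search`, `load`) failed in this run, which is what one would expect if the whole standalone process was briefly unavailable rather than a single code path misbehaving.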
Different scenario, same error
argo task: multi-vector-corn-2vswm test case name: test_hybrid_search_locust_shard1_float_dql_diskann_standalone
server:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
multi-vector-corn-2vswm-2s1-etcd-0 1/1 Running 0 26h 10.104.29.93 4am-node35 <none> <none>
multi-vector-corn-2vswm-2s1-milvus-standalone-6f744444f-bdnrp 1/1 Running 3 (11h ago) 26h 10.104.25.60 4am-node30 <none> <none>
multi-vector-corn-2vswm-2s1-minio-785d495c47-wp26r 1/1 Running 0 26h 10.104.29.85 4am-node35 <none> <none>
client pod name: multi-vector-corn-2vswm-1562047643 client log: client.log
get_index_state failed
from
[2024-03-25 19:24:58,016 - WARNING - fouram]: [get_index_state] retry:4, cost: 0.27s, reason: <_MultiThreadedRendezvous: StatusCode.UNAVAILABLE, failed to connect to all addresses; last error: UNKNOWN: ipv4:10.255.95.86:19530: Failed to connect to remote host: Connection refused> (decorators.py:100)
to
[2024-03-25 19:59:17,355 - WARNING - fouram]: [get_index_state] retry:49, cost: 3.00s, reason: <_MultiThreadedRendezvous: StatusCode.UNAVAILABLE, failed to connect to all addresses; last error: UNKNOWN: ipv4:10.255.95.86:19530: Failed to connect to remote host: Connection refused> (decorators.py:100)
test steps:
concurrent test and calculation of RT and QPS
:purpose: `shard_num=1, float_vector DQL`
verify concurrent DQL scenario which has 4 float_vector fields(DISKANN) and 60 scalar fields
:test steps:
1. create collection with fields:
'float_vector': 2048dim,
'float_vector_1': 2048dim,
'float_vector_2': 2048dim,
'float_vector_3': 2048dim,
all scalar fields: varchar max_length=10, array max_capacity=7
2. build indexes:
DISKANN: 'float_vector', 'float_vector_1', 'float_vector_2', 'float_vector_3'
default_scalar_index: 'int64_1'
INVERTED: 'id', 'bool_3'
3. insert 100k data
4. flush collection
5. build indexes again using the same params
6. load collection
replica: 1
7. concurrent request:
- hybrid_search
test result:
[2024-03-25 23:25:08,505 - INFO - fouram]: Print locust final stats. (locust_runner.py:56)
[2024-03-25 23:25:08,505 - INFO - fouram]: Type Name # reqs # fails | Avg Min Max Med | req/s failures/s (stats.py:789)
[2024-03-25 23:25:08,505 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-03-25 23:25:08,505 - INFO - fouram]: grpc hybrid_search 1710 4(0.23%) | 41864 23354 60005 41000 | 0.48 0.00 (stats.py:789)
[2024-03-25 23:25:08,505 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-03-25 23:25:08,505 - INFO - fouram]: Aggregated 1710 4(0.23%) | 41864 23354 60005 41000 | 0.48 0.00 (stats.py:789)
[2024-03-25 23:25:08,506 - INFO - fouram]: (stats.py:790)
[2024-03-25 23:25:08,511 - INFO - fouram]: [PerfTemplate] Report data:
{'server': {'deploy_tool': 'helm',
'deploy_mode': 'standalone',
'config_name': 'standalone_16c64m',
'config': {'standalone': {'resources': {'limits': {'cpu': '16.0',
'memory': '64Gi'},
'requests': {'cpu': '9.0',
'memory': '33Gi'}}},
'cluster': {'enabled': False},
'etcd': {'replicaCount': 1,
'metrics': {'enabled': True,
'podMonitor': {'enabled': True}}},
'minio': {'mode': 'standalone',
'metrics': {'podMonitor': {'enabled': True}}},
'pulsar': {'enabled': False},
'metrics': {'serviceMonitor': {'enabled': True}},
'log': {'level': 'debug'},
'image': {'all': {'repository': 'harbor.milvus.io/milvus/milvus',
'tag': '2.4-20240325-6e0baa47-amd64'}}},
'host': 'multi-vector-corn-2vswm-2s1-milvus.qa-milvus.svc.cluster.local',
'port': '19530',
'uri': ''},
'client': {'test_case_type': 'ConcurrentClientBase',
'test_case_name': 'test_hybrid_search_locust_shard1_float_dql_diskann_standalone',
'test_case_params': {'dataset_params': {'metric_type': 'L2',
'dim': 2048,
'max_length': 10,
'scalars_index': {'int64_1': {},
'id': {'index_type': 'INVERTED'},
'bool_3': {'index_type': 'INVERTED'}},
'vectors_index': {'float_vector_1': {'index_type': 'DISKANN',
'index_param': {},
'metric_type': 'L2'},
'float_vector_2': {'index_type': 'DISKANN',
'index_param': {},
'metric_type': 'L2'},
'float_vector_3': {'index_type': 'DISKANN',
'index_param': {},
'metric_type': 'L2'}},
'scalars_params': {'array_int8_1': {'params': {'max_capacity': 7}},
'array_int16_1': {'params': {'max_capacity': 7}},
'array_int32_1': {'params': {'max_capacity': 7}},
'array_int64_1': {'params': {'max_capacity': 7}},
'array_double_1': {'params': {'max_capacity': 7}},
'array_float_1': {'params': {'max_capacity': 7}},
'array_varchar_1': {'params': {'max_capacity': 7}},
'array_bool_1': {'params': {'max_capacity': 7}},
'array_int8_2': {'params': {'max_capacity': 7}},
'array_int16_2': {'params': {'max_capacity': 7}},
'array_int32_2': {'params': {'max_capacity': 7}},
'array_int64_2': {'params': {'max_capacity': 7}},
'array_double_2': {'params': {'max_capacity': 7}},
'array_float_2': {'params': {'max_capacity': 7}},
'array_varchar_2': {'params': {'max_capacity': 7}},
'array_bool_2': {'params': {'max_capacity': 7}},
'array_int8_3': {'params': {'max_capacity': 7}},
'array_int16_3': {'params': {'max_capacity': 7}},
'array_int32_3': {'params': {'max_capacity': 7}},
'array_int64_3': {'params': {'max_capacity': 7}},
'array_double_3': {'params': {'max_capacity': 7}},
'array_float_3': {'params': {'max_capacity': 7}},
'array_varchar_3': {'params': {'max_capacity': 7}},
'array_bool_3': {'params': {'max_capacity': 7}}},
'dataset_name': 'local',
'dataset_size': 1500000,
'ni_per': 100},
'collection_params': {'other_fields': ['float_vector_1',
'float_vector_2',
'float_vector_3',
'int8_1',
'int16_1',
'int32_1',
'int64_1',
'double_1',
'float_1',
'varchar_1',
'bool_1',
'json_1',
'array_int8_1',
'array_int16_1',
'array_int32_1',
'array_int64_1',
'array_double_1',
'array_float_1',
'array_varchar_1',
'array_bool_1',
'int8_2',
'int16_2',
'int32_2',
'int64_2',
'double_2',
'float_2',
'varchar_2',
'bool_2',
'json_2',
'array_int8_2',
'array_int16_2',
'array_int32_2',
'array_int64_2',
'array_double_2',
'array_float_2',
'array_varchar_2',
'array_bool_2',
'int8_3',
'int16_3',
'int32_3',
'int64_3',
'double_3',
'float_3',
'varchar_3',
'bool_3',
'json_3',
'array_int8_3',
'array_int16_3',
'array_int32_3',
'array_int64_3',
'array_double_3',
'array_float_3',
'array_varchar_3',
'array_bool_3',
'varchar_tail_1',
'varchar_tail_2',
'varchar_tail_3',
'varchar_tail_4',
'varchar_tail_5',
'varchar_tail_6',
'varchar_tail_7',
'varchar_tail_8'],
'shards_num': 1},
'resource_groups_params': {'reset': False},
'database_user_params': {'reset_rbac': False,
'reset_db': False},
'index_params': {'index_type': 'DISKANN',
'index_param': {}},
'concurrent_params': {'concurrent_number': 20,
'during_time': '1h',
'interval': 20,
'spawn_rate': None},
'concurrent_tasks': [{'type': 'hybrid_search',
'weight': 1,
'params': {'nq': 1,
'top_k': 100,
'reqs': [{'search_param': {'search_list': 30},
'anns_field': 'float_vector',
'expr': 'id '
'> '
'150000',
'top_k': 10},
{'search_param': {'search_list': 100},
'anns_field': 'float_vector_1',
'expr': 'int64_1 '
'<= '
'1350000',
'top_k': 50},
{'search_param': {'search_list': 1500},
'anns_field': 'float_vector_2',
'expr': 'array_length(array_int8_2) '
'== '
'7',
'top_k': 1000},
{'search_param': {'search_list': 20000},
'anns_field': 'float_vector_3',
'expr': 'bool_3 '
'== '
'True',
'top_k': 16384}],
'rerank': {'RRFRanker': []},
'output_fields': ['float_vector'],
'ignore_growing': False,
'guarantee_timestamp': None,
'partition_names': None,
'timeout': 60,
'random_data': True}}]},
'run_id': 2024032505807040,
'datetime': '2024-03-25 04:23:00.273429',
'client_version': '2.4.0'},
'result': {'test_result': {'index': {'RT': 14723.9804,
'float_vector_1': {'RT': 17423.1928},
'float_vector_2': {'RT': 20895.6376},
'float_vector_3': {'RT': 1779.0977},
'int64_1': {'RT': 1.0256},
'id': {'RT': 1.0153},
'bool_3': {'RT': 1.0149}},
'insert': {'total_time': 4502.0788,
'VPS': 333.1794,
'batch_time': 0.3001,
'batch': 100},
'flush': {'RT': 3.5202},
'load': {'RT': 101.2388},
'Locust': {'Aggregated': {'Requests': 1710,
'Fails': 4,
'RPS': 0.48,
'fail_s': 0.0,
'RT_max': 60005.87,
'RT_avg': 41864.82,
'TP50': 41000.0,
'TP99': 56000.0},
'hybrid_search': {'Requests': 1710,
'Fails': 4,
'RPS': 0.48,
'fail_s': 0.0,
'RT_max': 60005.87,
'RT_avg': 41864.82,
'TP50': 41000.0,
'TP99': 56000.0}}}}}
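As a rough sanity check on the Locust numbers in this report: a closed-loop load test with N users and no think time should sustain about N / RT_avg requests per second (Little's law). The sketch below assumes the Locust response times are in milliseconds and that the 20 users issue requests back to back. Neither is stated in the report, and `closed_loop_rps` is a name made up for illustration.

```python
def closed_loop_rps(concurrent_users: int, rt_avg_ms: float) -> float:
    """Little's law for a closed loop: throughput = users / mean response time."""
    return concurrent_users / (rt_avg_ms / 1000.0)


# Values copied from the report above (concurrent_number and RT_avg).
expected = closed_loop_rps(concurrent_users=20, rt_avg_ms=41864.82)
print(f"expected RPS ~ {expected:.2f}, reported RPS = 0.48")
```

If the two figures agree, throughput in this run is bounded by per-request latency rather than by the four failed requests.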
@longjiquan
I noticed that the goroutine and OS thread counts are much higher than on normal instances:
Below are the goroutine and OS thread counts of normal instances:
Also, the querynode in cluster mode hit this issue too, so index building may not be the root cause.
The goroutine count might be fine. Any idea where the OS threads are being created?
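One way to watch the OS thread count from outside the Go runtime is the `Threads:` field of `/proc/<pid>/status` on Linux. The sketch below only parses that field. The sample text is made up for illustration, and finding the pid of the Milvus process is left out.

```python
from pathlib import Path


def thread_count(status_text: str) -> int:
    """Return the value of the 'Threads:' field from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split(":", 1)[1].strip())
    raise ValueError("no 'Threads:' field found")


def read_thread_count(pid: int) -> int:
    """Read the live thread count of a process (Linux only)."""
    return thread_count(Path(f"/proc/{pid}/status").read_text())


# Illustrative sample, not taken from the failing pod.
SAMPLE = "Name:\tmilvus\nState:\tS (sleeping)\nThreads:\t512\n"
print(thread_count(SAMPLE))
```

Sampling this periodically next to the goroutine count would show whether threads grow while goroutines stay flat, which would point at goroutines blocked in system calls or cgo calls, since the Go runtime starts extra OS threads in those cases.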
Root Coord disconnected from etcd
argo task: inverted-corn-1712332800 test case name: test_inverted_locust_partition_key_dml_standalone
server:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
inverted-corn-132800-2-24-5637-etcd-0 1/1 Running 0 3m39s 10.104.33.66 4am-node36 <none> <none>
inverted-corn-132800-2-24-5637-milvus-standalone-5ff4877b7vbcf5 1/1 Running 0 3m39s 10.104.28.109 4am-node33 <none> <none>
inverted-corn-132800-2-24-5637-minio-7d877f7cb4-994sk 1/1 Running 0 3m39s 10.104.33.65 4am-node36 <none> <none> (base.py:257)
[2024-04-05 19:34:45,691 - INFO - fouram]: [Cmd Exe] kubectl get pods -n qa-milvus -o wide | grep -E 'NAME|inverted-corn-132800-2-24-5637-milvus|inverted-corn-132800-2-24-5637-minio|inverted-corn-132800-2-24-5637-etcd|inverted-corn-132800-2-24-5637-pulsar|inverted-corn-132800-2-24-5637-zookeeper|inverted-corn-132800-2-24-5637-kafka|inverted-corn-132800-2-24-5637-log|inverted-corn-132800-2-24-5637-tikv' (util_cmd.py:14)
[2024-04-05 19:34:55,683 - INFO - fouram]: [CliClient] pod details of release(inverted-corn-132800-2-24-5637):
I0405 19:34:46.951623 566 request.go:665] Waited for 1.161921833s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.default.svc.cluster.local/apis/eventtracker.litmuschaos.io/v1?timeout=32s
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
inverted-corn-132800-2-24-5637-etcd-0 1/1 Running 0 3h31m 10.104.33.66 4am-node36 <none> <none>
inverted-corn-132800-2-24-5637-milvus-standalone-5ff4877b7vbcf5 0/1 CrashLoopBackOff 8 (2m47s ago) 3h31m 10.104.28.109 4am-node33 <none> <none>
inverted-corn-132800-2-24-5637-minio-7d877f7cb4-994sk 1/1 Running 0 3h31m 10.104.33.65 4am-node36 <none> <none>
client log:
client failed to connect to milvus
test steps:
concurrent test and calculation of RT and QPS
:purpose: `partition_key: scalar enable partition_key(num_partitions=128)`
verify a concurrent DML scenario in which
scalar fields `id` (pk) and `int64_1` have INVERTED indexes and partition_key is enabled on the `int64_1` field
:test steps:
1. create collection with fields:
'float_vector': 128dim,
'int64_1': is_partition_key
2. build indexes:
IVF_FLAT: 'float_vector'
INVERTED: 'id', 'int64_1'
3. insert 5 million data <- connect failed
4. flush collection
5. build indexes again using the same params
6. load collection
7. concurrent request:
- insert
- delete
- flush
- release
test result:
[2024-04-05 19:30:55,016 - INFO - fouram]: Print locust final stats. (locust_runner.py:56)
[2024-04-05 19:30:55,016 - INFO - fouram]: Type Name # reqs # fails | Avg Min Max Med | req/s failures/s (stats.py:789)
[2024-04-05 19:30:55,016 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-04-05 19:30:55,017 - INFO - fouram]: grpc delete 26860 61(0.23%) | 90 0 30851 6 | 2.49 0.01 (stats.py:789)
[2024-04-05 19:30:55,017 - INFO - fouram]: grpc flush 26790 67(0.25%) | 7165 266 279927 6300 | 2.48 0.01 (stats.py:789)
[2024-04-05 19:30:55,017 - INFO - fouram]: grpc insert 26797 47(0.18%) | 659 20 181076 130 | 2.49 0.00 (stats.py:789)
[2024-04-05 19:30:55,017 - INFO - fouram]: grpc release 26961 50(0.19%) | 76 0 30714 5 | 2.50 0.00 (stats.py:789)
[2024-04-05 19:30:55,017 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-04-05 19:30:55,017 - INFO - fouram]: Aggregated 107408 225(0.21%) | 1993 0 279927 58 | 9.96 0.02 (stats.py:789)
[2024-04-05 19:30:55,017 - INFO - fouram]: (stats.py:790)
[2024-04-05 19:30:55,020 - INFO - fouram]: [PerfTemplate] Report data:
{'server': {'deploy_tool': 'helm',
'deploy_mode': 'standalone',
'config_name': 'standalone_8c16m',
'config': {'standalone': {'resources': {'limits': {'cpu': '8.0',
'memory': '16Gi'},
'requests': {'cpu': '5.0',
'memory': '9Gi'}}},
'cluster': {'enabled': False},
'etcd': {'replicaCount': 1,
'metrics': {'enabled': True,
'podMonitor': {'enabled': True}}},
'minio': {'mode': 'standalone',
'metrics': {'podMonitor': {'enabled': True}}},
'pulsar': {'enabled': False},
'metrics': {'serviceMonitor': {'enabled': True}},
'log': {'level': 'debug'},
'image': {'all': {'repository': 'harbor.milvus.io/milvus/milvus',
'tag': '2.4-20240405-7d721ae7-amd64'}}},
'host': 'inverted-corn-132800-2-24-5637-milvus.qa-milvus.svc.cluster.local',
'port': '19530',
'uri': ''},
'client': {'test_case_type': 'ConcurrentClientBase',
'test_case_name': 'test_inverted_locust_partition_key_dml_standalone',
'test_case_params': {'dataset_params': {'metric_type': 'L2',
'dim': 128,
'scalars_index': {'id': {'index_type': 'INVERTED'},
'int64_1': {'index_type': 'INVERTED'}},
'scalars_params': {'int64_1': {'params': {'is_partition_key': True}}},
'dataset_name': 'sift',
'dataset_size': 5000000,
'ni_per': 50000},
'collection_params': {'other_fields': ['int64_1'],
'shards_num': 2,
'num_partitions': 128},
'resource_groups_params': {'reset': False},
'database_user_params': {'reset_rbac': False,
'reset_db': False},
'index_params': {'index_type': 'IVF_FLAT',
'index_param': {'nlist': 1024}},
'concurrent_params': {'concurrent_number': 20,
'during_time': '3h',
'interval': 20,
'spawn_rate': None},
'concurrent_tasks': [{'type': 'insert',
'weight': 1,
'params': {'nb': 10,
'timeout': 180,
'random_id': True,
'random_vector': True,
'varchar_filled': False,
'start_id': 0}},
{'type': 'delete',
'weight': 1,
'params': {'expr': '',
'delete_length': 9,
'timeout': 30}},
{'type': 'flush',
'weight': 1,
'params': {'timeout': 180}},
{'type': 'release',
'weight': 1,
'params': {'timeout': 30}}]},
'run_id': 2024040529862467,
'datetime': '2024-04-05 16:03:06.898874',
'client_version': '2.4.0'},
'result': {'test_result': {'index': {'RT': 746.6375,
'id': {'RT': 1.0163},
'int64_1': {'RT': 1.011}},
'insert': {'total_time': 609.2848,
'VPS': 8206.3429,
'batch_time': 6.0928,
'batch': 50000},
'flush': {'RT': 7.417},
'load': {'RT': 8.1251},
'Locust': {'Aggregated': {'Requests': 107408,
'Fails': 225,
'RPS': 9.96,
'fail_s': 0.0,
'RT_max': 279927.68,
'RT_avg': 1993.75,
'TP50': 58,
'TP99': 12000.0},
'delete': {'Requests': 26860,
'Fails': 61,
'RPS': 2.49,
'fail_s': 0.0,
'RT_max': 30851.74,
'RT_avg': 90.57,
'TP50': 6,
'TP99': 130.0},
'flush': {'Requests': 26790,
'Fails': 67,
'RPS': 2.48,
'fail_s': 0.0,
'RT_max': 279927.68,
'RT_avg': 7165.75,
'TP50': 6300.0,
'TP99': 14000.0},
'insert': {'Requests': 26797,
'Fails': 47,
'RPS': 2.49,
'fail_s': 0.0,
'RT_max': 181076.98,
'RT_avg': 659.48,
'TP50': 130.0,
'TP99': 3300.0},
'release': {'Requests': 26961,
'Fails': 50,
'RPS': 2.5,
'fail_s': 0.0,
'RT_max': 30714.13,
'RT_avg': 76.77,
'TP50': 5,
'TP99': 130.0}}}}}
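The per-task Locust rows in this report can be cross-checked against the Aggregated row. The sketch below copies the request and failure counts from the report and recomputes the totals and the overall failure percentage. The variable names are illustrative only.

```python
# Per-task Locust rows copied from the report above: (requests, fails).
tasks = {
    "delete": (26860, 61),
    "flush": (26790, 67),
    "insert": (26797, 47),
    "release": (26961, 50),
}
aggregated = (107408, 225)  # the Aggregated row from the same report

total_reqs = sum(reqs for reqs, _ in tasks.values())
total_fails = sum(fails for _, fails in tasks.values())
fail_pct = 100.0 * total_fails / total_reqs

print(total_reqs, total_fails, f"{fail_pct:.2f}%")
```

The recomputed percentage lines up with the `225(0.21%)` shown in the Locust table. The `fail_s` value of 0.0 in the report dict looks like the same ratio rounded to two decimals, though the report does not define that field.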
Query Node disconnected from etcd
argo task: multi-vector-comp-2-2lbmt
server: init stats
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
multi-vector-comp-2-2lbmt-etcd-0 1/1 Running 0 6m15s 10.104.27.159 4am-node31 <none> <none>
multi-vector-comp-2-2lbmt-etcd-1 1/1 Running 0 6m15s 10.104.17.19 4am-node23 <none> <none>
multi-vector-comp-2-2lbmt-etcd-2 1/1 Running 0 6m14s 10.104.33.9 4am-node36 <none> <none>
multi-vector-comp-2-2lbmt-milvus-datacoord-b856bbbbb-xdxcc 1/1 Running 0 6m15s 10.104.31.174 4am-node34 <none> <none>
multi-vector-comp-2-2lbmt-milvus-datanode-6b7955f57c-fxdgn 1/1 Running 0 6m15s 10.104.31.177 4am-node34 <none> <none>
multi-vector-comp-2-2lbmt-milvus-indexcoord-7659494967-mzxvm 1/1 Running 0 6m15s 10.104.31.178 4am-node34 <none> <none>
multi-vector-comp-2-2lbmt-milvus-indexnode-b9d668fc8-zzbqc 1/1 Running 0 6m15s 10.104.19.3 4am-node28 <none> <none>
multi-vector-comp-2-2lbmt-milvus-proxy-dd7c87f67-zbsz4 1/1 Running 1 (2m9s ago) 6m15s 10.104.31.175 4am-node34 <none> <none>
multi-vector-comp-2-2lbmt-milvus-querycoord-7976f6f7c6-rtf57 1/1 Running 0 6m15s 10.104.31.176 4am-node34 <none> <none>
multi-vector-comp-2-2lbmt-milvus-querynode-6d8c845bd8-qnfjn 1/1 Running 0 6m15s 10.104.26.77 4am-node32 <none> <none>
multi-vector-comp-2-2lbmt-milvus-rootcoord-5b97c8788b-v852j 1/1 Running 0 6m15s 10.104.31.173 4am-node34 <none> <none>
multi-vector-comp-2-2lbmt-minio-0 1/1 Running 0 6m15s 10.104.18.143 4am-node25 <none> <none>
multi-vector-comp-2-2lbmt-minio-1 1/1 Running 0 6m15s 10.104.31.182 4am-node34 <none> <none>
multi-vector-comp-2-2lbmt-minio-2 1/1 Running 0 6m15s 10.104.27.162 4am-node31 <none> <none>
multi-vector-comp-2-2lbmt-minio-3 1/1 Running 0 6m14s 10.104.32.210 4am-node39 <none> <none>
multi-vector-comp-2-2lbmt-pulsar-bookie-0 1/1 Running 0 6m15s 10.104.34.219 4am-node37 <none> <none>
multi-vector-comp-2-2lbmt-pulsar-bookie-1 1/1 Running 0 6m15s 10.104.15.173 4am-node20 <none> <none>
multi-vector-comp-2-2lbmt-pulsar-bookie-2 1/1 Running 0 6m14s 10.104.23.197 4am-node27 <none> <none>
multi-vector-comp-2-2lbmt-pulsar-bookie-init-g48qr 0/1 Completed 0 6m15s 10.104.4.124 4am-node11 <none> <none>
multi-vector-comp-2-2lbmt-pulsar-broker-0 1/1 Running 0 6m15s 10.104.5.121 4am-node12 <none> <none>
multi-vector-comp-2-2lbmt-pulsar-proxy-0 1/1 Running 0 6m15s 10.104.6.98 4am-node13 <none> <none>
multi-vector-comp-2-2lbmt-pulsar-pulsar-init-h25t9 0/1 Completed 0 6m15s 10.104.6.96 4am-node13 <none> <none>
multi-vector-comp-2-2lbmt-pulsar-recovery-0 1/1 Running 0 6m15s 10.104.9.210 4am-node14 <none> <none>
multi-vector-comp-2-2lbmt-pulsar-zookeeper-0 1/1 Running 0 6m15s 10.104.31.181 4am-node34 <none> <none>
multi-vector-comp-2-2lbmt-pulsar-zookeeper-1 1/1 Running 0 5m27s 10.104.17.21 4am-node23 <none> <none>
multi-vector-comp-2-2lbmt-pulsar-zookeeper-2 1/1 Running 0 4m52s 10.104.27.165 4am-node31 <none> <none>
after testing
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
multi-vector-comp-2-2lbmt-etcd-0 1/1 Running 0 39m 10.104.27.159 4am-node31 <none> <none>
multi-vector-comp-2-2lbmt-etcd-1 1/1 Running 0 39m 10.104.17.19 4am-node23 <none> <none>
multi-vector-comp-2-2lbmt-etcd-2 1/1 Running 0 39m 10.104.33.9 4am-node36 <none> <none>
multi-vector-comp-2-2lbmt-milvus-datacoord-b856bbbbb-xdxcc 1/1 Running 0 39m 10.104.31.174 4am-node34 <none> <none>
multi-vector-comp-2-2lbmt-milvus-datanode-6b7955f57c-fxdgn 1/1 Running 0 39m 10.104.31.177 4am-node34 <none> <none>
multi-vector-comp-2-2lbmt-milvus-indexcoord-7659494967-mzxvm 1/1 Running 0 39m 10.104.31.178 4am-node34 <none> <none>
multi-vector-comp-2-2lbmt-milvus-indexnode-b9d668fc8-zzbqc 1/1 Running 0 39m 10.104.19.3 4am-node28 <none> <none>
multi-vector-comp-2-2lbmt-milvus-proxy-dd7c87f67-zbsz4 1/1 Running 1 (35m ago) 39m 10.104.31.175 4am-node34 <none> <none>
multi-vector-comp-2-2lbmt-milvus-querycoord-7976f6f7c6-rtf57 1/1 Running 0 39m 10.104.31.176 4am-node34 <none> <none>
multi-vector-comp-2-2lbmt-milvus-querynode-6d8c845bd8-qnfjn 1/1 Running 1 (30m ago) 39m 10.104.26.77 4am-node32 <none> <none>
multi-vector-comp-2-2lbmt-milvus-rootcoord-5b97c8788b-v852j 1/1 Running 0 39m 10.104.31.173 4am-node34 <none> <none>
multi-vector-comp-2-2lbmt-minio-0 1/1 Running 0 39m 10.104.18.143 4am-node25 <none> <none>
multi-vector-comp-2-2lbmt-minio-1 1/1 Running 0 39m 10.104.31.182 4am-node34 <none> <none>
multi-vector-comp-2-2lbmt-minio-2 1/1 Running 0 39m 10.104.27.162 4am-node31 <none> <none>
multi-vector-comp-2-2lbmt-minio-3 1/1 Running 0 39m 10.104.32.210 4am-node39 <none> <none>
multi-vector-comp-2-2lbmt-pulsar-bookie-0 1/1 Running 0 39m 10.104.34.219 4am-node37 <none> <none>
multi-vector-comp-2-2lbmt-pulsar-bookie-1 1/1 Running 0 39m 10.104.15.173 4am-node20 <none> <none>
multi-vector-comp-2-2lbmt-pulsar-bookie-2 1/1 Running 0 39m 10.104.23.197 4am-node27 <none> <none>
multi-vector-comp-2-2lbmt-pulsar-bookie-init-g48qr 0/1 Completed 0 39m 10.104.4.124 4am-node11 <none> <none>
multi-vector-comp-2-2lbmt-pulsar-broker-0 1/1 Running 0 39m 10.104.5.121 4am-node12 <none> <none>
multi-vector-comp-2-2lbmt-pulsar-proxy-0 1/1 Running 0 39m 10.104.6.98 4am-node13 <none> <none>
multi-vector-comp-2-2lbmt-pulsar-pulsar-init-h25t9 0/1 Completed 0 39m 10.104.6.96 4am-node13 <none> <none>
multi-vector-comp-2-2lbmt-pulsar-recovery-0 1/1 Running 0 39m 10.104.9.210 4am-node14 <none> <none>
multi-vector-comp-2-2lbmt-pulsar-zookeeper-0 1/1 Running 0 39m 10.104.31.181 4am-node34 <none> <none>
multi-vector-comp-2-2lbmt-pulsar-zookeeper-1 1/1 Running 0 39m 10.104.17.21 4am-node23 <none> <none>
multi-vector-comp-2-2lbmt-pulsar-zookeeper-2 1/1 Running 0 38m 10.104.27.165 4am-node31 <none> <none>
client pod name: multi-vector-comp-2-2lbmt-1555640872
client log: client.log
client search error: 2024-04-09 11:30:40,756 ~ 2024-04-09 11:33:57,730
test result:
[2024-04-09 11:59:02,350 - INFO - fouram]: Print locust final stats. (locust_runner.py:56)
[2024-04-09 11:59:02,351 - INFO - fouram]: Type Name # reqs # fails | Avg Min Max Med | req/s failures/s (stats.py:789)
[2024-04-09 11:59:02,351 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-04-09 11:59:02,351 - INFO - fouram]: grpc search 765 70(9.15%) | 208749 21482 327504 189000 | 0.42 0.04 (stats.py:789)
[2024-04-09 11:59:02,351 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-04-09 11:59:02,351 - INFO - fouram]: Aggregated 765 70(9.15%) | 208749 21482 327504 189000 | 0.42 0.04 (stats.py:789)
[2024-04-09 11:59:02,351 - INFO - fouram]: (stats.py:790)
[2024-04-09 11:59:02,353 - INFO - fouram]: [PerfTemplate] Report data:
{'server': {'deploy_tool': 'helm',
'deploy_mode': 'cluster',
'config_name': 'cluster_2c2m',
'config': {'queryNode': {'resources': {'limits': {'cpu': '16.0',
'memory': '32Gi'},
'requests': {'cpu': '9.0',
'memory': '17Gi'}},
'replicas': 1},
'indexNode': {'resources': {'limits': {'cpu': '8.0',
'memory': '8Gi'},
'requests': {'cpu': '5.0',
'memory': '5Gi'}},
'replicas': 1},
'dataNode': {'resources': {'limits': {'cpu': '2.0',
'memory': '2Gi'},
'requests': {'cpu': '2.0',
'memory': '2Gi'}}},
'cluster': {'enabled': True},
'pulsar': {},
'kafka': {},
'minio': {'metrics': {'podMonitor': {'enabled': True}}},
'etcd': {'metrics': {'enabled': True,
'podMonitor': {'enabled': True}}},
'metrics': {'serviceMonitor': {'enabled': True}},
'log': {'level': 'debug'},
'image': {'all': {'repository': 'harbor.milvus.io/milvus/milvus',
'tag': 'v2.3.12'}}},
'host': 'multi-vector-comp-2-2lbmt-milvus.qa-milvus.svc.cluster.local',
'port': '19530',
'uri': ''},
'client': {'test_case_type': 'ConcurrentClientBase',
'test_case_name': 'test_concurrent_locust_hnsw_search_cluster',
'test_case_params': {'dataset_params': {'metric_type': 'L2',
'dim': 128,
'dataset_name': 'sift',
'dataset_size': 1000000,
'ni_per': 50000},
'collection_params': {'other_fields': [],
'shards_num': 2},
'resource_groups_params': {'reset': False},
'database_user_params': {'reset_rbac': False,
'reset_db': False},
'index_params': {'index_type': 'HNSW',
'index_param': {'M': 8,
'efConstruction': 200}},
'concurrent_params': {'concurrent_number': 100,
'during_time': 1800,
'interval': 20,
'spawn_rate': None},
'concurrent_tasks': [{'type': 'search',
'weight': 1,
'params': {'nq': 10000,
'top_k': 10,
'search_param': {'ef': 16},
'expr': None,
'guarantee_timestamp': None,
'partition_names': None,
'output_fields': None,
'ignore_growing': False,
'group_by_field': None,
'timeout': 3600,
'random_data': True}}]},
'run_id': 2024040916443942,
'datetime': '2024-04-09 11:20:44.380762',
'client_version': '2.4.0'},
'result': {'test_result': {'index': {'RT': 20.6699},
'insert': {'total_time': 35.1925,
'VPS': 28415.1453,
'batch_time': 1.7596,
'batch': 50000},
'flush': {'RT': 2.5403},
'load': {'RT': 5.1836},
'Locust': {'Aggregated': {'Requests': 765,
'Fails': 70,
'RPS': 0.42,
'fail_s': 0.09,
'RT_max': 327504.59,
'RT_avg': 208749.74,
'TP50': 189000.0,
'TP99': 322000.0},
'search': {'Requests': 765,
'Fails': 70,
'RPS': 0.42,
'fail_s': 0.09,
'RT_max': 327504.59,
'RT_avg': 208749.74,
'TP50': 189000.0,
'TP99': 322000.0}}}}}
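To put the search load in this report in perspective, the parameters above imply a large volume of query data per request. The arithmetic below assumes float32 vector components and that all 100 Locust users have one request outstanding at the same time. Both are assumptions rather than facts stated in the report.

```python
nq = 10000              # query vectors per search request (from the report)
dim = 128               # vector dimension (from the report)
concurrent_users = 100  # concurrent_number (from the report)
bytes_per_float = 4     # assumption: float32 components

bytes_per_request = nq * dim * bytes_per_float
vectors_in_flight = nq * concurrent_users
bytes_in_flight = bytes_per_request * concurrent_users

print(bytes_per_request, vectors_in_flight, bytes_in_flight)
```

Under those assumptions each request carries about 5 MB of raw query vectors, and up to one million query vectors may be queued against a single query node at once.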
Build index failed
argo task: multi-vector-based-scene1-f8pw5 test case name: test_hybrid_search_serial_ivf_flat_hnsw_standalone
server: init stats
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
multi-vector-based-scene1-f8pw5-etcd-0 1/1 Running 0 2m33s 10.104.15.16 4am-node20 <none> <none>
multi-vector-based-scene1-f8pw5-milvus-standalone-65cf6f86z2s9q 1/1 Running 0 2m33s 10.104.26.10 4am-node32 <none> <none>
multi-vector-based-scene1-f8pw5-minio-6f9756b97c-mfzmr 1/1 Running 0 2m33s 10.104.15.19 4am-node20 <none> <none>
after testing
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
multi-vector-based-scene1-f8pw5-etcd-0 1/1 Running 0 24h 10.104.15.16 4am-node20 <none> <none>
multi-vector-based-scene1-f8pw5-milvus-standalone-65cf6f86z2s9q 1/1 Running 3 (17h ago) 24h 10.104.26.10 4am-node32 <none> <none>
multi-vector-based-scene1-f8pw5-minio-6f9756b97c-mfzmr 1/1 Running 0 24h 10.104.15.19 4am-node20 <none> <none>
client pod name: multi-vector-based-scene1-f8pw5-19328334
client log:
test steps:
1. create a collection, 8 fields: "id", "float_vector", "float_vector_1", "int64_1", "int64_2", "float_1", "double_1", "varchar_1"
2. build index
IVF_FLAT: float_vector
HNSW: float_vector_1
INVERTED: "int64_1", "int64_2", "float_1", "double_1", "varchar_1"
3. insert 25m data
4. flush collection
5. build index again with the same params <- failed
server config:
@longjiquan please help to check, thanks
@wangting0128 these all seem to be different issues. Maybe we can assign them to different people?
func (s *storageV1Serializer) setTaskMeta(task *SyncTask, pack *SyncPack) {
    task.WithCollectionID(pack.collectionID).
        WithPartitionID(pack.partitionID).
        WithChannelName(pack.channelName).
        WithSegmentID(pack.segmentID).
        WithBatchSize(pack.batchSize).
        WithSchema(s.metacache.Schema()).
        WithStartPosition(pack.startPosition).
        WithCheckpoint(pack.checkpoint).
        WithLevel(pack.level).
        WithTimeRange(pack.tsFrom, pack.tsTo).
        WithMetaCache(s.metacache).
        WithMetaWriter(s.metaWriter).
        WithFailureCallback(func(err error) {
            // TODO could change to unsub channel in the future
            panic(err)
        })
}
@congqixia we need to refine the flush logic. It should retry indefinitely rather than panic so easily.
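The "retry instead of panic" idea can be sketched as a loop with capped exponential backoff. This is a generic illustration written in Python, not the Milvus Go implementation. `retry_forever`, its parameters and the injected `sleep` function are all hypothetical names chosen for the example.

```python
import time
from typing import Callable


def retry_forever(op: Callable[[], None],
                  base_delay: float = 0.1,
                  max_delay: float = 30.0,
                  sleep: Callable[[float], None] = time.sleep) -> int:
    """Call op until it succeeds and return the number of attempts made.

    The delay between attempts doubles each time, capped at max_delay.
    """
    attempt = 0
    delay = base_delay
    while True:
        attempt += 1
        try:
            op()
            return attempt
        except Exception as err:  # real code would only retry retryable errors
            print(f"attempt {attempt} failed: {err}; retrying in {delay:.1f}s")
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

A failure callback built this way would keep the node alive during a transient storage or etcd outage, at the cost of holding the sync task in memory until the dependency recovers.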
Also, the core of this issue is that CPU usage is too high in that situation. @longjiquan are there any analysis results? Is there anywhere we failed to limit the CPU cores?
Got it! Reopened a new issue: #32400
Maybe this issue is not caused by the inverted index. I noticed that there was no inverted index building job before Milvus disconnected from etcd. See the logs.
One possibility is that search becomes too slow on such segments and blocks the Go P threads. Having 64K-length varchar values is bad for Milvus because the segments also become huge.
@longjiquan please check the execution time for each search
To sync up: in this test scenario, the values of the varchar field are all integers converted to strings, and none of them is 64K long.
Query Node disconnected from etcd
argo task: multi-vector-comp-2-2lbmt
For example, this scenario has only an int64 primary key field and one vector field.