[Bug]: [benchmark][cluster] flush 180s timeout in DQL scene with 1024 `reqs` for hybrid_search
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version:master-20240204-69596306
- Deployment mode(standalone or cluster):cluster
- MQ type(rocksmq, pulsar or kafka):pulsar
- SDK version(e.g. pymilvus v2.0.0rc2):2.4.0rc19
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
argo task: multi-vector-scene-mix-84bqz test case name: test_hybrid_search_locust_dql_max_reqs_cluster
server:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
multi-vector-scene-mix-84bqz-7-etcd-0 1/1 Running 0 6h28m 10.104.20.228 4am-node22 <none> <none>
multi-vector-scene-mix-84bqz-7-etcd-1 1/1 Running 0 6h28m 10.104.25.65 4am-node30 <none> <none>
multi-vector-scene-mix-84bqz-7-etcd-2 1/1 Running 0 6h28m 10.104.19.208 4am-node28 <none> <none>
multi-vector-scene-mix-84bqz-7-milvus-datacoord-6b95c9fc6c846b2 1/1 Running 0 6h28m 10.104.19.197 4am-node28 <none> <none>
multi-vector-scene-mix-84bqz-7-milvus-datanode-56d6475f56-t64nb 1/1 Running 1 (6h23m ago) 6h28m 10.104.26.55 4am-node32 <none> <none>
multi-vector-scene-mix-84bqz-7-milvus-indexcoord-6fb84db66k7ztc 1/1 Running 0 6h28m 10.104.25.57 4am-node30 <none> <none>
multi-vector-scene-mix-84bqz-7-milvus-indexnode-7dd7dcd887xmk84 1/1 Running 0 6h28m 10.104.23.117 4am-node27 <none> <none>
multi-vector-scene-mix-84bqz-7-milvus-proxy-698fdd46b6-rswmn 1/1 Running 1 (6h23m ago) 6h28m 10.104.26.56 4am-node32 <none> <none>
multi-vector-scene-mix-84bqz-7-milvus-querycoord-547f7b6794qwcc 1/1 Running 1 (6h23m ago) 6h28m 10.104.18.62 4am-node25 <none> <none>
multi-vector-scene-mix-84bqz-7-milvus-querynode-95b46f967-9b9g7 1/1 Running 0 6h28m 10.104.1.13 4am-node10 <none> <none>
multi-vector-scene-mix-84bqz-7-milvus-rootcoord-6bbb958b7-cmz9r 1/1 Running 1 (6h23m ago) 6h28m 10.104.1.9 4am-node10 <none> <none>
multi-vector-scene-mix-84bqz-7-minio-0 1/1 Running 0 6h28m 10.104.20.230 4am-node22 <none> <none>
multi-vector-scene-mix-84bqz-7-minio-1 1/1 Running 0 6h28m 10.104.19.206 4am-node28 <none> <none>
multi-vector-scene-mix-84bqz-7-minio-2 1/1 Running 0 6h28m 10.104.17.165 4am-node23 <none> <none>
multi-vector-scene-mix-84bqz-7-minio-3 1/1 Running 0 6h28m 10.104.25.66 4am-node30 <none> <none>
multi-vector-scene-mix-84bqz-7-pulsar-bookie-0 1/1 Running 0 6h28m 10.104.18.75 4am-node25 <none> <none>
multi-vector-scene-mix-84bqz-7-pulsar-bookie-1 1/1 Running 0 6h28m 10.104.20.232 4am-node22 <none> <none>
multi-vector-scene-mix-84bqz-7-pulsar-bookie-2 1/1 Running 0 6h28m 10.104.17.166 4am-node23 <none> <none>
multi-vector-scene-mix-84bqz-7-pulsar-bookie-init-xfg49 0/1 Completed 0 6h28m 10.104.20.221 4am-node22 <none> <none>
multi-vector-scene-mix-84bqz-7-pulsar-broker-0 1/1 Running 0 6h28m 10.104.6.4 4am-node13 <none> <none>
multi-vector-scene-mix-84bqz-7-pulsar-proxy-0 1/1 Running 0 6h28m 10.104.21.24 4am-node24 <none> <none>
multi-vector-scene-mix-84bqz-7-pulsar-pulsar-init-4grwb 0/1 Completed 0 6h28m 10.104.6.3 4am-node13 <none> <none>
multi-vector-scene-mix-84bqz-7-pulsar-recovery-0 1/1 Running 0 6h28m 10.104.21.23 4am-node24 <none> <none>
multi-vector-scene-mix-84bqz-7-pulsar-zookeeper-0 1/1 Running 0 6h28m 10.104.20.231 4am-node22 <none> <none>
multi-vector-scene-mix-84bqz-7-pulsar-zookeeper-1 1/1 Running 0 6h27m 10.104.27.235 4am-node31 <none> <none>
multi-vector-scene-mix-84bqz-7-pulsar-zookeeper-2 1/1 Running 0 6h25m 10.104.34.151 4am-node37 <none> <none>
client pod name: multi-vector-scene-mix-84bqz-3046920368
client log:
Expected Behavior
No response
Steps To Reproduce
concurrent test and calculation of RT and QPS
:purpose: `DQL & max reqs=1024`
verify DQL & max reqs=1024 scenario,
which has 4 vector fields(IVF_FLAT, HNSW, DISKANN, IVF_SQ8) and scalar fields: `int64_1`, `varchar_1`
:test steps:
1. create collection with fields:
'float_vector': 128dim,
'float_vector_1': 128dim,
'float_vector_2': 128dim,
'float_vector_3': 128dim,
scalar field: int64_1, varchar_1
2. build indexes:
IVF_FLAT: 'float_vector'
HNSW: 'float_vector_1',
DISKANN: 'float_vector_2'
IVF_SQ8: 'float_vector_3'
INVERTED: 'int64_1', 'varchar_1'
default scalar index: 'id'
3. insert 1 million data
4. flush collection
5. build indexes again using the same params
6. load collection
replica: 1
7. concurrent request:
- flush
- load
- search
- hybrid_search: len(reqs) = 1024
- query
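The concurrent hybrid_search step above exercises the 1024-sub-request cap. A minimal sketch of how such a payload could be assembled, using plain dicts as stand-ins for pymilvus `AnnSearchRequest` objects (field names taken from the steps above; the helper itself is hypothetical):

```python
import random

# Sketch only: dicts stand in for pymilvus AnnSearchRequest objects.
VECTOR_FIELDS = ["float_vector", "float_vector_1", "float_vector_2", "float_vector_3"]
DIM = 128        # all four vector fields are 128-dim per the test steps
MAX_REQS = 1024  # the `reqs` cap exercised by this test

def build_hybrid_search_reqs(n=MAX_REQS):
    """Build n sub-requests, cycling over the four vector fields."""
    reqs = []
    for i in range(n):
        reqs.append({
            "anns_field": VECTOR_FIELDS[i % len(VECTOR_FIELDS)],
            "data": [[random.random() for _ in range(DIM)]],  # one query vector
            "param": {"metric_type": "L2"},
            "limit": 10,
        })
    return reqs

reqs = build_hybrid_search_reqs()
```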
Milvus Log
No response
Anything else?
server config:
{
"queryNode": {
"resources": {
"limits": {
"cpu": "16.0",
"memory": "64Gi"
},
"requests": {
"cpu": "9.0",
"memory": "33Gi"
}
},
"replicas": 1
},
"indexNode": {
"resources": {
"limits": {
"cpu": "8.0",
"memory": "8Gi"
},
"requests": {
"cpu": "5.0",
"memory": "5Gi"
}
},
"replicas": 1
},
"dataNode": {
"resources": {
"limits": {
"cpu": "2.0",
"memory": "8Gi"
},
"requests": {
"cpu": "2.0",
"memory": "5Gi"
}
}
},
"cluster": {
"enabled": true
},
"pulsar": {},
"kafka": {},
"minio": {
"metrics": {
"podMonitor": {
"enabled": true
}
}
},
"etcd": {
"metrics": {
"enabled": true,
"podMonitor": {
"enabled": true
}
}
},
"metrics": {
"serviceMonitor": {
"enabled": true
}
},
"log": {
"level": "debug"
},
"image": {
"all": {
"repository": "harbor.milvus.io/milvus/milvus",
"tag": "master-20240204-69596306"
}
}
}
client config:
test result:
{
"test_result": {
"index": {
"RT": 967.0936,
"float_vector_1": {
"RT": 0.5557
},
"float_vector_2": {
"RT": 2.042
},
"float_vector_3": {
"RT": 1.0627
},
"id": {
"RT": 1.0318
},
"int64_1": {
"RT": 1.0643
},
"varchar_1": {
"RT": 1.0247
}
},
"insert": {
"total_time": 175.1558,
"VPS": 5709.2029,
"batch_time": 1.7516,
"batch": 10000
},
"flush": {
"RT": 30.2399
},
"load": {
"RT": 11.6515
},
"Locust": {
"Aggregated": {
"Requests": 5481,
"Fails": 409,
"RPS": 0.51,
"fail_s": 0.07,
"RT_max": 548240,
"RT_avg": 192900.39,
"TP50": 187000,
"TP99": 494000
},
"flush": {
"Requests": 1113,
"Fails": 409,
"RPS": 0.1,
"fail_s": 0.37,
"RT_max": 548240,
"RT_avg": 339588.99,
"TP50": 335000,
"TP99": 511000
},
"hybrid_search": {
"Requests": 1082,
"Fails": 0,
"RPS": 0.1,
"fail_s": 0,
"RT_max": 268701.09,
"RT_avg": 68170.01,
"TP50": 64000,
"TP99": 230000
},
"load": {
"Requests": 1102,
"Fails": 0,
"RPS": 0.1,
"fail_s": 0,
"RT_max": 548204.59,
"RT_avg": 337781.77,
"TP50": 333000,
"TP99": 512000
},
"query": {
"Requests": 1065,
"Fails": 0,
"RPS": 0.1,
"fail_s": 0,
"RT_max": 378817.38,
"RT_avg": 187034,
"TP50": 180000,
"TP99": 359000
},
"search": {
"Requests": 1119,
"Fails": 0,
"RPS": 0.1,
"fail_s": 0,
"RT_max": 230678.43,
"RT_avg": 30507.44,
"TP50": 29000,
"TP99": 229000
}
}
}
}
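For readers of the Locust table above, `fail_s` is the failure ratio, i.e. `Fails / Requests` rounded to two decimals; a quick sanity check against the reported numbers:

```python
# fail_s = Fails / Requests, rounded to two decimals.
def fail_ratio(fails, requests):
    return round(fails / requests, 2)

# Values taken from the Locust result block above.
assert fail_ratio(409, 1113) == 0.37   # flush
assert fail_ratio(409, 5481) == 0.07   # Aggregated
assert fail_ratio(0, 1082) == 0.0      # hybrid_search
```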
flush timeout 180s
argo task:inverted-corn-1708358400 test case name:test_inverted_locust_partition_key_dml_standalone milvus image: master-20240219-43e8cd53-amd64
server:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
inverted-corn-158400-2-82-4682-etcd-0 1/1 Running 0 3h17m 10.104.16.107 4am-node21 <none> <none>
inverted-corn-158400-2-82-4682-milvus-standalone-65b65977dnhdxg 1/1 Running 0 3h17m 10.104.27.229 4am-node31 <none> <none>
inverted-corn-158400-2-82-4682-minio-5bd6797c67-cdv4d 1/1 Running 0 3h17m 10.104.23.223 4am-node27 <none> <none>
client pod name: inverted-corn-1708358400-3041780947
client logs:
test steps:
concurrent test and calculation of RT and QPS
:purpose: `partition_key: scalar enable partition_key(num_partitions=128)`
verify concurrent DML scenario which
scalar `id`(pk) & `int64_1` created INVERTED index and enable partition_key on `int64_1` field
:test steps:
1. create collection with fields:
'float_vector': 128dim,
'int64_1': is_partition_key
2. build indexes:
IVF_FLAT: 'float_vector'
INVERTED: 'id', 'int64_1'
3. insert 5 million data
4. flush collection
5. build indexes again using the same params
6. load collection
7. concurrent request:
- insert
- delete
- flush
- release
test result:
{'server': {'deploy_tool': 'helm',
'deploy_mode': 'standalone',
'config_name': 'standalone_8c16m',
'config': {'standalone': {'resources': {'limits': {'cpu': '8.0',
'memory': '16Gi'},
'requests': {'cpu': '5.0',
'memory': '9Gi'}}},
'cluster': {'enabled': False},
'etcd': {'replicaCount': 1,
'metrics': {'enabled': True,
'podMonitor': {'enabled': True}}},
'minio': {'mode': 'standalone',
'metrics': {'podMonitor': {'enabled': True}}},
'pulsar': {'enabled': False},
'metrics': {'serviceMonitor': {'enabled': True}},
'log': {'level': 'debug'},
'image': {'all': {'repository': 'harbor.milvus.io/milvus/milvus',
'tag': 'master-20240219-43e8cd53-amd64'}}},
'host': 'inverted-corn-158400-2-82-4682-milvus.qa-milvus.svc.cluster.local',
'port': '19530',
'uri': ''},
'client': {'test_case_type': 'ConcurrentClientBase',
'test_case_name': 'test_inverted_locust_partition_key_dml_standalone',
'test_case_params': {'dataset_params': {'metric_type': 'L2',
'dim': 128,
'scalars_index': {'id': {'index_type': 'INVERTED'},
'int64_1': {'index_type': 'INVERTED'}},
'scalars_params': {'int64_1': {'params': {'is_partition_key': True}}},
'dataset_name': 'sift',
'dataset_size': 5000000,
'ni_per': 50000},
'collection_params': {'other_fields': ['int64_1'],
'shards_num': 2,
'num_partitions': 128},
'resource_groups_params': {'reset': False},
'database_user_params': {'reset_rbac': False,
'reset_db': False},
'index_params': {'index_type': 'IVF_FLAT',
'index_param': {'nlist': 1024}},
'concurrent_params': {'concurrent_number': 20,
'during_time': '3h',
'interval': 20,
'spawn_rate': None},
'concurrent_tasks': [{'type': 'insert',
'weight': 1,
'params': {'nb': 10,
'timeout': 180,
'random_id': True,
'random_vector': True,
'varchar_filled': False,
'start_id': 0}},
{'type': 'delete',
'weight': 1,
'params': {'expr': '',
'delete_length': 9,
'timeout': 30}},
{'type': 'flush',
'weight': 1,
'params': {'timeout': 180}},
{'type': 'release',
'weight': 1,
'params': {'timeout': 30}}]},
'run_id': 2024021985241562,
'datetime': '2024-02-19 16:02:04.509472',
'client_version': '2.4.0'},
'result': {'test_result': {'index': {'RT': 0.5101,
'id': {'RT': 0.5091},
'int64_1': {'RT': 0.5091}},
'insert': {'total_time': 145.5963,
'VPS': 34341.532,
'batch_time': 1.456,
'batch': 50000},
'flush': {'RT': 559.5905},
'load': {'RT': 2.5874},
'Locust': {'Aggregated': {'Requests': 4821,
'Fails': 1180,
'RPS': 0.45,
'fail_s': 0.24,
'RT_max': 182973.29,
'RT_avg': 44150.94,
'TP50': 27,
'TP99': 181000.0},
'delete': {'Requests': 1245,
'Fails': 0,
'RPS': 0.12,
'fail_s': 0.0,
'RT_max': 90.65,
'RT_avg': 6.5,
'TP50': 3,
'TP99': 56},
'flush': {'Requests': 1180,
'Fails': 1180,
'RPS': 0.11,
'fail_s': 1.0,
'RT_max': 182973.29,
'RT_avg': 180318.67,
'TP50': 180000.0,
'TP99': 181000.0},
'insert': {'Requests': 1202,
'Fails': 0,
'RPS': 0.11,
'fail_s': 0.0,
'RT_max': 280.76,
'RT_avg': 46.83,
'TP50': 41,
'TP99': 140.0},
'release': {'Requests': 1194,
'Fails': 0,
'RPS': 0.11,
'fail_s': 0.0,
'RT_max': 2203.0,
'RT_avg': 9.41,
'TP50': 2,
'TP99': 59}}}}}
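The per-task RPS figures in the result above can be cross-checked against the run duration, assuming the stats window equals the configured `during_time` of 3h:

```python
# Cross-check of reported RPS: Requests / duration, rounded to two decimals.
DURATION_S = 3 * 3600  # during_time = '3h'

def rps(requests, duration_s=DURATION_S):
    return round(requests / duration_s, 2)

# Values taken from the result block above.
assert rps(1180) == 0.11  # flush
assert rps(1202) == 0.11  # insert
assert rps(4821) == 0.45  # Aggregated
```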
flush timeout 180s
argo task: inverted-corn-1709049600 test case name: test_inverted_locust_partition_key_dml_standalone image: master-20240227-f87a3a13-amd64
server:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
inverted-corn-149600-2-93-8599-etcd-0 1/1 Running 0 3h22m 10.104.32.236 4am-node39 <none> <none>
inverted-corn-149600-2-93-8599-milvus-standalone-58d6cd97d2cfkz 1/1 Running 1 (3h20m ago) 3h22m 10.104.19.67 4am-node28 <none> <none>
inverted-corn-149600-2-93-8599-minio-54cf95b55-kqhlh 1/1 Running 0 3h22m 10.104.32.237 4am-node39 <none> <none>
client pod name: inverted-corn-1709049600-1162705415 client log: client.log
test steps:
concurrent test and calculation of RT and QPS
:purpose: `partition_key: scalar enable partition_key(num_partitions=128)`
verify concurrent DML scenario which
scalar `id`(pk) & `int64_1` created INVERTED index and enable partition_key on `int64_1` field
:test steps:
1. create collection with fields:
'float_vector': 128dim,
'int64_1': is_partition_key
2. build indexes:
IVF_FLAT: 'float_vector'
INVERTED: 'id', 'int64_1'
3. insert 5 million data
4. flush collection
5. build indexes again using the same params
6. load collection
7. concurrent request:
- insert
- delete
- flush
- release
test result:
[2024-02-27 19:28:37,101 - INFO - fouram]: Type Name # reqs # fails | Avg Min Max Med | req/s failures/s (stats.py:789)
[2024-02-27 19:28:37,101 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-02-27 19:28:37,102 - INFO - fouram]: grpc delete 5437 0(0.00%) | 18 1 383 5 | 0.50 0.00 (stats.py:789)
[2024-02-27 19:28:37,102 - INFO - fouram]: grpc flush 5405 77(1.42%) | 29881 340 204158 19000 | 0.50 0.01 (stats.py:789)
[2024-02-27 19:28:37,102 - INFO - fouram]: grpc insert 5410 31(0.57%) | 9689 28 180059 5400 | 0.50 0.00 (stats.py:789)
[2024-02-27 19:28:37,103 - INFO - fouram]: grpc release 5564 0(0.00%) | 17 1 1025 4 | 0.52 0.00 (stats.py:789)
[2024-02-27 19:28:37,103 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-02-27 19:28:37,103 - INFO - fouram]: Aggregated 21816 108(0.50%) | 9815 1 204158 85 | 2.02 0.01 (stats.py:789)
[2024-02-27 19:28:37,103 - INFO - fouram]: (stats.py:790)
[2024-02-27 19:28:37,110 - INFO - fouram]: [PerfTemplate] Report data:
{'server': {'deploy_tool': 'helm',
'deploy_mode': 'standalone',
'config_name': 'standalone_8c16m',
'config': {'standalone': {'resources': {'limits': {'cpu': '8.0',
'memory': '16Gi'},
'requests': {'cpu': '5.0',
'memory': '9Gi'}}},
'cluster': {'enabled': False},
'etcd': {'replicaCount': 1,
'metrics': {'enabled': True,
'podMonitor': {'enabled': True}}},
'minio': {'mode': 'standalone',
'metrics': {'podMonitor': {'enabled': True}}},
'pulsar': {'enabled': False},
'metrics': {'serviceMonitor': {'enabled': True}},
'log': {'level': 'debug'},
'image': {'all': {'repository': 'harbor.milvus.io/milvus/milvus',
'tag': 'master-20240227-f87a3a13-amd64'}}},
'host': 'inverted-corn-149600-2-93-8599-milvus.qa-milvus.svc.cluster.local',
'port': '19530',
'uri': ''},
'client': {'test_case_type': 'ConcurrentClientBase',
'test_case_name': 'test_inverted_locust_partition_key_dml_standalone',
'test_case_params': {'dataset_params': {'metric_type': 'L2',
'dim': 128,
'scalars_index': {'id': {'index_type': 'INVERTED'},
'int64_1': {'index_type': 'INVERTED'}},
'scalars_params': {'int64_1': {'params': {'is_partition_key': True}}},
'dataset_name': 'sift',
'dataset_size': 5000000,
'ni_per': 50000},
'collection_params': {'other_fields': ['int64_1'],
'shards_num': 2,
'num_partitions': 128},
'resource_groups_params': {'reset': False},
'database_user_params': {'reset_rbac': False,
'reset_db': False},
'index_params': {'index_type': 'IVF_FLAT',
'index_param': {'nlist': 1024}},
'concurrent_params': {'concurrent_number': 20,
'during_time': '3h',
'interval': 20,
'spawn_rate': None},
'concurrent_tasks': [{'type': 'insert',
'weight': 1,
'params': {'nb': 10,
'timeout': 180,
'random_id': True,
'random_vector': True,
'varchar_filled': False,
'start_id': 0}},
{'type': 'delete',
'weight': 1,
'params': {'expr': '',
'delete_length': 9,
'timeout': 30}},
{'type': 'flush',
'weight': 1,
'params': {'timeout': 180}},
{'type': 'release',
'weight': 1,
'params': {'timeout': 30}}]},
'run_id': 2024022700172497,
'datetime': '2024-02-27 16:06:57.151992',
'client_version': '2.4.0'},
'result': {'test_result': {'index': {'RT': 797.5195,
'id': {'RT': 1.0174},
'int64_1': {'RT': 1.02}},
'insert': {'total_time': 174.1965,
'VPS': 28703.2173,
'batch_time': 1.742,
'batch': 50000},
'flush': {'RT': 16.4752},
'load': {'RT': 9.1697},
'Locust': {'Aggregated': {'Requests': 21816,
'Fails': 108,
'RPS': 2.02,
'fail_s': 0.0,
'RT_max': 204158.91,
'RT_avg': 9815.15,
'TP50': 85,
'TP99': 120000.0},
'delete': {'Requests': 5437,
'Fails': 0,
'RPS': 0.5,
'fail_s': 0.0,
'RT_max': 383.34,
'RT_avg': 18.87,
'TP50': 5,
'TP99': 110.0},
'flush': {'Requests': 5405,
'Fails': 77,
'RPS': 0.5,
'fail_s': 0.01,
'RT_max': 204158.91,
'RT_avg': 29881.0,
'TP50': 19000.0,
'TP99': 185000.0},
'insert': {'Requests': 5410,
'Fails': 31,
'RPS': 0.5,
'fail_s': 0.01,
'RT_max': 180059.4,
'RT_avg': 9689.75,
'TP50': 5400.0,
'TP99': 101000.0},
'release': {'Requests': 5564,
'Fails': 0,
'RPS': 0.52,
'fail_s': 0.0,
'RT_max': 1025.65,
'RT_avg': 17.33,
'TP50': 4,
'TP99': 100.0}}}}}
I'd set the priority to high for a flush issue.
The channel checkpoint lag is sometimes larger than 180s, which is the root cause of the 180s flush timeout.
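The mechanism described above can be illustrated with a minimal sketch (the helper and the relationship are simplified for illustration, not Milvus code): a flush only completes once the channel checkpoint advances past the flush timestamp, so a checkpoint lag already above the client-side timeout guarantees a timeout.

```python
# Hypothetical illustration of the root cause. If the channel checkpoint is
# already more than `timeout_s` behind, the flush cannot observe a new-enough
# checkpoint before the client gives up.
FLUSH_TIMEOUT_S = 180  # client-side flush timeout used in these tests

def flush_would_time_out(checkpoint_lag_s, timeout_s=FLUSH_TIMEOUT_S):
    return checkpoint_lag_s > timeout_s

assert flush_would_time_out(204)        # ~204s, the worst RT seen in the runs above
assert not flush_would_time_out(30)
```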
/assign
Recurred
argo task: multi-vector-scene-mix-6r7dt test case name: test_hybrid_search_locust_dql_max_reqs_cluster image: 2.4-20240401-d4d0c6be8-amd64
server:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
multi-vector-scene-mix-6r7dt-7-etcd-0 1/1 Running 0 6h29m 10.104.31.50 4am-node34 <none> <none>
multi-vector-scene-mix-6r7dt-7-etcd-1 1/1 Running 0 6h29m 10.104.20.101 4am-node22 <none> <none>
multi-vector-scene-mix-6r7dt-7-etcd-2 1/1 Running 0 6h29m 10.104.30.88 4am-node38 <none> <none>
multi-vector-scene-mix-6r7dt-7-milvus-datacoord-6579d9d4774m5wr 1/1 Running 1 (6h24m ago) 6h29m 10.104.5.26 4am-node12 <none> <none>
multi-vector-scene-mix-6r7dt-7-milvus-datanode-5d79b8654-rhdqv 1/1 Running 1 (6h24m ago) 6h29m 10.104.29.64 4am-node35 <none> <none>
multi-vector-scene-mix-6r7dt-7-milvus-indexcoord-6fcfb8976qxtj6 1/1 Running 0 6h29m 10.104.14.27 4am-node18 <none> <none>
multi-vector-scene-mix-6r7dt-7-milvus-indexnode-5b47fcb5c-k8txc 1/1 Running 0 6h29m 10.104.6.105 4am-node13 <none> <none>
multi-vector-scene-mix-6r7dt-7-milvus-proxy-7cdccbf7d8-g6nqm 1/1 Running 1 (6h24m ago) 6h29m 10.104.14.25 4am-node18 <none> <none>
multi-vector-scene-mix-6r7dt-7-milvus-querycoord-5c78f747bpxlmf 1/1 Running 1 (6h24m ago) 6h29m 10.104.14.29 4am-node18 <none> <none>
multi-vector-scene-mix-6r7dt-7-milvus-querynode-644b57f6d94w56w 1/1 Running 0 6h29m 10.104.5.27 4am-node12 <none> <none>
multi-vector-scene-mix-6r7dt-7-milvus-rootcoord-85d8cf89492pncr 1/1 Running 1 (6h24m ago) 6h29m 10.104.5.25 4am-node12 <none> <none>
multi-vector-scene-mix-6r7dt-7-minio-0 1/1 Running 0 6h29m 10.104.30.84 4am-node38 <none> <none>
multi-vector-scene-mix-6r7dt-7-minio-1 1/1 Running 0 6h29m 10.104.20.105 4am-node22 <none> <none>
multi-vector-scene-mix-6r7dt-7-minio-2 1/1 Running 0 6h29m 10.104.31.67 4am-node34 <none> <none>
multi-vector-scene-mix-6r7dt-7-minio-3 1/1 Running 0 6h29m 10.104.29.77 4am-node35 <none> <none>
multi-vector-scene-mix-6r7dt-7-pulsar-bookie-0 1/1 Running 0 6h29m 10.104.30.83 4am-node38 <none> <none>
multi-vector-scene-mix-6r7dt-7-pulsar-bookie-1 1/1 Running 0 6h29m 10.104.20.107 4am-node22 <none> <none>
multi-vector-scene-mix-6r7dt-7-pulsar-bookie-2 1/1 Running 0 6h29m 10.104.31.66 4am-node34 <none> <none>
multi-vector-scene-mix-6r7dt-7-pulsar-bookie-init-x9xvx 0/1 Completed 0 6h29m 10.104.14.28 4am-node18 <none> <none>
multi-vector-scene-mix-6r7dt-7-pulsar-broker-0 1/1 Running 0 6h29m 10.104.14.31 4am-node18 <none> <none>
multi-vector-scene-mix-6r7dt-7-pulsar-proxy-0 1/1 Running 0 6h29m 10.104.9.188 4am-node14 <none> <none>
multi-vector-scene-mix-6r7dt-7-pulsar-pulsar-init-52zjq 0/1 Completed 0 6h29m 10.104.14.26 4am-node18 <none> <none>
multi-vector-scene-mix-6r7dt-7-pulsar-recovery-0 1/1 Running 0 6h29m 10.104.14.30 4am-node18 <none> <none>
multi-vector-scene-mix-6r7dt-7-pulsar-zookeeper-0 1/1 Running 0 6h29m 10.104.31.49 4am-node34 <none> <none>
multi-vector-scene-mix-6r7dt-7-pulsar-zookeeper-1 1/1 Running 0 6h26m 10.104.28.132 4am-node33 <none> <none>
multi-vector-scene-mix-6r7dt-7-pulsar-zookeeper-2 1/1 Running 0 6h24m 10.104.29.98 4am-node35 <none> <none>
client pod name: multi-vector-scene-mix-6r7dt-2282897328
client log:
client.log
[2024-04-01 20:51:17,858 - INFO - fouram]: Print locust final stats. (locust_runner.py:56)
[2024-04-01 20:51:17,858 - INFO - fouram]: Type Name # reqs # fails | Avg Min Max Med | req/s failures/s (stats.py:789)
[2024-04-01 20:51:17,858 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-04-01 20:51:17,859 - INFO - fouram]: grpc flush 1041 451(43.32%) | 357341 192409 608509 351000 | 0.10 0.04 (stats.py:789)
[2024-04-01 20:51:17,859 - INFO - fouram]: grpc hybrid_search 1079 0(0.00%) | 69104 20244 272834 63000 | 0.10 0.00 (stats.py:789)
[2024-04-01 20:51:17,859 - INFO - fouram]: grpc load 1050 0(0.00%) | 355687 398 608586 356000 | 0.10 0.00 (stats.py:789)
[2024-04-01 20:51:17,859 - INFO - fouram]: grpc query 1079 0(0.00%) | 194759 69671 504588 190000 | 0.10 0.00 (stats.py:789)
[2024-04-01 20:51:17,859 - INFO - fouram]: grpc search 1085 0(0.00%) | 33079 26 232173 30000 | 0.10 0.00 (stats.py:789)
[2024-04-01 20:51:17,859 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-04-01 20:51:17,859 - INFO - fouram]: Aggregated 5334 451(8.46%) | 199862 26 608586 190000 | 0.49 0.04 (stats.py:789)
[2024-04-01 20:51:17,859 - INFO - fouram]: (stats.py:790)
test steps:
concurrent test and calculation of RT and QPS
:purpose: `DQL & max reqs=1024`
verify DQL & max reqs=1024 scenario,
which has 4 vector fields(IVF_FLAT, HNSW, DISKANN, IVF_SQ8) and scalar fields: `int64_1`, `varchar_1`
:test steps:
1. create collection with fields:
'float_vector': 128dim,
'float_vector_1': 128dim,
'float_vector_2': 128dim,
'float_vector_3': 128dim,
scalar field: int64_1, varchar_1
2. build indexes:
IVF_FLAT: 'float_vector'
HNSW: 'float_vector_1',
DISKANN: 'float_vector_2'
IVF_SQ8: 'float_vector_3'
INVERTED: 'int64_1', 'varchar_1'
default scalar index: 'id'
3. insert 1 million data
4. flush collection
5. build indexes again using the same params
6. load collection
replica: 1
7. concurrent request:
- flush
- load
- search
- hybrid_search: len(reqs) = 1024
- query
test result:
"result": {
"test_result": {
"index": {
"RT": 981.6727,
"float_vector_1": {
"RT": 1.0218
},
"float_vector_2": {
"RT": 2.0281
},
"float_vector_3": {
"RT": 1.02
},
"id": {
"RT": 1.0226
},
"int64_1": {
"RT": 0.5214
},
"varchar_1": {
"RT": 1.028
}
},
"insert": {
"total_time": 148.5601,
"VPS": 6731.2825,
"batch_time": 1.4856,
"batch": 10000
},
"flush": {
"RT": 22.2226
},
"load": {
"RT": 8.6411
},
"Locust": {
"Aggregated": {
"Requests": 5313,
"Fails": 0,
"RPS": 0.49,
"fail_s": 0,
"RT_max": 139961.75,
"RT_avg": 40434.15,
"TP50": 41000,
"TP99": 109000
},
"flush": {
"Requests": 1109,
"Fails": 0,
"RPS": 0.1,
"fail_s": 0,
"RT_max": 139961.75,
"RT_avg": 38920.65,
"TP50": 40000,
"TP99": 111000
},
"hybrid_search": {
"Requests": 1062,
"Fails": 0,
"RPS": 0.1,
"fail_s": 0,
"RT_max": 129233.3,
"RT_avg": 64638.19,
"TP50": 61000,
"TP99": 113000
},
"load": {
"Requests": 1063,
"Fails": 0,
"RPS": 0.1,
"fail_s": 0,
"RT_max": 125997,
"RT_avg": 29857.82,
"TP50": 24000,
"TP99": 103000
},
"query": {
"Requests": 1079,
"Fails": 0,
"RPS": 0.1,
"fail_s": 0,
"RT_max": 130191.6,
"RT_avg": 37255.29,
"TP50": 40000,
"TP99": 93000
},
"search": {
"Requests": 1000,
"Fails": 0,
"RPS": 0.09,
"fail_s": 0,
"RT_max": 91257.24,
"RT_avg": 31080.53,
"TP50": 32000,
"TP99": 80000
}
}
}
}
server config:
client config:
related to https://github.com/milvus-io/milvus/issues/30552#issuecomment-2031769315
Should be fixed.
/assign @wangting0128
/unassign @XuanYang-cn
Please help verify.
Recurred
argo task: multi-vector-scene-mix-ld9h8 test case name: test_hybrid_search_locust_dql_max_reqs_cluster image: 2.4-20240415-e50599ba-amd64
server:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
multi-vector-scene-mix-ld9h8-7-etcd-0 1/1 Running 0 6h30m 10.104.31.251 4am-node34 <none> <none>
multi-vector-scene-mix-ld9h8-7-etcd-1 1/1 Running 0 6h30m 10.104.19.27 4am-node28 <none> <none>
multi-vector-scene-mix-ld9h8-7-etcd-2 1/1 Running 0 6h30m 10.104.25.110 4am-node30 <none> <none>
multi-vector-scene-mix-ld9h8-7-milvus-datacoord-c9f6855cf-xlktl 1/1 Running 0 6h30m 10.104.31.244 4am-node34 <none> <none>
multi-vector-scene-mix-ld9h8-7-milvus-datanode-7c898966f8-c6v79 1/1 Running 1 (6h25m ago) 6h30m 10.104.34.128 4am-node37 <none> <none>
multi-vector-scene-mix-ld9h8-7-milvus-indexcoord-67b64b4d8gmsvs 1/1 Running 0 6h30m 10.104.6.177 4am-node13 <none> <none>
multi-vector-scene-mix-ld9h8-7-milvus-indexnode-6d699b47c7xzr69 1/1 Running 0 6h30m 10.104.15.60 4am-node20 <none> <none>
multi-vector-scene-mix-ld9h8-7-milvus-proxy-8474cf5474-jz6qq 1/1 Running 1 (6h25m ago) 6h30m 10.104.31.243 4am-node34 <none> <none>
multi-vector-scene-mix-ld9h8-7-milvus-querycoord-d776d56f5lcgl9 1/1 Running 1 (6h25m ago) 6h30m 10.104.19.21 4am-node28 <none> <none>
multi-vector-scene-mix-ld9h8-7-milvus-querynode-db5755485-mbsvr 1/1 Running 0 6h30m 10.104.6.178 4am-node13 <none> <none>
multi-vector-scene-mix-ld9h8-7-milvus-rootcoord-69c857b5cfplzxb 1/1 Running 1 (6h25m ago) 6h30m 10.104.6.174 4am-node13 <none> <none>
multi-vector-scene-mix-ld9h8-7-minio-0 1/1 Running 0 6h30m 10.104.31.250 4am-node34 <none> <none>
multi-vector-scene-mix-ld9h8-7-minio-1 1/1 Running 0 6h30m 10.104.25.105 4am-node30 <none> <none>
multi-vector-scene-mix-ld9h8-7-minio-2 1/1 Running 0 6h30m 10.104.17.64 4am-node23 <none> <none>
multi-vector-scene-mix-ld9h8-7-minio-3 1/1 Running 0 6h30m 10.104.19.26 4am-node28 <none> <none>
multi-vector-scene-mix-ld9h8-7-pulsar-bookie-0 1/1 Running 0 6h30m 10.104.29.9 4am-node35 <none> <none>
multi-vector-scene-mix-ld9h8-7-pulsar-bookie-1 1/1 Running 0 6h30m 10.104.17.65 4am-node23 <none> <none>
multi-vector-scene-mix-ld9h8-7-pulsar-bookie-2 1/1 Running 0 6h30m 10.104.25.111 4am-node30 <none> <none>
multi-vector-scene-mix-ld9h8-7-pulsar-bookie-init-zr2mm 0/1 Completed 0 6h30m 10.104.20.199 4am-node22 <none> <none>
multi-vector-scene-mix-ld9h8-7-pulsar-broker-0 1/1 Running 0 6h30m 10.104.30.194 4am-node38 <none> <none>
multi-vector-scene-mix-ld9h8-7-pulsar-proxy-0 1/1 Running 0 6h30m 10.104.9.98 4am-node14 <none> <none>
multi-vector-scene-mix-ld9h8-7-pulsar-pulsar-init-vlksb 0/1 Completed 0 6h30m 10.104.20.200 4am-node22 <none> <none>
multi-vector-scene-mix-ld9h8-7-pulsar-recovery-0 1/1 Running 0 6h30m 10.104.20.201 4am-node22 <none> <none>
multi-vector-scene-mix-ld9h8-7-pulsar-zookeeper-0 1/1 Running 0 6h30m 10.104.31.252 4am-node34 <none> <none>
multi-vector-scene-mix-ld9h8-7-pulsar-zookeeper-1 1/1 Running 0 6h29m 10.104.15.86 4am-node20 <none> <none>
multi-vector-scene-mix-ld9h8-7-pulsar-zookeeper-2 1/1 Running 0 6h27m 10.104.28.179 4am-node33 <none> <none>
client pod name: multi-vector-scene-mix-ld9h8-876450026
client log:
client.log
/unassign @wangting0128
/assign @XuanYang-cn
Recurred
argo task: multi-vector-scene-mix-gkx8n test case name: test_hybrid_search_locust_dql_max_reqs_cluster image: 2.4-20240418-238f9a4a-amd64
server:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
multi-vector-scene-mix-gkx8n-7-etcd-0 1/1 Running 0 6h30m 10.104.24.140 4am-node29 <none> <none>
multi-vector-scene-mix-gkx8n-7-etcd-1 1/1 Running 0 6h30m 10.104.23.218 4am-node27 <none> <none>
multi-vector-scene-mix-gkx8n-7-etcd-2 1/1 Running 0 6h30m 10.104.30.196 4am-node38 <none> <none>
multi-vector-scene-mix-gkx8n-7-milvus-datacoord-587c4f4d57fnw66 1/1 Running 1 (6h25m ago) 6h30m 10.104.14.83 4am-node18 <none> <none>
multi-vector-scene-mix-gkx8n-7-milvus-datanode-6758fbdc4d-td9m7 1/1 Running 1 (6h25m ago) 6h30m 10.104.33.198 4am-node36 <none> <none>
multi-vector-scene-mix-gkx8n-7-milvus-indexcoord-78c596cd8lbvns 1/1 Running 0 6h30m 10.104.18.14 4am-node25 <none> <none>
multi-vector-scene-mix-gkx8n-7-milvus-indexnode-cd964ff95-9l6pv 1/1 Running 1 (6h25m ago) 6h30m 10.104.21.135 4am-node24 <none> <none>
multi-vector-scene-mix-gkx8n-7-milvus-proxy-f9f4ddc76-ctg68 1/1 Running 1 (6h25m ago) 6h30m 10.104.17.245 4am-node23 <none> <none>
multi-vector-scene-mix-gkx8n-7-milvus-querycoord-859c8bcbf4rcq6 1/1 Running 1 (6h25m ago) 6h30m 10.104.32.142 4am-node39 <none> <none>
multi-vector-scene-mix-gkx8n-7-milvus-querynode-7fd56c9755jmzj2 1/1 Running 1 (6h25m ago) 6h30m 10.104.1.226 4am-node10 <none> <none>
multi-vector-scene-mix-gkx8n-7-milvus-rootcoord-86d678f65cvtv46 1/1 Running 1 (6h25m ago) 6h30m 10.104.32.143 4am-node39 <none> <none>
multi-vector-scene-mix-gkx8n-7-minio-0 1/1 Running 0 6h30m 10.104.18.37 4am-node25 <none> <none>
multi-vector-scene-mix-gkx8n-7-minio-1 1/1 Running 0 6h30m 10.104.23.213 4am-node27 <none> <none>
multi-vector-scene-mix-gkx8n-7-minio-2 1/1 Running 0 6h30m 10.104.24.141 4am-node29 <none> <none>
multi-vector-scene-mix-gkx8n-7-minio-3 1/1 Running 0 6h30m 10.104.34.103 4am-node37 <none> <none>
multi-vector-scene-mix-gkx8n-7-pulsar-bookie-0 1/1 Running 0 6h30m 10.104.19.206 4am-node28 <none> <none>
multi-vector-scene-mix-gkx8n-7-pulsar-bookie-1 1/1 Running 0 6h30m 10.104.23.219 4am-node27 <none> <none>
multi-vector-scene-mix-gkx8n-7-pulsar-bookie-2 1/1 Running 0 6h30m 10.104.34.104 4am-node37 <none> <none>
multi-vector-scene-mix-gkx8n-7-pulsar-bookie-init-9n4kh 0/1 Completed 0 6h30m 10.104.9.180 4am-node14 <none> <none>
multi-vector-scene-mix-gkx8n-7-pulsar-broker-0 1/1 Running 0 6h30m 10.104.18.15 4am-node25 <none> <none>
multi-vector-scene-mix-gkx8n-7-pulsar-proxy-0 1/1 Running 0 6h30m 10.104.14.84 4am-node18 <none> <none>
multi-vector-scene-mix-gkx8n-7-pulsar-pulsar-init-6z875 0/1 Completed 0 6h30m 10.104.18.13 4am-node25 <none> <none>
multi-vector-scene-mix-gkx8n-7-pulsar-recovery-0 1/1 Running 0 6h30m 10.104.13.31 4am-node16 <none> <none>
multi-vector-scene-mix-gkx8n-7-pulsar-zookeeper-0 1/1 Running 0 6h30m 10.104.23.217 4am-node27 <none> <none>
multi-vector-scene-mix-gkx8n-7-pulsar-zookeeper-1 1/1 Running 0 6h27m 10.104.30.200 4am-node38 <none> <none>
multi-vector-scene-mix-gkx8n-7-pulsar-zookeeper-2 1/1 Running 0 6h26m 10.104.31.49 4am-node34 <none> <none>
client pod name: multi-vector-scene-mix-gkx8n-2043521713
client log:
Recurred
argo task: multi-vector-corn-1-1717077600 test case name: test_hybrid_search_locust_dql_max_reqs_cluster image: 2.4-20240530-68e2d532-amd64
server:
[2024-05-30 20:34:05,057 - INFO - fouram]: [Base] Deploy initial state:
I0530 14:12:12.999608 433 request.go:665] Waited for 1.198263357s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.default.svc.cluster.local/apis/argoproj.io/v1alpha1?timeout=32s
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
multi-vector-corn-1-1717077600-7-etcd-0 1/1 Running 0 8m37s 10.104.26.20 4am-node32 <none> <none>
multi-vector-corn-1-1717077600-7-etcd-1 1/1 Running 0 8m36s 10.104.23.159 4am-node27 <none> <none>
multi-vector-corn-1-1717077600-7-etcd-2 1/1 Running 0 8m36s 10.104.21.105 4am-node24 <none> <none>
multi-vector-corn-1-1717077600-7-milvus-datacoord-9d7d74c6v4wkb 1/1 Running 5 (6m16s ago) 8m37s 10.104.18.126 4am-node25 <none> <none>
multi-vector-corn-1-1717077600-7-milvus-datanode-5f9f8bdfbsq7ms 1/1 Running 5 (6m11s ago) 8m37s 10.104.13.195 4am-node16 <none> <none>
multi-vector-corn-1-1717077600-7-milvus-indexcoord-6fc48fdtrmwp 1/1 Running 0 8m38s 10.104.20.226 4am-node22 <none> <none>
multi-vector-corn-1-1717077600-7-milvus-indexnode-78fdb9f6fkjdf 1/1 Running 5 (6m14s ago) 8m38s 10.104.20.227 4am-node22 <none> <none>
multi-vector-corn-1-1717077600-7-milvus-proxy-7996b7fdbf-fz9g2 1/1 Running 5 (2m7s ago) 8m38s 10.104.6.24 4am-node13 <none> <none>
multi-vector-corn-1-1717077600-7-milvus-querycoord-565bf5frzt2t 1/1 Running 5 (2m6s ago) 8m38s 10.104.13.196 4am-node16 <none> <none>
multi-vector-corn-1-1717077600-7-milvus-querynode-77ccb866wxw6t 1/1 Running 4 (6m59s ago) 8m38s 10.104.5.221 4am-node12 <none> <none>
multi-vector-corn-1-1717077600-7-milvus-rootcoord-5978757cnt7vf 1/1 Running 5 (6m17s ago) 8m37s 10.104.18.127 4am-node25 <none> <none>
multi-vector-corn-1-1717077600-7-minio-0 1/1 Running 0 8m36s 10.104.23.160 4am-node27 <none> <none>
multi-vector-corn-1-1717077600-7-minio-1 1/1 Running 0 8m35s 10.104.26.25 4am-node32 <none> <none>
multi-vector-corn-1-1717077600-7-minio-2 1/1 Running 0 8m35s 10.104.30.208 4am-node38 <none> <none>
multi-vector-corn-1-1717077600-7-minio-3 1/1 Running 0 8m34s 10.104.19.205 4am-node28 <none> <none>
multi-vector-corn-1-1717077600-7-pulsar-bookie-0 1/1 Running 0 8m37s 10.104.26.21 4am-node32 <none> <none>
multi-vector-corn-1-1717077600-7-pulsar-bookie-1 1/1 Running 0 8m36s 10.104.23.161 4am-node27 <none> <none>
multi-vector-corn-1-1717077600-7-pulsar-bookie-2 1/1 Running 0 8m35s 10.104.21.106 4am-node24 <none> <none>
multi-vector-corn-1-1717077600-7-pulsar-bookie-init-62fk4 0/1 Completed 0 8m38s 10.104.13.194 4am-node16 <none> <none>
multi-vector-corn-1-1717077600-7-pulsar-broker-0 1/1 Running 0 8m36s 10.104.9.34 4am-node14 <none> <none>
multi-vector-corn-1-1717077600-7-pulsar-proxy-0 1/1 Running 0 8m37s 10.104.6.25 4am-node13 <none> <none>
multi-vector-corn-1-1717077600-7-pulsar-pulsar-init-7rmcn 0/1 Completed 0 8m38s 10.104.13.193 4am-node16 <none> <none>
multi-vector-corn-1-1717077600-7-pulsar-recovery-0 1/1 Running 0 8m37s 10.104.6.26 4am-node13 <none> <none>
multi-vector-corn-1-1717077600-7-pulsar-zookeeper-0 1/1 Running 0 8m37s 10.104.26.18 4am-node32 <none> <none>
multi-vector-corn-1-1717077600-7-pulsar-zookeeper-1 1/1 Running 0 6m42s 10.104.30.216 4am-node38 <none> <none>
multi-vector-corn-1-1717077600-7-pulsar-zookeeper-2 1/1 Running 0 4m46s 10.104.19.227 4am-node28 <none> <none> (base.py:258)
[2024-05-30 20:34:05,057 - INFO - fouram]: [Cmd Exe] kubectl get pods -n qa-milvus -o wide | grep -E 'NAME|multi-vector-corn-1-1717077600-7-milvus|multi-vector-corn-1-1717077600-7-minio|multi-vector-corn-1-1717077600-7-etcd|multi-vector-corn-1-1717077600-7-pulsar|multi-vector-corn-1-1717077600-7-zookeeper|multi-vector-corn-1-1717077600-7-kafka|multi-vector-corn-1-1717077600-7-log|multi-vector-corn-1-1717077600-7-tikv' (util_cmd.py:14)
[2024-05-30 20:34:15,719 - INFO - fouram]: [CliClient] pod details of release(multi-vector-corn-1-1717077600-7):
I0530 20:34:06.714505 544 request.go:665] Waited for 1.197453932s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.default.svc.cluster.local/apis/storage.k8s.io/v1?timeout=32s
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
multi-vector-corn-1-1717077600-7-etcd-0 1/1 Running 0 6h30m 10.104.26.20 4am-node32 <none> <none>
multi-vector-corn-1-1717077600-7-etcd-1 1/1 Running 0 6h30m 10.104.23.159 4am-node27 <none> <none>
multi-vector-corn-1-1717077600-7-etcd-2 1/1 Running 0 6h30m 10.104.21.105 4am-node24 <none> <none>
multi-vector-corn-1-1717077600-7-milvus-datacoord-9d7d74c6v4wkb 1/1 Running 5 (6h28m ago) 6h30m 10.104.18.126 4am-node25 <none> <none>
multi-vector-corn-1-1717077600-7-milvus-datanode-5f9f8bdfbsq7ms 1/1 Running 5 (6h28m ago) 6h30m 10.104.13.195 4am-node16 <none> <none>
multi-vector-corn-1-1717077600-7-milvus-indexcoord-6fc48fdtrmwp 1/1 Running 0 6h30m 10.104.20.226 4am-node22 <none> <none>
multi-vector-corn-1-1717077600-7-milvus-indexnode-78fdb9f6fkjdf 1/1 Running 5 (6h28m ago) 6h30m 10.104.20.227 4am-node22 <none> <none>
multi-vector-corn-1-1717077600-7-milvus-proxy-7996b7fdbf-fz9g2 1/1 Running 5 (6h24m ago) 6h30m 10.104.6.24 4am-node13 <none> <none>
multi-vector-corn-1-1717077600-7-milvus-querycoord-565bf5frzt2t 1/1 Running 5 (6h24m ago) 6h30m 10.104.13.196 4am-node16 <none> <none>
multi-vector-corn-1-1717077600-7-milvus-querynode-77ccb866wxw6t 1/1 Running 4 (6h28m ago) 6h30m 10.104.5.221 4am-node12 <none> <none>
multi-vector-corn-1-1717077600-7-milvus-rootcoord-5978757cnt7vf 1/1 Running 5 (6h28m ago) 6h30m 10.104.18.127 4am-node25 <none> <none>
multi-vector-corn-1-1717077600-7-minio-0 1/1 Running 0 6h30m 10.104.23.160 4am-node27 <none> <none>
multi-vector-corn-1-1717077600-7-minio-1 1/1 Running 0 6h30m 10.104.26.25 4am-node32 <none> <none>
multi-vector-corn-1-1717077600-7-minio-2 1/1 Running 0 6h30m 10.104.30.208 4am-node38 <none> <none>
multi-vector-corn-1-1717077600-7-minio-3 1/1 Running 0 6h30m 10.104.19.205 4am-node28 <none> <none>
multi-vector-corn-1-1717077600-7-pulsar-bookie-0 1/1 Running 0 6h30m 10.104.26.21 4am-node32 <none> <none>
multi-vector-corn-1-1717077600-7-pulsar-bookie-1 1/1 Running 0 6h30m 10.104.23.161 4am-node27 <none> <none>
multi-vector-corn-1-1717077600-7-pulsar-bookie-2 1/1 Running 0 6h30m 10.104.21.106 4am-node24 <none> <none>
multi-vector-corn-1-1717077600-7-pulsar-bookie-init-62fk4 0/1 Completed 0 6h30m 10.104.13.194 4am-node16 <none> <none>
multi-vector-corn-1-1717077600-7-pulsar-broker-0 1/1 Running 0 6h30m 10.104.9.34 4am-node14 <none> <none>
multi-vector-corn-1-1717077600-7-pulsar-proxy-0 1/1 Running 0 6h30m 10.104.6.25 4am-node13 <none> <none>
multi-vector-corn-1-1717077600-7-pulsar-pulsar-init-7rmcn 0/1 Completed 0 6h30m 10.104.13.193 4am-node16 <none> <none>
multi-vector-corn-1-1717077600-7-pulsar-recovery-0 1/1 Running 0 6h30m 10.104.6.26 4am-node13 <none> <none>
multi-vector-corn-1-1717077600-7-pulsar-zookeeper-0 1/1 Running 0 6h30m 10.104.26.18 4am-node32 <none> <none>
multi-vector-corn-1-1717077600-7-pulsar-zookeeper-1 1/1 Running 0 6h28m 10.104.30.216 4am-node38 <none> <none>
multi-vector-corn-1-1717077600-7-pulsar-zookeeper-2 1/1 Running 0 6h26m 10.104.19.227 4am-node28 <none> <none>
client pod name: multi-vector-corn-1-1717077600-1154628152
client log:
flush timed out repeatedly from 2024-05-30 17:37:07,104 until the end of the test
This problem is related to the interaction between concurrent flush and channel checkpoint (cp) updates.
Concurrent flushes make the cp lag grow, by up to 10 minutes: sometimes flushTs advances so quickly that the cp updater cannot update the cp immediately, so the flush has to wait for the 10-minute periodic updater to advance the cp. That wait exceeds the 180s flush timeout.
This is not an urgent issue for flush, because concurrent flush is not a common use case.
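The timing mismatch described above can be sketched as a minimal model. This is illustrative only, not Milvus code: the constants and function names are assumptions based on the intervals mentioned in this thread (10-minute periodic cp updater vs. 180s flush timeout).

```python
# Hypothetical timing model of the reported failure mode.
# A flush completes only once the channel checkpoint (cp) catches up to
# flushTs. If the immediate cp update is skipped (flushTs moving too fast
# under concurrent flushes), the cp only advances on the periodic
# updater's next tick, which can be a full interval away.

FLUSH_TIMEOUT_S = 180          # client-side flush timeout from this report
PERIODIC_CP_UPDATE_S = 600     # fallback cp updater interval (10 minutes)

def worst_case_flush_wait(immediate_update_succeeded: bool) -> float:
    """Worst-case seconds before the cp reaches flushTs."""
    if immediate_update_succeeded:
        # cp advances right away; flush returns well within the timeout
        return 0.0
    # Otherwise the flush waits for the next periodic tick,
    # up to one full interval in the worst case.
    return float(PERIODIC_CP_UPDATE_S)

# Worst case under concurrent flush: 600s wait > 180s timeout,
# so the flush call fails with a timeout.
assert worst_case_flush_wait(False) > FLUSH_TIMEOUT_S
```

The model makes the arithmetic of the bug explicit: any cp advance that falls through to the periodic updater necessarily blows the 180s budget, regardless of cluster health.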