[Bug]: [benchmark][cluster] Low CPU usage and high latency of hybrid_search in concurrent hybrid_search scenario
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version: master-20240226-befe0e21-amd64
- Deployment mode (standalone or cluster): cluster
- MQ type (rocksmq, pulsar or kafka): pulsar
- SDK version (e.g. pymilvus v2.0.0rc2): 2.4.0rc19
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
argo task: multi-vector-corn-mfmgj
test case name: test_hybrid_search_locust_shard1_dql_indexes_replicas_cluster
server:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
multi-vector-corn-mfmgj-11-etcd-0 1/1 Running 0 12h 10.104.20.65 4am-node22 <none> <none>
multi-vector-corn-mfmgj-11-etcd-1 1/1 Running 0 12h 10.104.34.232 4am-node37 <none> <none>
multi-vector-corn-mfmgj-11-etcd-2 1/1 Running 0 12h 10.104.30.216 4am-node38 <none> <none>
multi-vector-corn-mfmgj-11-milvus-datacoord-d86db89c6-gmxsk 1/1 Running 0 12h 10.104.1.93 4am-node10 <none> <none>
multi-vector-corn-mfmgj-11-milvus-datanode-67dcc88689-6x8tv 1/1 Running 1 (12h ago) 12h 10.104.34.224 4am-node37 <none> <none>
multi-vector-corn-mfmgj-11-milvus-indexcoord-6ccd668df-s87gv 1/1 Running 0 12h 10.104.13.228 4am-node16 <none> <none>
multi-vector-corn-mfmgj-11-milvus-indexnode-67756c8444-8cclm 1/1 Running 0 12h 10.104.26.162 4am-node32 <none> <none>
multi-vector-corn-mfmgj-11-milvus-proxy-7b9c596cb7-267dt 1/1 Running 1 (12h ago) 12h 10.104.1.92 4am-node10 <none> <none>
multi-vector-corn-mfmgj-11-milvus-querycoord-7f84b66654-jrck4 1/1 Running 1 (12h ago) 12h 10.104.13.227 4am-node16 <none> <none>
multi-vector-corn-mfmgj-11-milvus-querynode-95c8b94f4-25trp 1/1 Running 0 12h 10.104.32.108 4am-node39 <none> <none>
multi-vector-corn-mfmgj-11-milvus-querynode-95c8b94f4-6wrw9 1/1 Running 0 12h 10.104.13.229 4am-node16 <none> <none>
multi-vector-corn-mfmgj-11-milvus-querynode-95c8b94f4-tm282 1/1 Running 0 12h 10.104.16.113 4am-node21 <none> <none>
multi-vector-corn-mfmgj-11-milvus-querynode-95c8b94f4-w4cmc 1/1 Running 0 12h 10.104.12.190 4am-node17 <none> <none>
multi-vector-corn-mfmgj-11-milvus-rootcoord-5b895f49c6-6n42r 1/1 Running 1 (12h ago) 12h 10.104.26.160 4am-node32 <none> <none>
multi-vector-corn-mfmgj-11-minio-0 1/1 Running 0 12h 10.104.15.112 4am-node20 <none> <none>
multi-vector-corn-mfmgj-11-minio-1 1/1 Running 0 12h 10.104.28.155 4am-node33 <none> <none>
multi-vector-corn-mfmgj-11-minio-2 1/1 Running 0 12h 10.104.29.81 4am-node35 <none> <none>
multi-vector-corn-mfmgj-11-minio-3 1/1 Running 0 12h 10.104.30.215 4am-node38 <none> <none>
multi-vector-corn-mfmgj-11-pulsar-bookie-0 1/1 Running 0 12h 10.104.28.156 4am-node33 <none> <none>
multi-vector-corn-mfmgj-11-pulsar-bookie-1 1/1 Running 0 12h 10.104.15.117 4am-node20 <none> <none>
multi-vector-corn-mfmgj-11-pulsar-bookie-2 1/1 Running 0 12h 10.104.25.160 4am-node30 <none> <none>
multi-vector-corn-mfmgj-11-pulsar-bookie-init-lgxgl 0/1 Completed 0 12h 10.104.28.146 4am-node33 <none> <none>
multi-vector-corn-mfmgj-11-pulsar-broker-0 1/1 Running 0 12h 10.104.20.57 4am-node22 <none> <none>
multi-vector-corn-mfmgj-11-pulsar-proxy-0 1/1 Running 0 12h 10.104.1.95 4am-node10 <none> <none>
multi-vector-corn-mfmgj-11-pulsar-pulsar-init-mnqwl 0/1 Completed 0 12h 10.104.13.226 4am-node16 <none> <none>
multi-vector-corn-mfmgj-11-pulsar-recovery-0 1/1 Running 0 12h 10.104.1.94 4am-node10 <none> <none>
multi-vector-corn-mfmgj-11-pulsar-zookeeper-0 1/1 Running 0 12h 10.104.29.80 4am-node35 <none> <none>
multi-vector-corn-mfmgj-11-pulsar-zookeeper-1 1/1 Running 0 12h 10.104.28.167 4am-node33 <none> <none>
multi-vector-corn-mfmgj-11-pulsar-zookeeper-2 1/1 Running 0 12h 10.104.26.169 4am-node32 <none> <none>
client pod name: multi-vector-corn-mfmgj-2255719034
client log: client.log
[2024-02-26 16:10:04,089 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-02-26 16:10:04,089 - INFO - fouram]: Aggregated 6144 0(0.00%) | 24199 1926 106004 20000 | 0.30 0.00 (stats.py:789)
[2024-02-26 16:10:04,089 - INFO - fouram]: (stats.py:790)
[2024-02-26 16:10:04,090 - INFO - fouram]: Response time percentiles (approximated) (stats.py:819)
[2024-02-26 16:10:04,090 - INFO - fouram]: Type Name 50% 66% 75% 80% 90% 95% 98% 99% 99.9% 99.99% 100% # reqs (stats.py:819)
[2024-02-26 16:10:04,090 - INFO - fouram]: --------|--------------------------------------------------------------------------------|--------|------|------|------|------|------|------|------|------|------|------|------ (stats.py:819)
[2024-02-26 16:10:04,090 - INFO - fouram]: grpc hybrid_search 20000 25000 30000 33000 43000 55000 68000 78000 97000 106000 106000 6144 (stats.py:819)
[2024-02-26 16:10:04,090 - INFO - fouram]: --------|--------------------------------------------------------------------------------|--------|------|------|------|------|------|------|------|------|------|------|------ (stats.py:819)
[2024-02-26 16:10:04,090 - INFO - fouram]: Aggregated 20000 25000 30000 33000 43000 55000 68000 78000 97000 106000 106000 6144 (stats.py:819)
[2024-02-26 16:10:04,090 - INFO - fouram]: (stats.py:820)
[2024-02-26 16:10:14,637 - ERROR - fouram]: grpc RpcError: [hybrid_search], <_InactiveRpcError: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2024-02-26 16:08:14.635502', 'gRPC error': '2024-02-26 16:10:14.637707'}> (decorators.py:145)
[2024-02-26 16:10:14,639 - ERROR - fouram]: (api_response) : [Collection.hybrid_search] <_InactiveRpcError of RPC that terminated with:
status = StatusCode.DEADLINE_EXCEEDED
details = "Deadline Exceeded"
debug_error_string = "UNKNOWN:Error received from peer {created_time:"2024-02-26T16:10:14.637196947+00:00", grpc_status:4, grpc_message:"Deadline Exceeded"}"
>, [requestId: 3f085ee2-d4c1-11ee-b173-22f6251fc128] (api_request.py:57)
[2024-02-26 16:10:14,639 - ERROR - fouram]: [CheckFunc] hybrid_search request check failed, response:<_InactiveRpcError of RPC that terminated with:
status = StatusCode.DEADLINE_EXCEEDED
details = "Deadline Exceeded"
debug_error_string = "UNKNOWN:Error received from peer {created_time:"2024-02-26T16:10:14.637196947+00:00", grpc_status:4, grpc_message:"Deadline Exceeded"}"
> (func_check.py:48)
[2024-02-26 16:10:24,091 - INFO - fouram]: Type Name # reqs # fails | Avg Min Max Med | req/s failures/s (stats.py:789)
[2024-02-26 16:10:24,092 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-02-26 16:10:24,092 - INFO - fouram]: grpc hybrid_search 6156 1(0.02%) | 24232 1926 120003 20000 | 0.90 0.00 (stats.py:789)
[2024-02-26 16:10:24,092 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-02-26 16:10:24,092 - INFO - fouram]: Aggregated 6156 1(0.02%) | 24232 1926 120003 20000 | 0.90 0.00 (stats.py:789)
[2024-02-26 16:10:24,092 - INFO - fouram]: (stats.py:790)
[2024-02-26 16:10:24,092 - INFO - fouram]: Response time percentiles (approximated) (stats.py:819)
[2024-02-26 16:10:24,092 - INFO - fouram]: Type Name 50% 66% 75% 80% 90% 95% 98% 99% 99.9% 99.99% 100% # reqs (stats.py:819)
[2024-02-26 16:10:24,092 - INFO - fouram]: --------|--------------------------------------------------------------------------------|--------|------|------|------|------|------|------|------|------|------|------|------ (stats.py:819)
[2024-02-26 16:10:24,092 - INFO - fouram]: grpc hybrid_search 20000 25000 30000 33000 43000 55000 68000 79000 97000 120000 120000 6156 (stats.py:819)
[2024-02-26 16:10:24,092 - INFO - fouram]: --------|--------------------------------------------------------------------------------|--------|------|------|------|------|------|------|------|------|------|------|------ (stats.py:819)
[2024-02-26 16:10:24,092 - INFO - fouram]: Aggregated 20000 25000 30000 33000 43000 55000 68000 79000 97000 120000 120000 6156 (stats.py:819)
[2024-02-26 16:10:24,093 - INFO - fouram]: (stats.py:820)
[2024-02-26 16:10:41,635 - ERROR - fouram]: grpc RpcError: [hybrid_search], <_InactiveRpcError: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2024-02-26 16:08:41.632207', 'gRPC error': '2024-02-26 16:10:41.635128'}> (decorators.py:145)
[2024-02-26 16:10:41,636 - ERROR - fouram]: (api_response) : [Collection.hybrid_search] <_InactiveRpcError of RPC that terminated with:
status = StatusCode.DEADLINE_EXCEEDED
details = "Deadline Exceeded"
debug_error_string = "UNKNOWN:Error received from peer {grpc_message:"Deadline Exceeded", grpc_status:4, created_time:"2024-02-26T16:10:41.634192847+00:00"}"
>, [requestId: 4f1fbe4c-d4c1-11ee-b173-22f6251fc128] (api_request.py:57)
[2024-02-26 16:10:41,636 - ERROR - fouram]: [CheckFunc] hybrid_search request check failed, response:<_InactiveRpcError of RPC that terminated with:
status = StatusCode.DEADLINE_EXCEEDED
details = "Deadline Exceeded"
debug_error_string = "UNKNOWN:Error received from peer {grpc_message:"Deadline Exceeded", grpc_status:4, created_time:"2024-02-26T16:10:41.634192847+00:00"}"
> (func_check.py:48)
[2024-02-26 16:10:41,703 - ERROR - fouram]: grpc RpcError: [hybrid_search], <_InactiveRpcError: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2024-02-26 16:08:41.695219', 'gRPC error': '2024-02-26 16:10:41.703561'}> (decorators.py:145)
[2024-02-26 16:10:41,704 - ERROR - fouram]: (api_response) : [Collection.hybrid_search] <_InactiveRpcError of RPC that terminated with:
status = StatusCode.DEADLINE_EXCEEDED
details = "Deadline Exceeded"
debug_error_string = "UNKNOWN:Error received from peer {created_time:"2024-02-26T16:10:41.697174128+00:00", grpc_status:4, grpc_message:"Deadline Exceeded"}"
>, [requestId: 4f295b96-d4c1-11ee-b173-22f6251fc128] (api_request.py:57)
[2024-02-26 16:10:41,704 - ERROR - fouram]: [CheckFunc] hybrid_search request check failed, response:<_InactiveRpcError of RPC that terminated with:
status = StatusCode.DEADLINE_EXCEEDED
details = "Deadline Exceeded"
debug_error_string = "UNKNOWN:Error received from peer {created_time:"2024-02-26T16:10:41.697174128+00:00", grpc_status:4, grpc_message:"Deadline Exceeded"}"
> (func_check.py:48)
[2024-02-26 16:10:41,977 - ERROR - fouram]: grpc RpcError: [hybrid_search], <_InactiveRpcError: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2024-02-26 16:08:41.761037', 'gRPC error': '2024-02-26 16:10:41.977427'}> (decorators.py:145)
[2024-02-26 16:10:41,978 - ERROR - fouram]: (api_response) : [Collection.hybrid_search] <_InactiveRpcError of RPC that terminated with:
status = StatusCode.DEADLINE_EXCEEDED
details = "Deadline Exceeded"
debug_error_string = "UNKNOWN:Error received from peer {created_time:"2024-02-26T16:10:41.764178367+00:00", grpc_status:4, grpc_message:"Deadline Exceeded"}"
>, [requestId: 4f33638e-d4c1-11ee-b173-22f6251fc128] (api_request.py:57)
[2024-02-26 16:10:41,978 - ERROR - fouram]: [CheckFunc] hybrid_search request check failed, response:<_InactiveRpcError of RPC that terminated with:
status = StatusCode.DEADLINE_EXCEEDED
details = "Deadline Exceeded"
debug_error_string = "UNKNOWN:Error received from peer {created_time:"2024-02-26T16:10:41.764178367+00:00", grpc_status:4, grpc_message:"Deadline Exceeded"}"
> (func_check.py:48)
[2024-02-26 16:10:41,978 - ERROR - fouram]: grpc RpcError: [hybrid_search], <_InactiveRpcError: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2024-02-26 16:08:41.833096', 'gRPC error': '2024-02-26 16:10:41.978770'}> (decorators.py:145)
[2024-02-26 16:10:41,979 - ERROR - fouram]: (api_response) : [Collection.hybrid_search] <_InactiveRpcError of RPC that terminated with:
status = StatusCode.DEADLINE_EXCEEDED
details = "Deadline Exceeded"
debug_error_string = "UNKNOWN:Error received from peer {created_time:"2024-02-26T16:10:41.835168737+00:00", grpc_status:4, grpc_message:"Deadline Exceeded"}"
>, [requestId: 4f3e6496-d4c1-11ee-b173-22f6251fc128] (api_request.py:57)
[2024-02-26 16:10:41,979 - ERROR - fouram]: [CheckFunc] hybrid_search request check failed, response:<_InactiveRpcError of RPC that terminated with:
status = StatusCode.DEADLINE_EXCEEDED
details = "Deadline Exceeded"
debug_error_string = "UNKNOWN:Error received from peer {created_time:"2024-02-26T16:10:41.835168737+00:00", grpc_status:4, grpc_message:"Deadline Exceeded"}"
> (func_check.py:48)
[2024-02-26 16:10:44,094 - INFO - fouram]: Type Name # reqs # fails | Avg Min Max Med | req/s failures/s (stats.py:789)
[2024-02-26 16:10:44,094 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-02-26 16:10:44,094 - INFO - fouram]: grpc hybrid_search 6173 5(0.08%) | 24319 1926 120217 20000 | 0.20 0.00 (stats.py:789)
[2024-02-26 16:10:44,094 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-02-26 16:10:44,094 - INFO - fouram]: Aggregated 6173 5(0.08%) | 24319 1926 120217 20000 | 0.20 0.00 (stats.py:789)
[2024-02-26 16:10:44,094 - INFO - fouram]: (stats.py:790)
[2024-02-26 16:10:44,095 - INFO - fouram]: Response time percentiles (approximated) (stats.py:819)
[2024-02-26 16:10:44,095 - INFO - fouram]: Type Name 50% 66% 75% 80% 90% 95% 98% 99% 99.9% 99.99% 100% # reqs (stats.py:819)
[2024-02-26 16:10:44,095 - INFO - fouram]: --------|--------------------------------------------------------------------------------|--------|------|------|------|------|------|------|------|------|------|------|------ (stats.py:819)
Expected Behavior
No response
Steps To Reproduce
concurrent test and calculation of RT and QPS
:purpose: `shard_num=1, replica=3, binary_vector & float_vector DQL`
verify a concurrent DQL scenario with 4 vector fields (2 float_vector: HNSW/DISKANN; 2 binary_vector: BIN_IVF_FLAT/BIN_FLAT) and 60 scalar fields
:test steps:
1. create collection with fields:
'float_vector': 32768dim,
'binary_vector_1': 32768dim,
'float_vector_2': 32768dim,
'binary_vector_3': 32768dim,
all scalar fields: varchar max_length=10, array max_capacity=7
2. build indexes:
HNSW: 'float_vector'
BIN_IVF_FLAT: 'binary_vector_1',
DISKANN: 'float_vector_2'
BIN_FLAT: 'binary_vector_3'
default_scalar_index: 'int64_1'
INVERTED: 'id', 'bool_3'
3. insert 100k data
4. flush collection
5. build indexes again using the same params
6. load collection
replica: 3
7. concurrent request:
- hybrid_search
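For reference, here is a minimal pymilvus 2.4-style sketch of the hybrid_search request each concurrent worker issues, reconstructed from the concurrent_tasks parameters in the report under "Anything else?" below. The host and collection name are placeholders, the query vectors are random, and the exact signatures should be double-checked against the client build in use (2.4.0rc19):

```python
# Sketch of one worker's hybrid_search call (parameters taken from the
# concurrent_tasks section of the report below; host/collection are placeholders).
import random
from pymilvus import connections, Collection, AnnSearchRequest, WeightedRanker

connections.connect(host="<milvus-host>", port="19530")
collection = Collection("<collection-name>")

dim = 32768
float_vec = [[random.random() for _ in range(dim)]]                    # nq = 1
binary_vec = [bytes(random.getrandbits(8) for _ in range(dim // 8))]   # nq = 1

reqs = [
    AnnSearchRequest(data=float_vec, anns_field="float_vector",
                     param={"metric_type": "L2", "params": {"ef": 128}},
                     limit=10, expr="10000 < id < 90000"),
    AnnSearchRequest(data=binary_vec, anns_field="binary_vector_1",
                     param={"metric_type": "JACCARD", "params": {"nprobe": 64}},
                     limit=50, expr="int64_1 > 10000"),
    AnnSearchRequest(data=float_vec, anns_field="float_vector_2",
                     param={"metric_type": "L2", "params": {"search_list": 1000}},
                     limit=1000, expr="array_length(array_int8_2) == 7"),
    AnnSearchRequest(data=binary_vec, anns_field="binary_vector_3",
                     param={"metric_type": "JACCARD", "params": {"nprobe": 16}},
                     limit=3000, expr="bool_3 == True"),
]

res = collection.hybrid_search(
    reqs,
    rerank=WeightedRanker(0.9, 0.9, 0.5, 0.5),
    limit=1,                  # outer top_k
    output_fields=["*"],      # all fields, including four 32768-dim vectors
    timeout=120,
)
```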
Milvus Log
No response
Anything else?
test result:
[2024-02-26 19:04:30,997 - INFO - fouram]: Type Name 50% 66% 75% 80% 90% 95% 98% 99% 99.9% 99.99% 100% # reqs (stats.py:819)
[2024-02-26 19:04:30,997 - INFO - fouram]: --------|--------------------------------------------------------------------------------|--------|------|------|------|------|------|------|------|------|------|------|------ (stats.py:819)
[2024-02-26 19:04:30,997 - INFO - fouram]: grpc hybrid_search 21000 26000 30000 33000 44000 55000 70000 79000 103000 120000 120000 14339 (stats.py:819)
[2024-02-26 19:04:30,997 - INFO - fouram]: --------|--------------------------------------------------------------------------------|--------|------|------|------|------|------|------|------|------|------|------|------ (stats.py:819)
[2024-02-26 19:04:30,997 - INFO - fouram]: Aggregated 21000 26000 30000 33000 44000 55000 70000 79000 103000 120000 120000 14339 (stats.py:819)
[2024-02-26 19:04:30,997 - INFO - fouram]: (stats.py:820)
[2024-02-26 19:04:39,983 - INFO - fouram]: Print locust final stats. (locust_runner.py:56)
[2024-02-26 19:04:39,983 - INFO - fouram]: Type Name # reqs # fails | Avg Min Max Med | req/s failures/s (stats.py:789)
[2024-02-26 19:04:39,983 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-02-26 19:04:39,983 - INFO - fouram]: grpc hybrid_search 14348 5(0.03%) | 24905 1864 120217 21000 | 0.80 0.00 (stats.py:789)
[2024-02-26 19:04:39,983 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-02-26 19:04:39,983 - INFO - fouram]: Aggregated 14348 5(0.03%) | 24905 1864 120217 21000 | 0.80 0.00 (stats.py:789)
[2024-02-26 19:04:39,983 - INFO - fouram]: (stats.py:790)
[2024-02-26 19:04:39,988 - INFO - fouram]: [PerfTemplate] Report data:
{'server': {'deploy_tool': 'helm',
'deploy_mode': 'cluster',
'config_name': 'cluster_8c16m',
'config': {'queryNode': {'resources': {'limits': {'cpu': '16.0',
'memory': '64Gi'},
'requests': {'cpu': '9.0',
'memory': '33Gi'}},
'replicas': 4},
'indexNode': {'resources': {'limits': {'cpu': '8.0',
'memory': '16Gi'},
'requests': {'cpu': '5.0',
'memory': '9Gi'}}},
'dataNode': {'resources': {'limits': {'cpu': '8.0',
'memory': '16Gi'},
'requests': {'cpu': '5.0',
'memory': '9Gi'}}},
'cluster': {'enabled': True},
'pulsar': {},
'kafka': {},
'minio': {'metrics': {'podMonitor': {'enabled': True}}},
'etcd': {'metrics': {'enabled': True,
'podMonitor': {'enabled': True}}},
'metrics': {'serviceMonitor': {'enabled': True}},
'log': {'level': 'debug'},
'image': {'all': {'repository': 'harbor.milvus.io/milvus/milvus',
'tag': 'master-20240226-befe0e21-amd64'}}},
'host': 'multi-vector-corn-mfmgj-11-milvus.qa-milvus.svc.cluster.local',
'port': '19530',
'uri': ''},
'client': {'test_case_type': 'ConcurrentClientBase',
'test_case_name': 'test_hybrid_search_locust_shard1_dql_indexes_replicas_cluster',
'test_case_params': {'dataset_params': {'metric_type': 'L2',
'dim': 32768,
'max_length': 10,
'scalars_index': {'int64_1': {},
'id': {'index_type': 'INVERTED'},
'bool_3': {'index_type': 'INVERTED'}},
'vectors_index': {'binary_vector_1': {'index_type': 'BIN_IVF_FLAT',
'index_param': {'nlist': 2048},
'metric_type': 'JACCARD'},
'float_vector_2': {'index_type': 'DISKANN',
'index_param': {},
'metric_type': 'L2'},
'binary_vector_3': {'index_type': 'BIN_FLAT',
'index_param': {'nlist': 2048},
'metric_type': 'JACCARD'}},
'scalars_params': {'array_int8_1': {'params': {'max_capacity': 7}},
'array_int16_1': {'params': {'max_capacity': 7}},
'array_int32_1': {'params': {'max_capacity': 7}},
'array_int64_1': {'params': {'max_capacity': 7}},
'array_double_1': {'params': {'max_capacity': 7}},
'array_float_1': {'params': {'max_capacity': 7}},
'array_varchar_1': {'params': {'max_capacity': 7}},
'array_bool_1': {'params': {'max_capacity': 7}},
'array_int8_2': {'params': {'max_capacity': 7}},
'array_int16_2': {'params': {'max_capacity': 7}},
'array_int32_2': {'params': {'max_capacity': 7}},
'array_int64_2': {'params': {'max_capacity': 7}},
'array_double_2': {'params': {'max_capacity': 7}},
'array_float_2': {'params': {'max_capacity': 7}},
'array_varchar_2': {'params': {'max_capacity': 7}},
'array_bool_2': {'params': {'max_capacity': 7}},
'array_int8_3': {'params': {'max_capacity': 7}},
'array_int16_3': {'params': {'max_capacity': 7}},
'array_int32_3': {'params': {'max_capacity': 7}},
'array_int64_3': {'params': {'max_capacity': 7}},
'array_double_3': {'params': {'max_capacity': 7}},
'array_float_3': {'params': {'max_capacity': 7}},
'array_varchar_3': {'params': {'max_capacity': 7}},
'array_bool_3': {'params': {'max_capacity': 7}}},
'dataset_name': 'local',
'dataset_size': 100000,
'ni_per': 100},
'collection_params': {'other_fields': ['binary_vector_1',
'float_vector_2',
'binary_vector_3',
'int8_1',
'int16_1',
'int32_1',
'int64_1',
'double_1',
'float_1',
'varchar_1',
'bool_1',
'json_1',
'array_int8_1',
'array_int16_1',
'array_int32_1',
'array_int64_1',
'array_double_1',
'array_float_1',
'array_varchar_1',
'array_bool_1',
'int8_2',
'int16_2',
'int32_2',
'int64_2',
'double_2',
'float_2',
'varchar_2',
'bool_2',
'json_2',
'array_int8_2',
'array_int16_2',
'array_int32_2',
'array_int64_2',
'array_double_2',
'array_float_2',
'array_varchar_2',
'array_bool_2',
'int8_3',
'int16_3',
'int32_3',
'int64_3',
'double_3',
'float_3',
'varchar_3',
'bool_3',
'json_3',
'array_int8_3',
'array_int16_3',
'array_int32_3',
'array_int64_3',
'array_double_3',
'array_float_3',
'array_varchar_3',
'array_bool_3',
'varchar_tail_1',
'varchar_tail_2',
'varchar_tail_3',
'varchar_tail_4',
'varchar_tail_5',
'varchar_tail_6',
'varchar_tail_7',
'varchar_tail_8'],
'shards_num': 1},
'load_params': {'replica_number': 3},
'resource_groups_params': {'reset': False},
'database_user_params': {'reset_rbac': False,
'reset_db': False},
'index_params': {'index_type': 'HNSW',
'index_param': {'M': 8,
'efConstruction': 200}},
'concurrent_params': {'concurrent_number': 20,
'during_time': '5h',
'interval': 20,
'spawn_rate': None},
'concurrent_tasks': [{'type': 'hybrid_search',
'weight': 1,
'params': {'nq': 1,
'top_k': 1,
'reqs': [{'search_param': {'ef': 128},
'anns_field': 'float_vector',
'expr': '10000 '
'< '
'id '
'< '
'90000',
'top_k': 10},
{'search_param': {'nprobe': 64},
'anns_field': 'binary_vector_1',
'expr': 'int64_1 '
'> '
'10000',
'top_k': 50},
{'search_param': {'search_list': 1000},
'anns_field': 'float_vector_2',
'expr': 'array_length(array_int8_2) '
'== '
'7',
'top_k': 1000},
{'search_param': {'nprobe': 16},
'anns_field': 'binary_vector_3',
'expr': 'bool_3 '
'== '
'True',
'top_k': 3000}],
'rerank': {'WeightedRanker': [0.9,
0.9,
0.5,
0.5]},
'output_fields': ['*'],
'ignore_growing': False,
'guarantee_timestamp': None,
'partition_names': None,
'timeout': 120,
'random_data': True}}]},
'run_id': 2024022603559836,
'datetime': '2024-02-26 06:52:35.676013',
'client_version': '2.4.0'},
'result': {'test_result': {'index': {'RT': 484.1296,
'binary_vector_1': {'RT': 327.7243},
'float_vector_2': {'RT': 257.0689},
'binary_vector_3': {'RT': 0.5384},
'int64_1': {'RT': 0.5416},
'id': {'RT': 0.5404},
'bool_3': {'RT': 0.5387}},
'insert': {'total_time': 1657.4622,
'VPS': 60.3332,
'batch_time': 1.6575,
'batch': 100},
'flush': {'RT': 2.7263},
'load': {'RT': 45.362},
'Locust': {'Aggregated': {'Requests': 14348,
'Fails': 5,
'RPS': 0.8,
'fail_s': 0.0,
'RT_max': 120217.14,
'RT_avg': 24905.74,
'TP50': 21000.0,
'TP99': 79000.0},
'hybrid_search': {'Requests': 14348,
'Fails': 5,
'RPS': 0.8,
'fail_s': 0.0,
'RT_max': 120217.14,
'RT_avg': 24905.74,
'TP50': 21000.0,
'TP99': 79000.0}}}}}
The vector dimension seems to be too large and there are too many scalar fields, so there will be many small segments with very large vectors. I guess even a single-field search would actually be slow.
Do all 3 replicas have traffic?
From the monitoring point of view, all three replicas have traffic, and the traffic is relatively balanced.
> The vector dimension seems to be too large and there are too many scalar fields, so there will be many small segments with very large vectors. I guess even a single-field search would actually be slow.

Yes, this scenario is designed to verify the maximum limits for the number of fields and for dim, which does generate many small segments.
The problem now is that the request latency inside segcore is very low, but the end-to-end latency of hybrid_search is very high, so we need to look at where most of the time is spent.
> The vector dimension seems to be too large and there are too many scalar fields, so there will be many small segments with very large vectors. I guess even a single-field search would actually be slow.

With 100k rows, 30 segments are generated.
I ran this test on my local machine; it can only load replica = 1 because of the memory limit. From this log:
single request cost:
cost of all single requests together, i.e. one full hybrid_search:
It shows there is not a very big gap between segcore and the final request.
Profiling the querynode during search shows the main cost is in the knowhere search.
I also ran the same test in argo with replica = 3, the same as the test case above: https://argo-workflows.zilliz.cc/workflows/qa/fouramf-multi-vector-5c6mh?tab=workflow
segcore search cost:
It shows almost 4 seconds.
proxy latency:
It also shows about 4 seconds.
So there is no big gap between segcore and the final request.
Comparing the low CPU usage case and my normal test:
low cpu usage case:
my normal case:
There is a big gap in the requery cost, which means the low CPU case spends a lot of time in the requery step of the hybrid_search process. Maybe requery uses the chunk manager to retrieve the output fields, which would explain why CPU usage is low: the time goes to IO latency.
So we need to check whether the chunk manager is used in the low CPU case.
Which fields are specified as output? This does make sense if we retrieve them from the chunk manager and the topk and nq are huge, and we definitely have some room to improve.
Questions:
- how long does it take for requery? do we have any kind of metrics?
- do we do segment pruning while requerying?
- what is the network bandwidth of the current test? could this be a bandwidth limitation?
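A rough back-of-envelope on the requery payload when output_fields=['*'] in this schema (illustrative only; how many rows the requery actually fetches, and whether they come through the chunk manager, is exactly what is being questioned above):

```python
# Rough per-row payload for output_fields=['*'] in this schema: two 32768-dim
# float vectors plus two 32768-dim binary vectors; scalars ignored as negligible.
dim = 32768
float_vec_bytes = dim * 4        # float32 vector: 131072 B
binary_vec_bytes = dim // 8      # binary vector:    4096 B
row_bytes = 2 * float_vec_bytes + 2 * binary_vec_bytes   # ~264 KiB per row

for rows in (1, 100, 1000, 3000):
    print(f"{rows:>5} rows fetched -> {rows * row_bytes / 2**20:8.1f} MiB")
```

If the requery really touches on the order of the per-request top_k rows, that is hundreds of MiB per call, which would be consistent with low CPU usage and long IO waits.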
Different scenarios, same problem
argo task: fouramf-multi-vector-tdclm
test case name: test_hybrid_search_locust_shard1_float_dql_diskann_standalone
server:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
fouramf-multi-vr-tdclm-34-3023-etcd-0 1/1 Running 0 22h 10.104.29.90 4am-node35 <none> <none>
fouramf-multi-vr-tdclm-34-3023-milvus-standalone-6ffdb7d85ddjdr 1/1 Running 1 (22h ago) 22h 10.104.21.82 4am-node24 <none> <none>
fouramf-multi-vr-tdclm-34-3023-minio-6766f66649-snfxk 1/1 Running 0 22h 10.104.34.84 4am-node37 <none> <none>
From the monitoring point of view, the search tasks in Milvus are queued and the segcore latency has become higher, yet the CPU is still not fully used.
client pod name: fouramf-multi-vector-tdclm-1156163776
client log: client.log
test steps:
concurrent test and calculation of RT and QPS
:purpose: `shard_num=1, float_vector DQL`
verify a concurrent DQL scenario with 4 float_vector fields (DISKANN) and 60 scalar fields
:test steps:
1. create collection with fields:
'float_vector': 2048dim,
'float_vector_1': 2048dim,
'float_vector_2': 2048dim,
'float_vector_3': 2048dim,
all scalar fields: varchar max_length=10, array max_capacity=7
2. build indexes:
DISKANN: 'float_vector', 'float_vector_1', 'float_vector_2', 'float_vector_3'
default_scalar_index: 'int64_1'
INVERTED: 'id', 'bool_3'
3. insert 100k data
4. flush collection
5. build indexes again using the same params
6. load collection
replica: 1
7. concurrent request:
- hybrid_search
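As above, a minimal pymilvus-style sketch of this scenario's hybrid_search request, reconstructed from the concurrent_tasks parameters in the report below (placeholders and caveats as in the earlier sketch):

```python
# Sketch of one worker's hybrid_search call for the DISKANN scenario
# (parameters from the concurrent_tasks section of the report below).
import random
from pymilvus import Collection, AnnSearchRequest, RRFRanker

collection = Collection("<collection-name>")   # placeholder

dim = 2048
vec = [[random.random() for _ in range(dim)]]  # nq = 1

reqs = [
    AnnSearchRequest(data=vec, anns_field="float_vector",
                     param={"metric_type": "L2", "params": {"search_list": 30}},
                     limit=10, expr="id > 150000"),
    AnnSearchRequest(data=vec, anns_field="float_vector_1",
                     param={"metric_type": "L2", "params": {"search_list": 100}},
                     limit=50, expr="int64_1 <= 1350000"),
    AnnSearchRequest(data=vec, anns_field="float_vector_2",
                     param={"metric_type": "L2", "params": {"search_list": 1500}},
                     limit=1000, expr="array_length(array_int8_2) == 7"),
    AnnSearchRequest(data=vec, anns_field="float_vector_3",
                     param={"metric_type": "L2", "params": {"search_list": 20000}},
                     limit=16384, expr="bool_3 == True"),
]

res = collection.hybrid_search(
    reqs,
    rerank=RRFRanker(),
    limit=100,                          # outer top_k
    output_fields=["float_vector"],     # one 2048-dim vector per returned hit
    timeout=60,
)
```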
test result:
[2024-03-20 22:36:17,825 - INFO - fouram]: Print locust final stats. (locust_runner.py:56)
[2024-03-20 22:36:17,826 - INFO - fouram]: Type Name # reqs # fails | Avg Min Max Med | req/s failures/s (stats.py:789)
[2024-03-20 22:36:17,826 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-03-20 22:36:17,826 - INFO - fouram]: grpc hybrid_search 1886 1(0.05%) | 37864 20599 60002 37000 | 0.52 0.00 (stats.py:789)
[2024-03-20 22:36:17,826 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-03-20 22:36:17,826 - INFO - fouram]: Aggregated 1886 1(0.05%) | 37864 20599 60002 37000 | 0.52 0.00 (stats.py:789)
[2024-03-20 22:36:17,826 - INFO - fouram]: (stats.py:790)
[2024-03-20 22:36:17,829 - INFO - fouram]: [PerfTemplate] Report data:
{'server': {'deploy_tool': 'helm',
'deploy_mode': 'standalone',
'config_name': 'standalone_16c64m',
'config': {'standalone': {'resources': {'limits': {'cpu': '16.0',
'memory': '64Gi'},
'requests': {'cpu': '9.0',
'memory': '33Gi'}}},
'cluster': {'enabled': False},
'etcd': {'replicaCount': 1,
'metrics': {'enabled': True,
'podMonitor': {'enabled': True}}},
'minio': {'mode': 'standalone',
'metrics': {'podMonitor': {'enabled': True}}},
'pulsar': {'enabled': False},
'metrics': {'serviceMonitor': {'enabled': True}},
'log': {'level': 'debug'},
'image': {'all': {'repository': 'harbor.milvus.io/milvus/milvus',
'tag': '2.4-20240320-19d649f0-amd64'}}},
'host': 'fouramf-multi-vr-tdclm-34-3023-milvus.qa-milvus.svc.cluster.local',
'port': '19530',
'uri': ''},
'client': {'test_case_type': 'ConcurrentClientBase',
'test_case_name': 'test_hybrid_search_locust_shard1_float_dql_diskann_standalone',
'test_case_params': {'dataset_params': {'metric_type': 'L2',
'dim': 2048,
'max_length': 10,
'scalars_index': {'int64_1': {},
'id': {'index_type': 'INVERTED'},
'bool_3': {'index_type': 'INVERTED'}},
'vectors_index': {'float_vector_1': {'index_type': 'DISKANN',
'index_param': {},
'metric_type': 'L2'},
'float_vector_2': {'index_type': 'DISKANN',
'index_param': {},
'metric_type': 'L2'},
'float_vector_3': {'index_type': 'DISKANN',
'index_param': {},
'metric_type': 'L2'}},
'scalars_params': {'array_int8_1': {'params': {'max_capacity': 7}},
'array_int16_1': {'params': {'max_capacity': 7}},
'array_int32_1': {'params': {'max_capacity': 7}},
'array_int64_1': {'params': {'max_capacity': 7}},
'array_double_1': {'params': {'max_capacity': 7}},
'array_float_1': {'params': {'max_capacity': 7}},
'array_varchar_1': {'params': {'max_capacity': 7}},
'array_bool_1': {'params': {'max_capacity': 7}},
'array_int8_2': {'params': {'max_capacity': 7}},
'array_int16_2': {'params': {'max_capacity': 7}},
'array_int32_2': {'params': {'max_capacity': 7}},
'array_int64_2': {'params': {'max_capacity': 7}},
'array_double_2': {'params': {'max_capacity': 7}},
'array_float_2': {'params': {'max_capacity': 7}},
'array_varchar_2': {'params': {'max_capacity': 7}},
'array_bool_2': {'params': {'max_capacity': 7}},
'array_int8_3': {'params': {'max_capacity': 7}},
'array_int16_3': {'params': {'max_capacity': 7}},
'array_int32_3': {'params': {'max_capacity': 7}},
'array_int64_3': {'params': {'max_capacity': 7}},
'array_double_3': {'params': {'max_capacity': 7}},
'array_float_3': {'params': {'max_capacity': 7}},
'array_varchar_3': {'params': {'max_capacity': 7}},
'array_bool_3': {'params': {'max_capacity': 7}}},
'dataset_name': 'local',
'dataset_size': 1500000,
'ni_per': 100},
'collection_params': {'other_fields': ['float_vector_1',
'float_vector_2',
'float_vector_3',
'int8_1',
'int16_1',
'int32_1',
'int64_1',
'double_1',
'float_1',
'varchar_1',
'bool_1',
'json_1',
'array_int8_1',
'array_int16_1',
'array_int32_1',
'array_int64_1',
'array_double_1',
'array_float_1',
'array_varchar_1',
'array_bool_1',
'int8_2',
'int16_2',
'int32_2',
'int64_2',
'double_2',
'float_2',
'varchar_2',
'bool_2',
'json_2',
'array_int8_2',
'array_int16_2',
'array_int32_2',
'array_int64_2',
'array_double_2',
'array_float_2',
'array_varchar_2',
'array_bool_2',
'int8_3',
'int16_3',
'int32_3',
'int64_3',
'double_3',
'float_3',
'varchar_3',
'bool_3',
'json_3',
'array_int8_3',
'array_int16_3',
'array_int32_3',
'array_int64_3',
'array_double_3',
'array_float_3',
'array_varchar_3',
'array_bool_3',
'varchar_tail_1',
'varchar_tail_2',
'varchar_tail_3',
'varchar_tail_4',
'varchar_tail_5',
'varchar_tail_6',
'varchar_tail_7',
'varchar_tail_8'],
'shards_num': 1},
'resource_groups_params': {'reset': False},
'database_user_params': {'reset_rbac': False,
'reset_db': False},
'index_params': {'index_type': 'DISKANN',
'index_param': {}},
'concurrent_params': {'concurrent_number': 20,
'during_time': '1h',
'interval': 20,
'spawn_rate': None},
'concurrent_tasks': [{'type': 'hybrid_search',
'weight': 1,
'params': {'nq': 1,
'top_k': 100,
'reqs': [{'search_param': {'search_list': 30},
'anns_field': 'float_vector',
'expr': 'id '
'> '
'150000',
'top_k': 10},
{'search_param': {'search_list': 100},
'anns_field': 'float_vector_1',
'expr': 'int64_1 '
'<= '
'1350000',
'top_k': 50},
{'search_param': {'search_list': 1500},
'anns_field': 'float_vector_2',
'expr': 'array_length(array_int8_2) '
'== '
'7',
'top_k': 1000},
{'search_param': {'search_list': 20000},
'anns_field': 'float_vector_3',
'expr': 'bool_3 '
'== '
'True',
'top_k': 16384}],
'rerank': {'RRFRanker': []},
'output_fields': ['float_vector'],
'ignore_growing': False,
'guarantee_timestamp': None,
'partition_names': None,
'timeout': 60,
'random_data': True}}]},
'run_id': 2024032084617105,
'datetime': '2024-03-20 04:21:01.809449',
'client_version': '2.2'},
'result': {'test_result': {'index': {'RT': 14677.199,
'float_vector_1': {'RT': 16331.9596},
'float_vector_2': {'RT': 19644.1707},
'float_vector_3': {'RT': 644.6092},
'int64_1': {'RT': 1.0166},
'id': {'RT': 1.0174},
'bool_3': {'RT': 1.016}},
'insert': {'total_time': 4951.575,
'VPS': 302.9339,
'batch_time': 0.3301,
'batch': 100},
'flush': {'RT': 3.5238},
'load': {'RT': 173.0584},
'Locust': {'Aggregated': {'Requests': 1886,
'Fails': 1,
'RPS': 0.52,
'fail_s': 0.0,
'RT_max': 60002.48,
'RT_avg': 37864.57,
'TP50': 37000.0,
'TP99': 51000.0},
'hybrid_search': {'Requests': 1886,
'Fails': 1,
'RPS': 0.52,
'fail_s': 0.0,
'RT_max': 60002.48,
'RT_avg': 37864.57,
'TP50': 37000.0,
'TP99': 51000.0}}}}}
@zhagnlu
This issue covers two problems: low CPU usage and high latency. There are actually two different situations, depending on the querynode CPU pressure.

First one: low CPU usage at both concurrency = 1 and concurrency = 20.
This usually happens when the querynode has more CPU than it needs, even at concurrency = 20. The querynode has no tasks waiting in its queue, tasks finish quickly, and CPU usage is variable: generally more concurrency means more CPU usage, but it never reaches full usage, as shown:
Under the same concurrency, CPU usage can differ between querynodes, because it depends on how many requests are routed to each node: more requests, more CPU usage, as the two pictures below show:

Second one: high search latency at concurrency = 20.
This usually happens when the querynode has less CPU than it needs and cannot process tasks in time.
As the figure below shows, there are 16 cores and almost 16 tasks running concurrently, and every task fans out into one subtask per segment.
If the collection has many segments, the segment subtasks have to wait for free threads because the execute pool only has corenum * 2 threads, so the per-segment latency becomes many times larger than the segcore latency, as the figure below shows:
As a result, the end-to-end latency can become very large even though the segcore latency barely changes.
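A rough illustration of that amplification using the numbers mentioned in this thread (16 cores, an execute pool of corenum * 2 threads, 20 concurrent requests, ~30 sealed segments per request); a sketch of the reasoning above, not a measurement:

```python
# Toy model: every in-flight request fans out into one subtask per sealed
# segment, and all subtasks share an execute pool of corenum * 2 threads.
cores = 16
pool_threads = cores * 2      # execute pool size mentioned above
concurrency = 20              # concurrent hybrid_search requests
segments = 30                 # sealed segments per request (100k-row case above)

subtasks = concurrency * segments        # 600 subtasks competing for 32 threads
waves = -(-subtasks // pool_threads)     # ceil(600 / 32) = 19 scheduling waves

# Even if the per-segment segcore latency stays flat, once the pool saturates a
# request's E2E latency can grow to roughly `waves` times one segment's latency.
print(f"{subtasks} subtasks / {pool_threads} threads -> ~{waves}x queueing amplification")
```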
If it is a 16-core machine and none of the pods reach more than 60% usage, does that make sense? I would expect at least one node to exceed 80% CPU. Could it be due to other bottlenecks, like a lock?
If the client does not generate enough pressure, that is normal.
When the client pressure is sufficient: I ran this case again and could not reproduce the low CPU usage; it uses almost all 16 cores.
https://grafana-4am.zilliz.cc/d/uLf5cJ3Ga/milvus2-0?orgId=1&var-datasource=P1809F7CD0C75ACF3&var-cluster=&var-namespace=qa-milvus&var-instance=fouramf-multi-vr-8xpng-100-4981&var-collection=All&var-app_name=milvus&from=1711587644382&to=1711677452270
For the situations below 80%, I guess it may be a DiskANN problem: other cases indicate the DiskANN index can be blocked by IO, memory, or its algorithm arguments, and needs manual tuning to reach high CPU usage.
@wangting0128 can you confirm this is a DiskANN test? The DiskANN bottleneck is not in the CPU, so I guess the low CPU usage is reasonable.
Yes, this is a problem specific to DiskANN. It has been verified on the HNSW index that the CPU can be filled up.
I've confirmed with colleagues on knowhere that the IOPS in the DiskANN scenario is very high, so the bottleneck is not the disk IOPS. The disk is so fast that the CPU usage does not increase.
The IO performance of the test machine is 23x better than AWS's 1cu, and the default global search thread pool size in knowhere is 4 * cpu. When IOPS is low, the context-switch overhead of four threads on one core is negligible and the extra threads improve IO concurrency. When IOPS is very high, IO requests return quickly and the thread context-switch overhead can no longer be ignored.
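A toy illustration of that trade-off (all numbers below are assumptions picked only to show the shape of the argument, not measurements of knowhere):

```python
# Toy numbers for 4 blocking-IO search threads per core: t_cpu is CPU work per
# IO request, t_io the device latency, t_switch an assumed context-switch cost.
t_cpu    = 0.10e-3   # seconds of CPU work per IO request (assumed)
t_switch = 0.01e-3   # seconds per context switch (assumed)

for label, t_io in (("slow disk", 2.0e-3), ("very fast disk", 0.02e-3)):
    io_wait_fraction = t_io / (t_cpu + t_io)          # idle time of a single thread
    switch_overhead = t_switch / (t_cpu + t_switch)   # extra CPU spent on switching
    print(f"{label:14s}: a lone thread idles {io_wait_fraction:5.1%} of the time; "
          f"switching between 4 threads costs ~{switch_overhead:4.1%} of the core")
```

On the slow disk the overlap gain dwarfs the switching cost; on the very fast disk the gain mostly disappears while the switching cost stays, which is the situation described above.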
That's why I guess aio or io_uring are still going to be important: it's always hard to manage thread counts in blocking IO (BIO) mode.
@wangting0128 I saw the field `float_vector` in the search request but not in the `vectors_index` in both of the "test results" sections, is it intended?
Are you asking about the time taken to build the index on the float_vector field?
float_vector field: 'index': {'RT': 14677.199,
Not really. I mean why is there no index on the `float_vector` field, yet there are search requests on this field? Will it just be a brute-force search, or does it get a default index type?
The float_vector index is as follows:
'index_params': {'index_type': 'DISKANN',
'index_param': {}},
> @wangting0128 can you confirm this is a DiskANN test? The DiskANN bottleneck is not in the CPU, so I guess the low CPU usage is reasonable.
> Yes, this is a problem specific to DiskANN. It has been verified on the HNSW index that the CPU can be filled up.
> I've confirmed with colleagues on knowhere that the IOPS in the DiskANN scenario is very high, so the bottleneck is not the disk IOPS. The disk is so fast that the CPU usage does not increase.
Looking at the date & time of CPU Usage at the top and the Pod IO at the bottom of the dashboard, the IOPS barely changes regardless of the concurrency. My guess is it had already reached the max IOPS on that instance, even when the concurrent number is 1, since each task is heavy. The queued I/O pushes the Search Segment Latency to tens of seconds, and with around 40 segments that makes the Proxy Search Latency around 1 min.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.