[Bug]: [benchmark][cluster][LRU] search and query fail in a concurrent DML & DQL scenario
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version: milvus-io-lru-dev-9234a94-20240506
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): pulsar
- SDK version(e.g. pymilvus v2.0.0rc2): 2.4.0rc66
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
argo task: lru-fouramf-d5jpz
server:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
lru-verify-32135-cluster-etcd-0 1/1 Running 0 5m 10.104.19.173 4am-node28 <none> <none>
lru-verify-32135-cluster-etcd-1 1/1 Running 0 5m 10.104.15.190 4am-node20 <none> <none>
lru-verify-32135-cluster-etcd-2 1/1 Running 0 5m 10.104.18.232 4am-node25 <none> <none>
lru-verify-32135-cluster-milvus-datacoord-5789ccf96f-4xk7b 1/1 Running 0 5m1s 10.104.24.132 4am-node29 <none> <none>
lru-verify-32135-cluster-milvus-datanode-7447559b8b-2zgc9 1/1 Running 0 5m1s 10.104.24.135 4am-node29 <none> <none>
lru-verify-32135-cluster-milvus-indexcoord-5d69fb9db8-2hgfl 1/1 Running 0 5m1s 10.104.24.136 4am-node29 <none> <none>
lru-verify-32135-cluster-milvus-indexnode-6647c68789-5xb82 1/1 Running 0 5m1s 10.104.30.107 4am-node38 <none> <none>
lru-verify-32135-cluster-milvus-indexnode-6647c68789-9tbh9 1/1 Running 0 5m1s 10.104.17.37 4am-node23 <none> <none>
lru-verify-32135-cluster-milvus-proxy-767bf99dd-7flpv 1/1 Running 0 5m1s 10.104.24.131 4am-node29 <none> <none>
lru-verify-32135-cluster-milvus-querycoord-8477cbc647-jvmrt 1/1 Running 0 5m1s 10.104.24.134 4am-node29 <none> <none>
lru-verify-32135-cluster-milvus-querynode-856df586fb-tf77v 1/1 Running 0 5m1s 10.104.1.169 4am-node10 <none> <none>
lru-verify-32135-cluster-milvus-rootcoord-7c78f55b49-f9qcv 1/1 Running 0 5m1s 10.104.24.130 4am-node29 <none> <none>
lru-verify-32135-cluster-minio-0 1/1 Running 0 5m 10.104.19.172 4am-node28 <none> <none>
lru-verify-32135-cluster-minio-1 1/1 Running 0 5m 10.104.15.191 4am-node20 <none> <none>
lru-verify-32135-cluster-minio-2 1/1 Running 0 5m 10.104.24.142 4am-node29 <none> <none>
lru-verify-32135-cluster-minio-3 1/1 Running 0 5m 10.104.18.233 4am-node25 <none> <none>
lru-verify-32135-cluster-pulsar-bookie-0 1/1 Running 0 5m 10.104.18.230 4am-node25 <none> <none>
lru-verify-32135-cluster-pulsar-bookie-1 1/1 Running 0 5m 10.104.19.174 4am-node28 <none> <none>
lru-verify-32135-cluster-pulsar-bookie-2 1/1 Running 0 5m 10.104.15.194 4am-node20 <none> <none>
lru-verify-32135-cluster-pulsar-bookie-init-49mgf 0/1 Completed 0 5m1s 10.104.24.133 4am-node29 <none> <none>
lru-verify-32135-cluster-pulsar-broker-0 1/1 Running 0 5m1s 10.104.6.6 4am-node13 <none> <none>
lru-verify-32135-cluster-pulsar-proxy-0 1/1 Running 0 5m1s 10.104.19.167 4am-node28 <none> <none>
lru-verify-32135-cluster-pulsar-pulsar-init-njn79 0/1 Completed 0 5m1s 10.104.24.140 4am-node29 <none> <none>
lru-verify-32135-cluster-pulsar-recovery-0 1/1 Running 0 5m1s 10.104.5.168 4am-node12 <none> <none>
lru-verify-32135-cluster-pulsar-zookeeper-0 1/1 Running 0 5m1s 10.104.18.229 4am-node25 <none> <none>
lru-verify-32135-cluster-pulsar-zookeeper-1 1/1 Running 0 4m22s 10.104.24.144 4am-node29 <none> <none>
lru-verify-32135-cluster-pulsar-zookeeper-2 1/1 Running 0 3m46s 10.104.15.196 4am-node20 <none> <none> (base.py:257)
[2024-05-06 18:22:08,058 - INFO - fouram]: [Cmd Exe] kubectl get pods -n qa-milvus -o wide | grep -E 'NAME|lru-verify-32135-cluster-milvus|lru-verify-32135-cluster-minio|lru-verify-32135-cluster-etcd|lru-verify-32135-cluster-pulsar|lru-verify-32135-cluster-zookeeper|lru-verify-32135-cluster-kafka|lru-verify-32135-cluster-log|lru-verify-32135-cluster-tikv' (util_cmd.py:14)
[2024-05-06 18:22:19,029 - INFO - fouram]: [CliClient] pod details of release(lru-verify-32135-cluster):
I0506 18:22:09.704805 517 request.go:665] Waited for 1.19818961s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.default.svc.cluster.local/apis/batch/v1beta1?timeout=32s
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
lru-verify-32135-cluster-etcd-0 1/1 Running 0 12h 10.104.19.173 4am-node28 <none> <none>
lru-verify-32135-cluster-etcd-1 1/1 Running 0 12h 10.104.15.190 4am-node20 <none> <none>
lru-verify-32135-cluster-etcd-2 1/1 Running 0 12h 10.104.18.232 4am-node25 <none> <none>
lru-verify-32135-cluster-milvus-datacoord-5789ccf96f-4xk7b 1/1 Running 0 12h 10.104.24.132 4am-node29 <none> <none>
lru-verify-32135-cluster-milvus-datanode-7447559b8b-2zgc9 1/1 Running 0 12h 10.104.24.135 4am-node29 <none> <none>
lru-verify-32135-cluster-milvus-indexcoord-5d69fb9db8-2hgfl 1/1 Running 0 12h 10.104.24.136 4am-node29 <none> <none>
lru-verify-32135-cluster-milvus-indexnode-6647c68789-5xb82 1/1 Running 0 12h 10.104.30.107 4am-node38 <none> <none>
lru-verify-32135-cluster-milvus-indexnode-6647c68789-9tbh9 1/1 Running 0 12h 10.104.17.37 4am-node23 <none> <none>
lru-verify-32135-cluster-milvus-proxy-767bf99dd-7flpv 1/1 Running 0 12h 10.104.24.131 4am-node29 <none> <none>
lru-verify-32135-cluster-milvus-querycoord-8477cbc647-jvmrt 1/1 Running 0 12h 10.104.24.134 4am-node29 <none> <none>
lru-verify-32135-cluster-milvus-querynode-856df586fb-tf77v 1/1 Running 20 (25m ago) 12h 10.104.1.169 4am-node10 <none> <none>
lru-verify-32135-cluster-milvus-rootcoord-7c78f55b49-f9qcv 1/1 Running 0 12h 10.104.24.130 4am-node29 <none> <none>
lru-verify-32135-cluster-minio-0 1/1 Running 0 12h 10.104.19.172 4am-node28 <none> <none>
lru-verify-32135-cluster-minio-1 1/1 Running 0 12h 10.104.15.191 4am-node20 <none> <none>
lru-verify-32135-cluster-minio-2 1/1 Running 0 12h 10.104.24.142 4am-node29 <none> <none>
lru-verify-32135-cluster-minio-3 1/1 Running 0 12h 10.104.18.233 4am-node25 <none> <none>
lru-verify-32135-cluster-pulsar-bookie-0 1/1 Running 0 12h 10.104.18.230 4am-node25 <none> <none>
lru-verify-32135-cluster-pulsar-bookie-1 1/1 Running 0 12h 10.104.19.174 4am-node28 <none> <none>
lru-verify-32135-cluster-pulsar-bookie-2 1/1 Running 0 12h 10.104.15.194 4am-node20 <none> <none>
lru-verify-32135-cluster-pulsar-bookie-init-49mgf 0/1 Completed 0 12h 10.104.24.133 4am-node29 <none> <none>
lru-verify-32135-cluster-pulsar-broker-0 1/1 Running 0 12h 10.104.6.6 4am-node13 <none> <none>
lru-verify-32135-cluster-pulsar-proxy-0 1/1 Running 0 12h 10.104.19.167 4am-node28 <none> <none>
lru-verify-32135-cluster-pulsar-pulsar-init-njn79 0/1 Completed 0 12h 10.104.24.140 4am-node29 <none> <none>
lru-verify-32135-cluster-pulsar-recovery-0 1/1 Running 0 12h 10.104.5.168 4am-node12 <none> <none>
lru-verify-32135-cluster-pulsar-zookeeper-0 1/1 Running 0 12h 10.104.18.229 4am-node25 <none> <none>
lru-verify-32135-cluster-pulsar-zookeeper-1 1/1 Running 0 12h 10.104.24.144 4am-node29 <none> <none>
lru-verify-32135-cluster-pulsar-zookeeper-2 1/1 Running 0 12h 10.104.15.196 4am-node20 <none> <none>
client pod name: lru-fouramf-d5jpz-3796633683
client log (note that the query node pod in the listing above shows 20 restarts over the 12h run):
[2024-05-06 13:05:44,194 - ERROR - fouram]: RPC error: [query], <MilvusException: (code=65535, message=fail to Query on QueryNode 21: worker(21) query failed: Assert "is_system_field_ready()" at /go/src/github.com/milvus-io/milvus/internal/core/src/segcore/SegmentSealedImpl.cpp:1030
[2024-05-06 13:06:02,560 - ERROR - fouram]: RPC error: [search], <MilvusException: (code=65535, message=fail to search on QueryNode 21: worker(21) query failed: => failed to load row ID or timestamp, potential missing bin logs or empty segments. Segment ID = 449570705034082594)>, <Time:{'RPC start': '2024-05-06 13:06:01.943796', 'RPC error': '2024-05-06 13:06:02.560093'}> (decorators.py:146)
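For reference, a minimal sketch (not taken from the report) of the two failing DQL calls, reconstructed from the test_case_params dump below. The host comes from the report; the collection name is hypothetical, and pymilvus 2.4.x is assumed:
```python
import random
from pymilvus import connections, Collection

connections.connect(host="lru-verify-32135-cluster-milvus.qa-milvus.svc.cluster.local",
                    port="19530")
coll = Collection("fouram_laion1b_nolang")  # hypothetical collection name

# search task: nq=10, top_k=1, ef=64, partition-key filter (random_data=True)
vectors = [[random.random() for _ in range(768)] for _ in range(10)]
coll.search(data=vectors,
            anns_field="float32_vector",
            param={"metric_type": "L2", "params": {"ef": 64}},
            limit=1,
            expr="int64_1 >= 0 && int64_1 <= 4",
            timeout=3000)

# query task: scalar filter with offset/limit pagination
coll.query(expr="int64_1 != 2", offset=0, limit=10, timeout=3000)
```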
test result:
[2024-05-06 18:20:48,147 - INFO - fouram]: Type Name # reqs # fails | Avg Min Max Med | req/s failures/s (stats.py:789)
[2024-05-06 18:20:48,147 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-05-06 18:20:48,147 - INFO - fouram]: grpc delete 562232 279(0.05%) | 40 1 5274 16 | 13.01 0.01 (stats.py:789)
[2024-05-06 18:20:48,147 - INFO - fouram]: grpc insert 561829 281(0.05%) | 219 23 12583 180 | 13.01 0.01 (stats.py:789)
[2024-05-06 18:20:48,147 - INFO - fouram]: grpc load 561432 0(0.00%) | 49 4 6101 25 | 13.00 0.00 (stats.py:789)
[2024-05-06 18:20:48,147 - INFO - fouram]: grpc query 560719 57974(10.34%) | 230 1 162124 110 | 12.98 1.34 (stats.py:789)
[2024-05-06 18:20:48,147 - INFO - fouram]: grpc search 561121 57551(10.26%) | 218 5 162326 91 | 12.99 1.33 (stats.py:789)
[2024-05-06 18:20:48,147 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-05-06 18:20:48,148 - INFO - fouram]: Aggregated 2807333 116085(4.14%) | 151 1 162326 82 | 64.98 2.69 (stats.py:789)
[2024-05-06 18:20:48,148 - INFO - fouram]: (stats.py:790)
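A quick arithmetic check on the stats above (mine, not part of the log) confirms that search and query account for nearly all failures, consistent with the two RPC errors in the client log:
```python
# Failure counts copied from the locust table above.
fails = {"delete": 279, "insert": 281, "load": 0, "query": 57974, "search": 57551}
total = sum(fails.values())              # 116085, matches the Aggregated row
dql = fails["query"] + fails["search"]   # 115525, ~99.5% of all failures
print(dql, total, round(100 * total / 2807333, 2))  # 115525 116085 4.14
```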
[2024-05-06 18:20:48,150 - INFO - fouram]: [PerfTemplate] Report data:
{'server': {'deploy_tool': 'helm',
'deploy_mode': 'cluster',
'config_name': 'cluster_8c16m',
'config': {'queryNode': {'resources': {'limits': {'cpu': '2',
'memory': '8Gi',
'ephemeral-storage': '70Gi'},
'requests': {'cpu': '2',
'memory': '8Gi'}},
'replicas': 1,
'extraEnv': [{'name': 'LOCAL_STORAGE_SIZE',
'value': '70'}]},
'indexNode': {'resources': {'limits': {'cpu': '8.0',
'memory': '8Gi'},
'requests': {'cpu': '5.0',
'memory': '5Gi'}},
'replicas': 2},
'dataNode': {'resources': {'limits': {'cpu': '2.0',
'memory': '8Gi'},
'requests': {'cpu': '2.0',
'memory': '8Gi'}},
'replicas': 1},
'cluster': {'enabled': True},
'pulsar': {},
'kafka': {},
'minio': {'metrics': {'podMonitor': {'enabled': True}},
'persistence': {'size': '320Gi'}},
'etcd': {'metrics': {'enabled': True,
'podMonitor': {'enabled': True}}},
'metrics': {'serviceMonitor': {'enabled': True}},
'log': {'level': 'debug'},
'extraConfigFiles': {'user.yaml': 'queryNode:\n'
'  diskCacheCapacityLimit: 51539607552\n'
'  mmap:\n'
'    mmapEnabled: true\n'
'    lazyloadEnabled: true\n'
'  useStreamComputing: true\n'
'  cache:\n'
'    warmup: sync\n'
'  lazyloadWaitTimeout: 300000\n'},
'image': {'all': {'repository': 'harbor.milvus.io/milvus/milvus',
'tag': 'milvus-io-lru-dev-9234a94-20240506'}}},
'host': 'lru-verify-32135-cluster-milvus.qa-milvus.svc.cluster.local',
'port': '19530',
'uri': ''},
'client': {'test_case_type': 'ConcurrentClientBase',
'test_case_name': 'test_concurrent_locust_custom_parameters',
'test_case_params': {'dataset_params': {'metric_type': 'L2',
'column_name': 'float32_vector',
'dim': 768,
'scalars_params': {'int64_1': {'params': {'is_partition_key': True}}},
'dataset_name': 'laion1b_nolang',
'dataset_size': '10w',
'ni_per': 10000},
'collection_params': {'other_fields': ['int64_1'],
'num_partitions': 64},
'index_params': {'index_type': 'HNSW',
'index_param': {'M': 30,
'efConstruction': 360}},
'concurrent_params': {'concurrent_number': 10,
'during_time': '12h',
'interval': 20,
'spawn_rate': None},
'concurrent_tasks': [{'type': 'insert',
'weight': 1,
'params': {'nb': 64,
'timeout': 3000,
'random_vector': True}},
{'type': 'delete',
'weight': 1,
'params': {'delete_length': 64,
'timeout': 3000}},
{'type': 'flush',
'weight': 0,
'params': {'timeout': 3000}},
{'type': 'load',
'weight': 1,
'params': {'timeout': 3000}},
{'type': 'search',
'weight': 1,
'params': {'top_k': 1,
'nq': 10,
'search_param': {'ef': 64},
'expr': 'int64_1 >= 0 && int64_1 <= 4',
'timeout': 3000,
'random_data': True}},
{'type': 'query',
'weight': 1,
'params': {'expr': 'int64_1 != 2',
'timeout': 3000,
'offset': 0,
'limit': 10}}]},
'run_id': 2024050660597437,
'datetime': '2024-05-06 06:14:19.839669',
'client_version': '2.2'},
'result': {'test_result': {'index': {'RT': 35.7288},
'insert': {'total_time': 25.8054,
'VPS': 3875.1579,
'batch_time': 2.5805,
'batch': 10000},
'flush': {'RT': 5.1085},
'load': {'RT': 4.6199},
'Locust': {'Aggregated': {'Requests': 2807333,
'Fails': 116085,
'RPS': 64.98,
'fail_s': 0.04,
'RT_max': 162326.53,
'RT_avg': 151.49,
'TP50': 82,
'TP99': 800.0},
'delete': {'Requests': 562232,
'Fails': 279,
'RPS': 13.01,
'fail_s': 0.0,
'RT_max': 5274.16,
'RT_avg': 40.11,
'TP50': 16,
'TP99': 370.0},
'insert': {'Requests': 561829,
'Fails': 281,
'RPS': 13.01,
'fail_s': 0.0,
'RT_max': 12583.25,
'RT_avg': 219.05,
'TP50': 180.0,
'TP99': 1100.0},
'load': {'Requests': 561432,
'Fails': 0,
'RPS': 13.0,
'fail_s': 0.0,
'RT_max': 6101.14,
'RT_avg': 49.95,
'TP50': 25,
'TP99': 220.0},
'query': {'Requests': 560719,
'Fails': 57974,
'RPS': 12.98,
'fail_s': 0.1,
'RT_max': 162124.56,
'RT_avg': 230.17,
'TP50': 110.0,
'TP99': 1100.0},
'search': {'Requests': 561121,
'Fails': 57551,
'RPS': 12.99,
'fail_s': 0.1,
'RT_max': 162326.53,
'RT_avg': 218.43,
'TP50': 91,
'TP99': 1100.0}}}}}
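One sizing detail from the config above worth calling out: diskCacheCapacityLimit is given in bytes and works out to exactly 48 GiB against the query node's 70 GiB local storage. A quick check (mine; it assumes LOCAL_STORAGE_SIZE is in GiB, matching the 70Gi ephemeral-storage limit):
```python
disk_cache_limit = 51539607552        # bytes, from user.yaml above
print(disk_cache_limit / 1024**3)     # 48.0 -> exactly 48 GiB of disk cache
# vs. LOCAL_STORAGE_SIZE=70 and an ephemeral-storage limit of 70Gi,
# leaving ~22 GiB of local-disk headroom on the single query node.
```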
Expected Behavior
No response
Steps To Reproduce
1. create a collection with 3 fields: id (primary key, autoID), float32_vector (dim=768), int64_1 (partition key, 64 partitions); a pymilvus sketch of steps 1-6 follows this list
2. build HNSW index
3. prepare 10w (100,000) rows of data
4. flush collection
5. build index again with the same params
6. load collection
7. concurrent requests:
- insert
- delete
- load
- search
- query
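A minimal pymilvus 2.4.x sketch of steps 1-6 (connection endpoint and collection name are hypothetical; the insert loop for step 3 is elided):
```python
from pymilvus import (Collection, CollectionSchema, DataType,
                      FieldSchema, connections)

connections.connect(host="127.0.0.1", port="19530")  # hypothetical endpoint

# step 1: id (primary, autoID), 768-dim float vector, int64 partition key
schema = CollectionSchema([
    FieldSchema("id", DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema("float32_vector", DataType.FLOAT_VECTOR, dim=768),
    FieldSchema("int64_1", DataType.INT64, is_partition_key=True),
])
coll = Collection("lru_repro", schema, num_partitions=64)  # hypothetical name

index = {"index_type": "HNSW", "metric_type": "L2",
         "params": {"M": 30, "efConstruction": 360}}
coll.create_index("float32_vector", index)   # step 2: build HNSW index
# step 3: insert 100k rows in 10k batches (ni_per=10000), elided here
coll.flush()                                 # step 4
coll.create_index("float32_vector", index)   # step 5: same params again
coll.load()                                  # step 6
```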
Milvus Log
No response
Anything else?
No response