milvus
[Bug]: [benchmark] diskann inserted 100 million data, load failed, and reported "collection xxx has not been loaded to memory or load failed"
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version: master-20230506-7f5294b1
- Deployment mode(standalone or cluster):cluster
- MQ type(rocksmq, pulsar or kafka): pulsar
- SDK version(e.g. pymilvus v2.0.0rc2): 2.4.0.dev12
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
The memory resources are sufficient, but the load fails and "collection xxx has not been loaded to memory or load failed" is reported.
argo task: fouramf-shtt7-kg49g
resource:
  dataNode:
    replicas: 1
    resources:
      requests:
        memory: 64Gi
        cpu: 8.0
      limits:
        memory: 64Gi
        cpu: 8.0
  indexNode:
    replicas: 1
    resources:
      requests:
        memory: 64Gi
        cpu: 8.0
      limits:
        memory: 64Gi
        cpu: 8.0
        ephemeral-storage: 256Gi
  queryNode:
    replicas: 1
    resources:
      requests:
        memory: 64Gi
        cpu: 8.0
      limits:
        memory: 64Gi
        cpu: 8.0
        ephemeral-storage: 256Gi
server:
fouramf-shtt7-kg49g-97-6367-etcd-0 1/1 Running 0 4m18s 10.104.24.172 4am-node29 <none> <none>
fouramf-shtt7-kg49g-97-6367-etcd-1 1/1 Running 0 4m18s 10.104.22.9 4am-node26 <none> <none>
fouramf-shtt7-kg49g-97-6367-etcd-2 1/1 Running 0 4m17s 10.104.6.153 4am-node13 <none> <none>
fouramf-shtt7-kg49g-97-6367-milvus-datacoord-5bbcb78b65-kprns 1/1 Running 0 4m18s 10.104.24.167 4am-node29 <none> <none>
fouramf-shtt7-kg49g-97-6367-milvus-datanode-55d494bd48-79tc2 1/1 Running 0 4m18s 10.104.23.92 4am-node27 <none> <none>
fouramf-shtt7-kg49g-97-6367-milvus-indexcoord-67d56f59cd-fgmlg 1/1 Running 0 4m18s 10.104.19.67 4am-node28 <none> <none>
fouramf-shtt7-kg49g-97-6367-milvus-indexnode-6cffd4d966-vq9jb 1/1 Running 0 4m18s 10.104.24.169 4am-node29 <none> <none>
fouramf-shtt7-kg49g-97-6367-milvus-proxy-556c76586-qfd9h 1/1 Running 0 4m18s 10.104.24.170 4am-node29 <none> <none>
fouramf-shtt7-kg49g-97-6367-milvus-querycoord-75497595df-7gvcl 1/1 Running 0 4m18s 10.104.24.168 4am-node29 <none> <none>
fouramf-shtt7-kg49g-97-6367-milvus-querynode-7678576b76-xm7nz 1/1 Running 0 4m18s 10.104.19.66 4am-node28 <none> <none>
fouramf-shtt7-kg49g-97-6367-milvus-rootcoord-746c94497c-26wrl 1/1 Running 0 4m18s 10.104.19.68 4am-node28 <none> <none>
fouramf-shtt7-kg49g-97-6367-minio-0 1/1 Running 0 4m17s 10.104.6.152 4am-node13 <none> <none>
fouramf-shtt7-kg49g-97-6367-minio-1 1/1 Running 0 4m17s 10.104.22.12 4am-node26 <none> <none>
fouramf-shtt7-kg49g-97-6367-minio-2 1/1 Running 0 4m17s 10.104.5.179 4am-node12 <none> <none>
fouramf-shtt7-kg49g-97-6367-minio-3 1/1 Running 0 4m17s 10.104.20.185 4am-node22 <none> <none>
fouramf-shtt7-kg49g-97-6367-pulsar-bookie-0 1/1 Running 0 4m18s 10.104.24.174 4am-node29 <none> <none>
fouramf-shtt7-kg49g-97-6367-pulsar-bookie-1 1/1 Running 0 4m17s 10.104.22.14 4am-node26 <none> <none>
fouramf-shtt7-kg49g-97-6367-pulsar-bookie-2 1/1 Running 0 4m17s 10.104.6.159 4am-node13 <none> <none>
fouramf-shtt7-kg49g-97-6367-pulsar-bookie-init-s7gjq 0/1 Completed 0 4m18s 10.104.24.162 4am-node29 <none> <none>
fouramf-shtt7-kg49g-97-6367-pulsar-broker-0 1/1 Running 0 4m18s 10.104.24.160 4am-node29 <none> <none>
fouramf-shtt7-kg49g-97-6367-pulsar-proxy-0 1/1 Running 0 4m18s 10.104.19.69 4am-node28 <none> <none>
fouramf-shtt7-kg49g-97-6367-pulsar-pulsar-init-6ztch 0/1 Completed 0 4m18s 10.104.24.161 4am-node29 <none> <none>
fouramf-shtt7-kg49g-97-6367-pulsar-recovery-0 1/1 Running 0 4m18s 10.104.22.7 4am-node26 <none> <none>
fouramf-shtt7-kg49g-97-6367-pulsar-zookeeper-0 1/1 Running 0 4m18s 10.104.24.173 4am-node29 <none> <none>
fouramf-shtt7-kg49g-97-6367-pulsar-zookeeper-1 1/1 Running 0 3m32s 10.104.5.181 4am-node12 <none> <none>
fouramf-shtt7-kg49g-97-6367-pulsar-zookeeper-2 1/1 Running 0 2m57s 10.104.22.16 4am-node26 <none> <none>
client log:
[2023-05-08 05:24:05,050 - INFO - fouram]: [Base] Number of vectors in the collection(fouram_xBvJKZYG): 99800000 (base.py:318)
[2023-05-08 05:24:05,239 - INFO - fouram]: [Base] Start inserting, ids: 99850000 - 99899999, data size: 100,000,000 (base.py:164)
[2023-05-08 05:24:07,001 - INFO - fouram]: [Time] Collection.insert run in 1.7616s (api_request.py:41)
[2023-05-08 05:24:07,004 - INFO - fouram]: [Base] Number of vectors in the collection(fouram_xBvJKZYG): 99800000 (base.py:318)
[2023-05-08 05:24:07,942 - INFO - fouram]: [Base] Start inserting, ids: 99900000 - 99949999, data size: 100,000,000 (base.py:164)
[2023-05-08 05:24:09,767 - INFO - fouram]: [Time] Collection.insert run in 1.8244s (api_request.py:41)
[2023-05-08 05:24:09,770 - INFO - fouram]: [Base] Number of vectors in the collection(fouram_xBvJKZYG): 99900000 (base.py:318)
[2023-05-08 05:24:09,937 - INFO - fouram]: [Base] Start inserting, ids: 99950000 - 99999999, data size: 100,000,000 (base.py:164)
[2023-05-08 05:24:11,600 - INFO - fouram]: [Time] Collection.insert run in 1.6626s (api_request.py:41)
[2023-05-08 05:24:11,604 - INFO - fouram]: [Base] Number of vectors in the collection(fouram_xBvJKZYG): 99900000 (base.py:318)
[2023-05-08 05:24:11,691 - INFO - fouram]: [Base] Total time of insert: 3135.3359s, average number of vector bars inserted per second: 31894.5093, average time to insert 50000 vectors per time: 1.5677s (base.py:235)
[2023-05-08 05:24:11,692 - INFO - fouram]: [Base] Start flush collection fouram_xBvJKZYG (base.py:133)
[2023-05-08 05:24:14,714 - INFO - fouram]: [Base] Number of vectors in the collection(fouram_xBvJKZYG): 100000000 (base.py:318)
[2023-05-08 05:24:14,719 - INFO - fouram]: [Base] Params of index: {'index_type': 'DISKANN', 'metric_type': 'L2', 'params': {}} (base.py:296)
[2023-05-08 05:24:14,719 - INFO - fouram]: [Base] Start build index of DISKANN for collection fouram_xBvJKZYG, params:{'index_type': 'DISKANN', 'metric_type': 'L2', 'params': {}} (base.py:283)
[2023-05-08 23:00:39,196 - INFO - fouram]: [Time] Index run in 63384.4759s (api_request.py:41)
[2023-05-08 23:00:39,197 - INFO - fouram]: [CommonCases] RT of build index DISKANN: 63384.4759s (common_cases.py:87)
[2023-05-08 23:00:39,200 - INFO - fouram]: [Base] Params of index: {'index_type': 'DISKANN', 'metric_type': 'L2', 'params': {}} (base.py:296)
[2023-05-08 23:00:39,200 - INFO - fouram]: [CommonCases] Prepare index DISKANN done. (common_cases.py:90)
[2023-05-08 23:00:39,200 - INFO - fouram]: [CommonCases] No scalars need to be indexed. (common_cases.py:95)
[2023-05-08 23:00:39,200 - INFO - fouram]: [Base] Start load collection fouram_xBvJKZYG,replica_number:1,kwargs:{} (base.py:139)
[2023-05-08 23:10:41,078 - ERROR - fouram]: RPC error: [get_loading_progress], <MilvusException: (code=1, message=collection 441324366066876535 has not been loaded to memory or load failed)>, <Time:{'RPC start': '2023-05-08 23:10:41.076732', 'RPC error': '2023-05-08 23:10:41.078363'}> (decorators.py:108)
[2023-05-08 23:10:41,080 - ERROR - fouram]: RPC error: [wait_for_loading_collection], <MilvusException: (code=1, message=collection 441324366066876535 has not been loaded to memory or load failed)>, <Time:{'RPC start': '2023-05-08 23:00:39.217361', 'RPC error': '2023-05-08 23:10:41.080019'}> (decorators.py:108)
[2023-05-08 23:10:41,080 - ERROR - fouram]: RPC error: [load_collection], <MilvusException: (code=1, message=collection 441324366066876535 has not been loaded to memory or load failed)>, <Time:{'RPC start': '2023-05-08 23:00:39.200687', 'RPC error': '2023-05-08 23:10:41.080147'}> (decorators.py:108)
[2023-05-08 23:10:41,082 - ERROR - fouram]: (api_response) : <MilvusException: (code=1, message=collection 441324366066876535 has not been loaded to memory or load failed)> (api_request.py:49)
[2023-05-08 23:10:41,082 - ERROR - fouram]: [CheckFunc] load request check failed, response:<MilvusException: (code=1, message=collection 441324366066876535 has not been loaded to memory or load failed)> (func_check.py:52)
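To see how long the client waited before the load was reported as failed, the two timestamps on the [load_collection] error line can be subtracted. The sketch below uses only the values printed in that line. The helper name `elapsed_seconds` is made up for illustration. The result is about ten minutes, which would be consistent with a timeout, but the log alone does not say whether a timeout was hit or on which side.

```python
from datetime import datetime

# Timestamps copied from the [load_collection] error line in the client log above.
rpc_start = "2023-05-08 23:00:39.200687"
rpc_error = "2023-05-08 23:10:41.080147"

FMT = "%Y-%m-%d %H:%M:%S.%f"


def elapsed_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two log timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


load_wait = elapsed_seconds(rpc_start, rpc_error)
print(f"load_collection reported failure after {load_wait:.3f}s")
```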
memory usage:
Expected Behavior
No response
Steps To Reproduce
1. create a collection or use an existing collection
2. build index on vector column
3. insert a certain number of vectors
4. flush collection
5. build index on vector column with the same parameters
6. build index on scalar columns, or not
7. count the total number of rows
8. load collection ==> failed
9. perform concurrent operations
10. clean all collections or not
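Steps 1 to 8 above can be sketched with pymilvus roughly as follows. This is a minimal sketch under assumptions, not the fouram benchmark code: the collection name `repro_diskann`, the primary key field `id`, the host and port, the random data, and `dim=128` are all placeholders (the report does not state the dimension of the 100M dataset). Only the `float_vector` field name, the 50,000-row batches and the DISKANN/L2 index parameters come from the client logs.

```python
import random
from typing import Iterator, Tuple


def batch_ranges(total: int, batch: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (first_id, last_id) pairs, e.g. 99950000 - 99999999."""
    for start in range(0, total, batch):
        yield start, min(start + batch, total) - 1


def reproduce(total: int = 100_000_000, batch: int = 50_000, dim: int = 128,
              host: str = "localhost", port: str = "19530") -> None:
    # Imported lazily so batch_ranges() can be used without pymilvus installed.
    from pymilvus import (Collection, CollectionSchema, DataType,
                          FieldSchema, connections)

    connections.connect(alias="default", host=host, port=port)
    schema = CollectionSchema([
        FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
        FieldSchema(name="float_vector", dtype=DataType.FLOAT_VECTOR, dim=dim),
    ])
    collection = Collection(name="repro_diskann", schema=schema)  # step 1

    index_params = {"index_type": "DISKANN", "metric_type": "L2", "params": {}}
    collection.create_index("float_vector", index_params)         # step 2

    for first, last in batch_ranges(total, batch):                # step 3
        ids = list(range(first, last + 1))
        vectors = [[random.random() for _ in range(dim)] for _ in ids]
        collection.insert([ids, vectors])

    collection.flush()                                            # step 4
    collection.create_index("float_vector", index_params)         # step 5
    print("rows:", collection.num_entities)                       # step 7
    collection.load(replica_number=1)                             # step 8 (fails in this report)
```

Calling `reproduce()` needs a reachable Milvus instance. A much smaller `total` is advisable for a first run, since the index build alone took over 17 hours at 100M vectors in the log above.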
Milvus Log
No response
Anything else?
No response
/assign @xige-16 /unassign
/assign @yah01 /unassign
Update: with image master-20230530-b09e7aea, inserting 100M vectors and loading succeeded.
@elstic if it has not reproduced recently, please close it.
This issue has not appeared recently.
The issue has arisen again. Image: master-20230504-e172f3e8
server:
fouram-93-8048-etcd-0 1/1 Running 0 9m21s 10.104.1.34 4am-node10 <none> <none>
fouram-93-8048-etcd-1 1/1 Running 0 9m21s 10.104.16.91 4am-node21 <none> <none>
fouram-93-8048-etcd-2 1/1 Running 0 9m21s 10.104.21.178 4am-node24 <none> <none>
fouram-93-8048-milvus-datacoord-7595dbc5c4-kp887 1/1 Running 1 (5m21s ago) 9m21s 10.104.5.96 4am-node12 <none> <none>
fouram-93-8048-milvus-datanode-74b6bff58b-wxn5p 1/1 Running 2 (110s ago) 9m21s 10.104.16.71 4am-node21 <none> <none>
fouram-93-8048-milvus-indexcoord-6d5c55c6b5-zgf2p 1/1 Running 0 9m21s 10.104.21.170 4am-node24 <none> <none>
fouram-93-8048-milvus-indexnode-dc9dc4b9d-tmrjj 1/1 Running 1 (5m21s ago) 9m21s 10.104.5.95 4am-node12 <none> <none>
fouram-93-8048-milvus-proxy-ffcdfffd9-8nw9l 1/1 Running 1 (5m21s ago) 9m21s 10.104.20.128 4am-node22 <none> <none>
fouram-93-8048-milvus-querycoord-55d96dfd6-drpnb 1/1 Running 1 (5m21s ago) 9m21s 10.104.6.173 4am-node13 <none> <none>
fouram-93-8048-milvus-querynode-78fb7c9876-n9gnt 1/1 Running 1 (5m21s ago) 9m21s 10.104.24.32 4am-node29 <none> <none>
fouram-93-8048-milvus-rootcoord-58bd4c8778-q625p 1/1 Running 2 (110s ago) 9m21s 10.104.1.11 4am-node10 <none> <none>
fouram-93-8048-minio-0 1/1 Running 0 9m21s 10.104.1.32 4am-node10 <none> <none>
fouram-93-8048-minio-1 1/1 Running 0 9m21s 10.104.16.89 4am-node21 <none> <none>
fouram-93-8048-minio-2 1/1 Running 0 9m21s 10.104.20.143 4am-node22 <none> <none>
fouram-93-8048-minio-3 1/1 Running 0 9m20s 10.104.9.213 4am-node14 <none> <none>
fouram-93-8048-pulsar-bookie-0 1/1 Running 0 9m21s 10.104.6.196 4am-node13 <none> <none>
fouram-93-8048-pulsar-bookie-1 1/1 Running 0 9m20s 10.104.20.144 4am-node22 <none> <none>
fouram-93-8048-pulsar-bookie-2 1/1 Running 0 9m20s 10.104.21.181 4am-node24 <none> <none>
fouram-93-8048-pulsar-bookie-init-9t8nt 0/1 Completed 0 9m21s 10.104.16.70 4am-node21 <none> <none>
fouram-93-8048-pulsar-broker-0 1/1 Running 0 9m21s 10.104.1.12 4am-node10 <none> <none>
fouram-93-8048-pulsar-proxy-0 1/1 Running 0 9m21s 10.104.23.91 4am-node27 <none> <none>
fouram-93-8048-pulsar-pulsar-init-lb7dz 0/1 Completed 0 9m21s 10.104.16.69 4am-node21 <none> <none>
fouram-93-8048-pulsar-recovery-0 1/1 Running 0 9m21s 10.104.6.174 4am-node13 <none> <none>
fouram-93-8048-pulsar-zookeeper-0 1/1 Running 0 9m21s 10.104.20.139 4am-node22 <none> <none>
fouram-93-8048-pulsar-zookeeper-1 1/1 Running 0 5m20s 10.104.15.120 4am-node20 <none> <none>
fouram-93-8048-pulsar-zookeeper-2 1/1 Running 0 3m51s 10.104.21.183 4am-node24 <none> <none>
server (after):
fouram-93-8048-etcd-0 1/1 Running 0 18h 10.104.1.34 4am-node10 <none> <none>
fouram-93-8048-etcd-1 1/1 Running 0 18h 10.104.16.91 4am-node21 <none> <none>
fouram-93-8048-etcd-2 1/1 Running 0 18h 10.104.21.178 4am-node24 <none> <none>
fouram-93-8048-milvus-datacoord-7595dbc5c4-kp887 1/1 Running 1 (18h ago) 18h 10.104.5.96 4am-node12 <none> <none>
fouram-93-8048-milvus-datanode-74b6bff58b-wxn5p 1/1 Running 2 (18h ago) 18h 10.104.16.71 4am-node21 <none> <none>
fouram-93-8048-milvus-indexcoord-6d5c55c6b5-zgf2p 1/1 Running 0 18h 10.104.21.170 4am-node24 <none> <none>
fouram-93-8048-milvus-indexnode-dc9dc4b9d-tmrjj 1/1 Running 1 (18h ago) 18h 10.104.5.95 4am-node12 <none> <none>
fouram-93-8048-milvus-proxy-ffcdfffd9-8nw9l 1/1 Running 1 (18h ago) 18h 10.104.20.128 4am-node22 <none> <none>
fouram-93-8048-milvus-querycoord-55d96dfd6-drpnb 1/1 Running 1 (18h ago) 18h 10.104.6.173 4am-node13 <none> <none>
fouram-93-8048-milvus-querynode-78fb7c9876-ggjl6 1/1 Running 0 5m16s 10.104.15.13 4am-node20 <none> <none>
fouram-93-8048-milvus-querynode-78fb7c9876-n9gnt 0/1 Error 1 18h 10.104.24.32 4am-node29 <none> <none>
fouram-93-8048-milvus-rootcoord-58bd4c8778-q625p 1/1 Running 2 (18h ago) 18h 10.104.1.11 4am-node10 <none> <none>
fouram-93-8048-minio-0 1/1 Running 0 18h 10.104.1.32 4am-node10 <none> <none>
fouram-93-8048-minio-1 1/1 Running 0 18h 10.104.16.89 4am-node21 <none> <none>
fouram-93-8048-minio-2 1/1 Running 0 18h 10.104.20.143 4am-node22 <none> <none>
fouram-93-8048-minio-3 1/1 Running 0 18h 10.104.9.213 4am-node14 <none> <none>
fouram-93-8048-pulsar-bookie-0 1/1 Running 0 18h 10.104.6.196 4am-node13 <none> <none>
fouram-93-8048-pulsar-bookie-1 1/1 Running 0 18h 10.104.20.144 4am-node22 <none> <none>
fouram-93-8048-pulsar-bookie-2 1/1 Running 0 18h 10.104.21.181 4am-node24 <none> <none>
fouram-93-8048-pulsar-bookie-init-9t8nt 0/1 Completed 0 18h 10.104.16.70 4am-node21 <none> <none>
fouram-93-8048-pulsar-broker-0 1/1 Running 0 18h 10.104.1.12 4am-node10 <none> <none>
fouram-93-8048-pulsar-proxy-0 1/1 Running 0 18h 10.104.23.91 4am-node27 <none> <none>
fouram-93-8048-pulsar-pulsar-init-lb7dz 0/1 Completed 0 18h 10.104.16.69 4am-node21 <none> <none>
fouram-93-8048-pulsar-recovery-0 1/1 Running 0 18h 10.104.6.174 4am-node13 <none> <none>
fouram-93-8048-pulsar-zookeeper-0 1/1 Running 0 18h 10.104.20.139 4am-node22 <none> <none>
fouram-93-8048-pulsar-zookeeper-1 1/1 Running 0 18h 10.104.15.120 4am-node20 <none> <none>
fouram-93-8048-pulsar-zookeeper-2 1/1 Running 0 18h 10.104.21.183 4am-node24 <none> <none>
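The pod that changed between the two listings is easier to spot with a small filter over the `kubectl get pods -o wide` rows. The sketch below is an illustration only: it assumes the column order shown above (name, ready, status, restarts), uses a handful of rows copied from the "server (after)" listing, and the function name `unhealthy_pods` is made up.

```python
import re

# A few rows copied from the "server (after)" listing above.
POD_ROWS = """\
fouram-93-8048-milvus-querycoord-55d96dfd6-drpnb 1/1 Running 1 (18h ago) 18h 10.104.6.173 4am-node13 <none> <none>
fouram-93-8048-milvus-querynode-78fb7c9876-ggjl6 1/1 Running 0 5m16s 10.104.15.13 4am-node20 <none> <none>
fouram-93-8048-milvus-querynode-78fb7c9876-n9gnt 0/1 Error 1 18h 10.104.24.32 4am-node29 <none> <none>
fouram-93-8048-pulsar-bookie-init-9t8nt 0/1 Completed 0 18h 10.104.16.70 4am-node21 <none> <none>
"""

# name, ready/total, status, restart count (leading columns only)
ROW = re.compile(
    r"^(?P<name>\S+)\s+(?P<ready>\d+)/(?P<total>\d+)\s+"
    r"(?P<status>\S+)\s+(?P<restarts>\d+)"
)


def unhealthy_pods(listing: str):
    """Return (name, status, restarts) for pods that are not fully ready."""
    bad = []
    for line in listing.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header or unrelated line
        if m["status"] == "Completed":
            continue  # finished init jobs are expected to show 0/1
        if m["status"] != "Running" or m["ready"] != m["total"]:
            bad.append((m["name"], m["status"], int(m["restarts"])))
    return bad


print(unhealthy_pods(POD_ROWS))
```

On these rows the filter reports only the original querynode pod (`...-n9gnt`, status Error), alongside which a replacement querynode (`...-ggjl6`) is already running.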
client error log:
[2023-06-13 09:52:05,789 - INFO - fouram]: [Base] Start flush collection fouram_hWyIQzMw (base.py:277)
[2023-06-13 09:52:08,326 - INFO - fouram]: [Base] Params of index: [{'float_vector': {'index_type': 'DISKANN', 'metric_type': 'L2', 'params': {}}}] (base.py:441)
[2023-06-13 09:52:08,326 - INFO - fouram]: [Base] Start release collection fouram_hWyIQzMw (base.py:288)
[2023-06-13 09:52:08,328 - INFO - fouram]: [Base] Start build index of DISKANN for collection fouram_hWyIQzMw, params:{'index_type': 'DISKANN', 'metric_type': 'L2', 'params': {}} (base.py:427)
[2023-06-14 02:53:51,385 - INFO - fouram]: [Time] Index run in 61303.0546s (api_request.py:45)
[2023-06-14 02:53:51,385 - INFO - fouram]: [CommonCases] RT of build index DISKANN: 61303.0546s (common_cases.py:96)
[2023-06-14 02:53:51,388 - INFO - fouram]: [Base] Params of index: [{'float_vector': {'index_type': 'DISKANN', 'metric_type': 'L2', 'params': {}}}] (base.py:441)
[2023-06-14 02:53:51,388 - INFO - fouram]: [CommonCases] Prepare index DISKANN done. (common_cases.py:99)
[2023-06-14 02:53:51,388 - INFO - fouram]: [CommonCases] No scalars need to be indexed. (common_cases.py:107)
[2023-06-14 02:53:51,389 - INFO - fouram]: [Base] Number of vectors in the collection(fouram_hWyIQzMw): 100000000 (base.py:468)
[2023-06-14 02:53:51,390 - INFO - fouram]: [Base] Start load collection fouram_hWyIQzMw,replica_number:1,kwargs:{} (base.py:283)
[2023-06-14 03:03:54,674 - ERROR - fouram]: RPC error: [get_loading_progress], <MilvusException: (code=1, message=collection 442144025703350773 has not been loaded to memory or load failed)>, <Time:{'RPC start': '2023-06-14 03:03:54.672870', 'RPC error': '2023-06-14 03:03:54.674862'}> (decorators.py:108)
[2023-06-14 03:03:54,676 - ERROR - fouram]: RPC error: [wait_for_loading_collection], <MilvusException: (code=1, message=collection 442144025703350773 has not been loaded to memory or load failed)>, <Time:{'RPC start': '2023-06-14 02:53:51.410122', 'RPC error': '2023-06-14 03:03:54.676319'}> (decorators.py:108)
[2023-06-14 03:03:54,676 - ERROR - fouram]: RPC error: [load_collection], <MilvusException: (code=1, message=collection 442144025703350773 has not been loaded to memory or load failed)>, <Time:{'RPC start': '2023-06-14 02:53:51.390216', 'RPC error': '2023-06-14 03:03:54.676433'}> (decorators.py:108)
[2023-06-14 03:03:54,677 - ERROR - fouram]: (api_response) : <MilvusException: (code=1, message=collection 442144025703350773 has not been loaded to memory or load failed)> (api_request.py:53)
[2023-06-14 03:03:54,678 - ERROR - fouram]: [CheckFunc] load request check failed, response:<MilvusException: (code=1, message=collection 442144025703350773 has not been loaded to memory or load failed)> (func_check.py:52)
image:master-20230614-35cb0b5b
[2023-06-14 08:44:14,237 - INFO - fouram]: [check_params] scene_concurrent_locust required params: {'dataset_params': {'metric_type': 'L2', 'dim': 128, 'dataset_name': 'sift', 'dataset_size': '1m', 'ni_per': 50000}, 'collection_params': {'other_fields': []}, 'load_params': {}, 'query_params': {}, 'search_params': {}, 'index_params': {'index_type': 'DISKANN', 'index_param': {}}, 'concurrent_params': {'concurrent_number': [1, 20], 'during_time': 3600, 'interval': 20}, 'concurrent_tasks': [{'type': 'search', 'weight': 1, 'params': {'nq': 1, 'top_k': 1, 'search_param': {'search_list': 30}, 'random_data': True}}]} (params_check.py:31)
The error reported is different; please check whether it has the same cause.
server:
I0614 08:59:37.102417 455 request.go:665] Waited for 1.167015255s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.default.svc.cluster.local/apis/argoproj.io/v1alpha1?timeout=32s
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
perf-single-16831400-4-39-6861-etcd-0 1/1 Running 0 27m 10.104.20.36 4am-node22 <none> <none>
perf-single-16831400-4-39-6861-milvus-standalone-654b9bf55q8s78 1/1 Running 0 27m 10.104.23.219 4am-node27 <none> <none>
perf-single-16831400-4-39-6861-minio-5f5cf8c85d-d2f8r 1/1 Running 0 27m 10.104.23.218 4am-node27 <none> <none> (cli_client.py:131)
log:
[2023-06-14 08:49:34,413 - INFO - fouram]: [CommonCases] RT of build index DISKANN: 274.4136s (common_cases.py:96)
[2023-06-14 08:49:34,414 - INFO - fouram]: [Base] Params of index: [{'float_vector': {'index_type': 'DISKANN', 'metric_type': 'L2', 'params': {}}}] (base.py:441)
[2023-06-14 08:49:34,414 - INFO - fouram]: [CommonCases] Prepare index DISKANN done. (common_cases.py:99)
[2023-06-14 08:49:34,415 - INFO - fouram]: [CommonCases] No scalars need to be indexed. (common_cases.py:107)
[2023-06-14 08:49:34,416 - INFO - fouram]: [Base] Number of vectors in the collection(fouram_F9X74bJw): 1000000 (base.py:468)
[2023-06-14 08:49:34,416 - INFO - fouram]: [Base] Start load collection fouram_F9X74bJw,replica_number:1,kwargs:{} (base.py:283)
[2023-06-14 08:59:35,630 - INFO - fouram]: [Time] Collection.load run in 601.2136s (api_request.py:45)
[2023-06-14 08:59:35,635 - INFO - fouram]: [Base] Describe resource group:__default_resource_group, ResourceGroupInfo:<name:__default_resource_group>,<capacity:1000000>,<num_available_node:1>,<num_loaded_replica:{'fouram_F9X74bJw':1}>,<num_outgoing_node:{}>,<num_incoming_node:{}> (base.py:642)
[2023-06-14 08:59:35,639 - ERROR - fouram]: RPC error: [get_replicas], <MilvusException: (code=15, message=failed to get replica info, err=failed to get channels, collection not loaded: collection=442166710995255856: collection not found)>, <Time:{'RPC start': '2023-06-14 08:59:35.635739', 'RPC error': '2023-06-14 08:59:35.639028'}> (decorators.py:108)
[2023-06-14 08:59:35,640 - ERROR - fouram]: (api_response) : <MilvusException: (code=15, message=failed to get replica info, err=failed to get channels, collection not loaded: collection=442166710995255856: collection not found)> (api_request.py:53)
[2023-06-14 08:59:35,640 - ERROR - fouram]: [CheckFunc] get_replicas request check failed, response:<MilvusException: (code=15, message=failed to get replica info, err=failed to get channels, collection not loaded: collection=442166710995255856: collection not found)>
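Because the failures in this thread carry different error codes, it can help to pull the code and message out of each `MilvusException` line before comparing runs. The following is a rough sketch: the regular expression is written against the log format shown above and is not part of pymilvus, and the two sample lines are the `(api_response)` entries copied from the logs in this thread.

```python
import re

# Two (api_response) lines copied from the client logs above.
LOG = """\
[2023-06-14 03:03:54,677 - ERROR - fouram]: (api_response) : <MilvusException: (code=1, message=collection 442144025703350773 has not been loaded to memory or load failed)> (api_request.py:53)
[2023-06-14 08:59:35,640 - ERROR - fouram]: (api_response) : <MilvusException: (code=15, message=failed to get replica info, err=failed to get channels, collection not loaded: collection=442166710995255856: collection not found)> (api_request.py:53)
"""

# Non-greedy message match, terminated by the ")>" that closes the exception.
PATTERN = re.compile(r"<MilvusException: \(code=(\d+), message=(.*?)\)>")


def parse_errors(log: str):
    """Return a list of (code, message) tuples found in the log text."""
    return [(int(code), message) for code, message in PATTERN.findall(log)]


errors = parse_errors(LOG)
for code, message in errors:
    print(code, "|", message)
```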
release_name_prefix: perf-single-1686731400
deploy_config: fouramf-server-standalone-8c16m-disk
case_params: fouramf-client-gist1m-concurrent-diskann
image:master-20230614-35cb0b5b
[2023-06-14 08:40:24,911 - INFO - fouram]: [check_params] scene_concurrent_locust required params: {'dataset_params': {'metric_type': 'L2', 'dim': 768, 'dataset_name': 'gist', 'dataset_size': 1000000, 'ni_per': 1000}, 'collection_params': {'other_fields': []}, 'load_params': {}, 'query_params': {}, 'search_params': {}, 'index_params': {'index_type': 'DISKANN', 'index_param': {}}, 'concurrent_params': {'concurrent_number': [1, 20], 'during_time': 3600, 'interval': 20}, 'concurrent_tasks': [{'type': 'search', 'weight': 1, 'params': {'nq': 1, 'top_k': 1, 'search_param': {'search_list': 30}, 'random_data': True}}]} (params_check.py:31)
server:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
perf-single-16831400-3-83-2286-etcd-0 1/1 Running 0 63m 10.104.16.166 4am-node21 <none> <none>
perf-single-16831400-3-83-2286-milvus-standalone-68fb4b9c6b9mtg 1/1 Running 0 63m 10.104.24.153 4am-node29 <none> <none>
perf-single-16831400-3-83-2286-minio-56fb848f49-lgq7c 1/1 Running 0 63m 10.104.4.253 4am-node11 <none> <none>
log:
[2023-06-14 08:45:23,667 - INFO - fouram]: [Base] Total time of insert: 224.2344s, average number of vector bars inserted per second: 4459.619, average time to insert 1000 vectors per time: 0.2242s (base.py:379)
[2023-06-14 08:45:23,667 - INFO - fouram]: [Base] Start flush collection fouram_da8NtObO (base.py:277)
[2023-06-14 08:45:27,336 - INFO - fouram]: [Base] Params of index: [{'float_vector': {'index_type': 'DISKANN', 'metric_type': 'L2', 'params': {}}}] (base.py:441)
[2023-06-14 08:45:27,336 - INFO - fouram]: [Base] Start release collection fouram_da8NtObO (base.py:288)
[2023-06-14 08:45:27,338 - INFO - fouram]: [Base] Start build index of DISKANN for collection fouram_da8NtObO, params:{'index_type': 'DISKANN', 'metric_type': 'L2', 'params': {}} (base.py:427)
[2023-06-14 09:25:21,209 - INFO - fouram]: [Time] Index run in 2393.8687s (api_request.py:45)
[2023-06-14 09:25:21,212 - INFO - fouram]: [CommonCases] RT of build index DISKANN: 2393.8687s (common_cases.py:96)
[2023-06-14 09:25:21,215 - INFO - fouram]: [Base] Params of index: [{'float_vector': {'index_type': 'DISKANN', 'metric_type': 'L2', 'params': {}}}] (base.py:441)
[2023-06-14 09:25:21,215 - INFO - fouram]: [CommonCases] Prepare index DISKANN done. (common_cases.py:99)
[2023-06-14 09:25:21,215 - INFO - fouram]: [CommonCases] No scalars need to be indexed. (common_cases.py:107)
[2023-06-14 09:25:21,217 - INFO - fouram]: [Base] Number of vectors in the collection(fouram_da8NtObO): 1000000 (base.py:468)
[2023-06-14 09:25:21,217 - INFO - fouram]: [Base] Start load collection fouram_da8NtObO,replica_number:1,kwargs:{} (base.py:283)
[2023-06-14 09:35:24,250 - ERROR - fouram]: RPC error: [get_loading_progress], <MilvusException: (code=52, message=deny to load, insufficient memory, please allocate more resources, collectionName: fouram_da8NtObO)>, <Time:{'RPC start': '2023-06-14 09:35:24.248721', 'RPC error': '2023-06-14 09:35:24.250173'}> (decorators.py:108)
[2023-06-14 09:35:24,252 - ERROR - fouram]: RPC error: [wait_for_loading_collection], <MilvusException: (code=52, message=deny to load, insufficient memory, please allocate more resources, collectionName: fouram_da8NtObO)>, <Time:{'RPC start': '2023-06-14 09:25:21.227046', 'RPC error': '2023-06-14 09:35:24.251987'}> (decorators.py:108)
[2023-06-14 09:35:24,252 - ERROR - fouram]: RPC error: [load_collection], <MilvusException: (code=52, message=deny to load, insufficient memory, please allocate more resources, collectionName: fouram_da8NtObO)>, <Time:{'RPC start': '2023-06-14 09:25:21.217910', 'RPC error': '2023-06-14 09:35:24.252142'}> (decorators.py:108)
[2023-06-14 09:35:24,254 - ERROR - fouram]: (api_response) : <MilvusException: (code=52, message=deny to load, insufficient memory, please allocate more resources, collectionName: fouram_da8NtObO)> (api_request.py:53)
[2023-06-14 09:35:24,254 - ERROR - fouram]: [CheckFunc] load request check failed, response:<MilvusException: (code=52, message=deny to load, insufficient memory, please allocate more resources, collectionName: fouram_da8NtObO)> (func_check.py:52)
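For scale when reading the "insufficient memory" error, the raw size of the vectors in this case can be computed from the dataset parameters logged above (dim 768, 1,000,000 rows). The 4 bytes per component is an assumption based on the field being named `float_vector`. This is only the size of the raw data, not an estimate of what Milvus needs to load a DISKANN index, which the report does not state.

```python
# Values from the check_params line above.
num_vectors = 1_000_000
dim = 768

# Assumption: 32-bit floats, i.e. 4 bytes per vector component.
BYTES_PER_FLOAT32 = 4

raw_bytes = num_vectors * dim * BYTES_PER_FLOAT32
raw_gib = raw_bytes / 2**30

print(f"raw vector data: {raw_bytes:,} bytes (~{raw_gib:.3f} GiB)")
```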
No panic; the cluster rebooted because it lost its connection to Pulsar.
release_name_prefix: perf-single-1686731400
deploy_config: fouramf-server-standalone-8c16m-disk
case_params: fouramf-client-gist1m-concurrent-diskann
image:master-20230614-35cb0b5b
[2023-06-14 08:40:24,911 - INFO - fouram]: [check_params] scene_concurrent_locust required params: {'dataset_params': {'metric_type': 'L2', 'dim': 768, 'dataset_name': 'gist', 'dataset_size': 1000000, 'ni_per': 1000}, 'collection_params': {'other_fields': []}, 'load_params': {}, 'query_params': {}, 'search_params': {}, 'index_params': {'index_type': 'DISKANN', 'index_param': {}}, 'concurrent_params': {'concurrent_number': [1, 20], 'during_time': 3600, 'interval': 20}, 'concurrent_tasks': [{'type': 'search', 'weight': 1, 'params': {'nq': 1, 'top_k': 1, 'search_param': {'search_list': 30}, 'random_data': True}}]} (params_check.py:31)
server:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
perf-single-16831400-3-83-2286-etcd-0 1/1 Running 0 63m 10.104.16.166 4am-node21 <none> <none>
perf-single-16831400-3-83-2286-milvus-standalone-68fb4b9c6b9mtg 1/1 Running 0 63m 10.104.24.153 4am-node29 <none> <none>
perf-single-16831400-3-83-2286-minio-56fb848f49-lgq7c 1/1 Running 0 63m 10.104.4.253 4am-node11 <none> <none>
log:
[2023-06-14 08:45:23,667 - INFO - fouram]: [Base] Total time of insert: 224.2344s, average number of vector bars inserted per second: 4459.619, average time to insert 1000 vectors per time: 0.2242s (base.py:379)
[2023-06-14 08:45:23,667 - INFO - fouram]: [Base] Start flush collection fouram_da8NtObO (base.py:277)
[2023-06-14 08:45:27,336 - INFO - fouram]: [Base] Params of index: [{'float_vector': {'index_type': 'DISKANN', 'metric_type': 'L2', 'params': {}}}] (base.py:441)
[2023-06-14 08:45:27,336 - INFO - fouram]: [Base] Start release collection fouram_da8NtObO (base.py:288)
[2023-06-14 08:45:27,338 - INFO - fouram]: [Base] Start build index of DISKANN for collection fouram_da8NtObO, params:{'index_type': 'DISKANN', 'metric_type': 'L2', 'params': {}} (base.py:427)
[2023-06-14 09:25:21,209 - INFO - fouram]: [Time] Index run in 2393.8687s (api_request.py:45)
[2023-06-14 09:25:21,212 - INFO - fouram]: [CommonCases] RT of build index DISKANN: 2393.8687s (common_cases.py:96)
[2023-06-14 09:25:21,215 - INFO - fouram]: [Base] Params of index: [{'float_vector': {'index_type': 'DISKANN', 'metric_type': 'L2', 'params': {}}}] (base.py:441)
[2023-06-14 09:25:21,215 - INFO - fouram]: [CommonCases] Prepare index DISKANN done. (common_cases.py:99)
[2023-06-14 09:25:21,215 - INFO - fouram]: [CommonCases] No scalars need to be indexed. (common_cases.py:107)
[2023-06-14 09:25:21,217 - INFO - fouram]: [Base] Number of vectors in the collection(fouram_da8NtObO): 1000000 (base.py:468)
[2023-06-14 09:25:21,217 - INFO - fouram]: [Base] Start load collection fouram_da8NtObO,replica_number:1,kwargs:{} (base.py:283)
[2023-06-14 09:35:24,250 - ERROR - fouram]: RPC error: [get_loading_progress], <MilvusException: (code=52, message=deny to load, insufficient memory, please allocate more resources, collectionName: fouram_da8NtObO)>, <Time:{'RPC start': '2023-06-14 09:35:24.248721', 'RPC error': '2023-06-14 09:35:24.250173'}> (decorators.py:108)
[2023-06-14 09:35:24,252 - ERROR - fouram]: RPC error: [wait_for_loading_collection], <MilvusException: (code=52, message=deny to load, insufficient memory, please allocate more resources, collectionName: fouram_da8NtObO)>, <Time:{'RPC start': '2023-06-14 09:25:21.227046', 'RPC error': '2023-06-14 09:35:24.251987'}> (decorators.py:108)
[2023-06-14 09:35:24,252 - ERROR - fouram]: RPC error: [load_collection], <MilvusException: (code=52, message=deny to load, insufficient memory, please allocate more resources, collectionName: fouram_da8NtObO)>, <Time:{'RPC start': '2023-06-14 09:25:21.217910', 'RPC error': '2023-06-14 09:35:24.252142'}> (decorators.py:108)
[2023-06-14 09:35:24,254 - ERROR - fouram]: (api_response) : <MilvusException: (code=52, message=deny to load, insufficient memory, please allocate more resources, collectionName: fouram_da8NtObO)> (api_request.py:53)
[2023-06-14 09:35:24,254 - ERROR - fouram]: [CheckFunc] load request check failed, response:<MilvusException: (code=52, message=deny to load, insufficient memory, please allocate more resources, collectionName: fouram_da8NtObO)> (func_check.py:52)
This is not the same issue: this run ran out of memory. Maybe #24374 would help.
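As a rough sanity check on the gist1m case, the raw vector payload can be estimated from the `dataset_size` and `dim` values in the case parameters. This is only a sketch: it assumes 4-byte float32 values (suggested by the `float_vector` field name, not stated in the thread) and it does not model the DiskANN index or whatever estimate the server uses when it decides to deny a load.

```python
GIB = 1024 ** 3


def raw_vector_bytes(num_vectors, dim, bytes_per_value=4):
    """Size of the raw vector payload only; bytes_per_value=4 assumes float32."""
    return num_vectors * dim * bytes_per_value


# dataset_size and dim are taken from the scene_concurrent_locust params above.
size = raw_vector_bytes(num_vectors=1_000_000, dim=768)
print(f"{size / GIB:.2f} GiB")  # prints "2.86 GiB"
```

The number covers raw data only, so it is a lower bound for comparison against the node's memory limit, not a prediction of load-time usage.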
[UnexpectedError] Assert "row_count > 0" errors reported
/assign @cqy123456 Knowhere behavior changed, @cqy123456 is fixing this
https://github.com/milvus-io/milvus/pull/24898
/assign @elstic fixed with #24898
Loading 100 million vectors fails again.
image: master-20230620-247f1170 case : test_concurrent_locust_100m_diskann_ddl_dql_filter_cluster
server:
fouramf-x8wsv-92-7100-etcd-0 1/1 Running 0 18h 10.104.23.251 4am-node27 <none> <none>
fouramf-x8wsv-92-7100-etcd-1 1/1 Running 0 18h 10.104.17.29 4am-node23 <none> <none>
fouramf-x8wsv-92-7100-etcd-2 1/1 Running 0 18h 10.104.21.134 4am-node24 <none> <none>
fouramf-x8wsv-92-7100-milvus-datacoord-8f5f9bcf5-bvhj7 1/1 Running 0 18h 10.104.21.127 4am-node24 <none> <none>
fouramf-x8wsv-92-7100-milvus-datanode-5c758bb86c-7qq5v 1/1 Running 0 18h 10.104.23.243 4am-node27 <none> <none>
fouramf-x8wsv-92-7100-milvus-indexcoord-8656d87f6b-8ndmc 1/1 Running 0 18h 10.104.15.131 4am-node20 <none> <none>
fouramf-x8wsv-92-7100-milvus-indexnode-b58f5dd77-m7f7j 1/1 Running 0 18h 10.104.15.132 4am-node20 <none> <none>
fouramf-x8wsv-92-7100-milvus-proxy-7868b75c64-7btb2 1/1 Running 0 18h 10.104.20.67 4am-node22 <none> <none>
fouramf-x8wsv-92-7100-milvus-querycoord-8b4796845-rch85 1/1 Running 0 18h 10.104.21.126 4am-node24 <none> <none>
fouramf-x8wsv-92-7100-milvus-querynode-86b56d4999-nwjqb 1/1 Running 0 10m 10.104.1.96 4am-node10 <none> <none>
fouramf-x8wsv-92-7100-milvus-querynode-86b56d4999-swbvs 0/1 Completed 0 18h 10.104.20.68 4am-node22 <none> <none>
fouramf-x8wsv-92-7100-milvus-rootcoord-85ffcddfd7-qnnzd 1/1 Running 0 18h 10.104.20.66 4am-node22 <none> <none>
fouramf-x8wsv-92-7100-minio-0 1/1 Running 0 18h 10.104.23.247 4am-node27 <none> <none>
fouramf-x8wsv-92-7100-minio-1 1/1 Running 0 18h 10.104.17.30 4am-node23 <none> <none>
fouramf-x8wsv-92-7100-minio-2 1/1 Running 0 18h 10.104.21.133 4am-node24 <none> <none>
fouramf-x8wsv-92-7100-minio-3 1/1 Running 0 18h 10.104.20.70 4am-node22 <none> <none>
fouramf-x8wsv-92-7100-pulsar-bookie-0 1/1 Running 0 18h 10.104.21.129 4am-node24 <none> <none>
fouramf-x8wsv-92-7100-pulsar-bookie-1 1/1 Running 0 18h 10.104.23.252 4am-node27 <none> <none>
fouramf-x8wsv-92-7100-pulsar-bookie-2 1/1 Running 0 18h 10.104.17.31 4am-node23 <none> <none>
fouramf-x8wsv-92-7100-pulsar-bookie-init-94m89 0/1 Completed 0 18h 10.104.21.125 4am-node24 <none> <none>
fouramf-x8wsv-92-7100-pulsar-broker-0 1/1 Running 0 18h 10.104.23.244 4am-node27 <none> <none>
fouramf-x8wsv-92-7100-pulsar-proxy-0 1/1 Running 0 18h 10.104.15.133 4am-node20 <none> <none>
fouramf-x8wsv-92-7100-pulsar-pulsar-init-5vbmm 0/1 Completed 0 18h 10.104.23.245 4am-node27 <none> <none>
fouramf-x8wsv-92-7100-pulsar-recovery-0 1/1 Running 0 18h 10.104.15.134 4am-node20 <none> <none>
fouramf-x8wsv-92-7100-pulsar-zookeeper-0 1/1 Running 0 18h 10.104.21.130 4am-node24 <none> <none>
fouramf-x8wsv-92-7100-pulsar-zookeeper-1 1/1 Running 0 18h 10.104.16.108 4am-node21 <none> <none>
fouramf-x8wsv-92-7100-pulsar-zookeeper-2 1/1 Running 0 18h 10.104.23.254 4am-node27 <none> <none>
client error log:
[2023-06-20 07:43:06,497 - INFO - fouram]: [Base] Number of vectors in the collection(fouram_6QRP8asl): 99900000 (base.py:468)
[2023-06-20 07:43:06,539 - INFO - fouram]: [Base] Total time of insert: 2554.1683s, average number of vector bars inserted per second: 39151.6879, average time to insert 50000 vectors per time: 1.2771s (base.py:379)
[2023-06-20 07:43:06,540 - INFO - fouram]: [Base] Start flush collection fouram_6QRP8asl (base.py:277)
[2023-06-20 07:43:09,565 - INFO - fouram]: [Base] Params of index: [{'float_vector': {'index_type': 'DISKANN', 'metric_type': 'L2', 'params': {}}}] (base.py:441)
[2023-06-20 07:43:09,565 - INFO - fouram]: [Base] Start release collection fouram_6QRP8asl (base.py:288)
[2023-06-20 07:43:09,567 - INFO - fouram]: [Base] Start build index of DISKANN for collection fouram_6QRP8asl, params:{'index_type': 'DISKANN', 'metric_type': 'L2', 'params': {}} (base.py:427)
[2023-06-21 00:53:02,185 - INFO - fouram]: [Time] Index run in 61792.6171s (api_request.py:45)
[2023-06-21 00:53:02,186 - INFO - fouram]: [CommonCases] RT of build index DISKANN: 61792.6171s (common_cases.py:96)
[2023-06-21 00:53:02,188 - INFO - fouram]: [Base] Params of index: [{'float_vector': {'index_type': 'DISKANN', 'metric_type': 'L2', 'params': {}}}] (base.py:441)
[2023-06-21 00:53:02,189 - INFO - fouram]: [CommonCases] Prepare index DISKANN done. (common_cases.py:99)
[2023-06-21 00:53:02,189 - INFO - fouram]: [CommonCases] No scalars need to be indexed. (common_cases.py:107)
[2023-06-21 00:53:02,190 - INFO - fouram]: [Base] Number of vectors in the collection(fouram_6QRP8asl): 100000000 (base.py:468)
[2023-06-21 00:53:02,190 - INFO - fouram]: [Base] Start load collection fouram_6QRP8asl,replica_number:1,kwargs:{} (base.py:283)
[2023-06-21 01:07:47,581 - ERROR - fouram]: RPC error: [get_loading_progress], <MilvusException: (code=52, message=deny to load, insufficient memory, please allocate more resources, collectionName: fouram_6QRP8asl)>, <Time:{'RPC start': '2023-06-21 01:07:47.578558', 'RPC error': '2023-06-21 01:07:47.581678'}> (decorators.py:108)
[2023-06-21 01:07:47,583 - ERROR - fouram]: RPC error: [wait_for_loading_collection], <MilvusException: (code=52, message=deny to load, insufficient memory, please allocate more resources, collectionName: fouram_6QRP8asl)>, <Time:{'RPC start': '2023-06-21 00:53:02.215572', 'RPC error': '2023-06-21 01:07:47.583758'}> (decorators.py:108)
[2023-06-21 01:07:47,583 - ERROR - fouram]: RPC error: [load_collection], <MilvusException: (code=52, message=deny to load, insufficient memory, please allocate more resources, collectionName: fouram_6QRP8asl)>, <Time:{'RPC start': '2023-06-21 00:53:02.191111', 'RPC error': '2023-06-21 01:07:47.583882'}> (decorators.py:108)
[2023-06-21 01:07:47,585 - ERROR - fouram]: (api_response) : <MilvusException: (code=52, message=deny to load, insufficient memory, please allocate more resources, collectionName: fouram_6QRP8asl)> (api_request.py:53)
[2023-06-21 01:07:47,585 - ERROR - fouram]: [CheckFunc] load request check failed, response:<MilvusException: (code=52, message=deny to load, insufficient memory, please allocate more resources, collectionName: fouram_6QRP8asl)> (func_check.py:52)
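To compare runs, it can help to pull the error code and the time spent waiting out of the `RPC error` lines. The following is a minimal sketch using only the Python standard library; the regular expression is inferred from the log lines in this thread and is an assumption, not a documented fouram or pymilvus log format, and `parse_rpc_error` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Pattern inferred from the client error lines in this thread (an assumption).
RPC_ERROR = re.compile(
    r"RPC error: \[(?P<rpc>\w+)\], <MilvusException: \(code=(?P<code>\d+), "
    r"message=(?P<message>.*?)\)>, "
    r"<Time:\{'RPC start': '(?P<start>[^']+)', 'RPC error': '(?P<error>[^']+)'\}>"
)
TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_rpc_error(line):
    """Return the RPC name, error code, message and elapsed seconds, or None."""
    match = RPC_ERROR.search(line)
    if match is None:
        return None
    start = datetime.strptime(match["start"], TIME_FORMAT)
    error = datetime.strptime(match["error"], TIME_FORMAT)
    return {
        "rpc": match["rpc"],
        "code": int(match["code"]),
        "message": match["message"],
        "elapsed_s": (error - start).total_seconds(),
    }


# The load_collection error line from the 2023-06-21 run above.
line = (
    "[2023-06-21 01:07:47,583 - ERROR - fouram]: RPC error: [load_collection], "
    "<MilvusException: (code=52, message=deny to load, insufficient memory, "
    "please allocate more resources, collectionName: fouram_6QRP8asl)>, "
    "<Time:{'RPC start': '2023-06-21 00:53:02.191111', "
    "'RPC error': '2023-06-21 01:07:47.583882'}> (decorators.py:108)"
)
result = parse_rpc_error(line)
print(result["rpc"], result["code"], round(result["elapsed_s"], 1))
# prints: load_collection 52 885.4
```

For this run the load request waited roughly 15 minutes before the code=52 error was returned.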
/assign @elstic Please try with #25469
@yah01
Load failed for DiskANN after inserting 100k vectors. case: test_concurrent_locust_diskann_compaction_standalone image: master-20230711-70c4ddc6
client log:
[2023-07-11 20:07:16,394 - INFO - fouram]: [Base] Start inserting, ids: 50000 - 99999, data size: 100,000 (base.py:309)
[2023-07-11 20:07:17,990 - INFO - fouram]: [Time] Collection.insert run in 1.5948s (api_request.py:45)
[2023-07-11 20:07:17,992 - INFO - fouram]: [Base] Number of vectors in the collection(fouram_5RfI3j1P): 0 (base.py:469)
[2023-07-11 20:07:18,083 - INFO - fouram]: [Base] Total time of insert: 3.2848s, average number of vector bars inserted per second: 30443.2538, average time to insert 50000 vectors per time: 1.6424s (base.py:380)
[2023-07-11 20:07:18,085 - INFO - fouram]: [Base] Start flush collection fouram_5RfI3j1P (base.py:278)
[2023-07-11 20:07:20,604 - INFO - fouram]: [Base] Params of index: [{'float_vector': {'index_type': 'DISKANN', 'metric_type': 'L2', 'params': {}}}] (base.py:442)
[2023-07-11 20:07:20,604 - INFO - fouram]: [Base] Start release collection fouram_5RfI3j1P (base.py:289)
[2023-07-11 20:07:20,606 - INFO - fouram]: [Base] Start build index of DISKANN for collection fouram_5RfI3j1P, params:{'index_type': 'DISKANN', 'metric_type': 'L2', 'params': {}} (base.py:428)
[2023-07-11 20:07:41,965 - INFO - fouram]: [Time] Index run in 21.358s (api_request.py:45)
[2023-07-11 20:07:41,965 - INFO - fouram]: [CommonCases] RT of build index DISKANN: 21.358s (common_cases.py:96)
[2023-07-11 20:07:41,967 - INFO - fouram]: [Base] Params of index: [{'float_vector': {'index_type': 'DISKANN', 'metric_type': 'L2', 'params': {}}}] (base.py:442)
[2023-07-11 20:07:41,967 - INFO - fouram]: [CommonCases] Prepare index DISKANN done. (common_cases.py:99)
[2023-07-11 20:07:41,967 - INFO - fouram]: [CommonCases] No scalars need to be indexed. (common_cases.py:107)
[2023-07-11 20:07:41,968 - INFO - fouram]: [Base] Number of vectors in the collection(fouram_5RfI3j1P): 100000 (base.py:469)
[2023-07-11 20:07:41,968 - INFO - fouram]: [Base] Start load collection fouram_5RfI3j1P,replica_number:1,kwargs:{} (base.py:284)
[2023-07-11 20:17:44,976 - ERROR - fouram]: RPC error: [get_loading_progress], <MilvusException: (code=52, message=deny to load, insufficient memory, please allocate more resources, collectionName: fouram_5RfI3j1P)>, <Time:{'RPC start': '2023-07-11 20:17:44.974639', 'RPC error': '2023-07-11 20:17:44.976785'}> (decorators.py:108)
[2023-07-11 20:17:44,978 - ERROR - fouram]: RPC error: [wait_for_loading_collection], <MilvusException: (code=52, message=deny to load, insufficient memory, please allocate more resources, collectionName: fouram_5RfI3j1P)>, <Time:{'RPC start': '2023-07-11 20:07:41.980907', 'RPC error': '2023-07-11 20:17:44.978523'}> (decorators.py:108)
[2023-07-11 20:17:44,978 - ERROR - fouram]: RPC error: [load_collection], <MilvusException: (code=52, message=deny to load, insufficient memory, please allocate more resources, collectionName: fouram_5RfI3j1P)>, <Time:{'RPC start': '2023-07-11 20:07:41.969109', 'RPC error': '2023-07-11 20:17:44.978801'}> (decorators.py:108)
[2023-07-11 20:17:44,980 - ERROR - fouram]: (api_response) : <MilvusException: (code=52, message=deny to load, insufficient memory, please allocate more resources, collectionName: fouram_5RfI3j1P)> (api_request.py:53)
[2023-07-11 20:17:44,980 - ERROR - fouram]: [CheckFunc] load request check failed, response:<MilvusException: (code=52, message=deny to load, insufficient memory, please allocate more resources, collectionName: fouram_5RfI3j1P)> (func_check.py:52)
server:
fouramf-stable-05600-4-1-6061-etcd-0 1/1 Running 0 16m 10.104.6.8 4am-node13 <none> <none>
fouramf-stable-05600-4-1-6061-milvus-standalone-79d74db499n8scp 1/1 Running 0 16m 10.104.19.216 4am-node28 <none> <none>
fouramf-stable-05600-4-1-6061-minio-9f8bb4794-nwqdz 1/1 Running 0 16m 10.104.4.21 4am-node11 <none> <none>
related https://github.com/milvus-io/knowhere/pull/991
Verified: after inserting 100 million vectors, the collection loads successfully. Verify image: master-20230719-e418ab2f
Load failed again for the DiskANN index after inserting 100 million vectors. image: master-20230728-c2693ea2
argo task : fouramf-concurrent-jhgfh, id : 1 case: test_concurrent_locust_100m_diskann_ddl_dql_filter_cluster
server:
fouram-15-6355-etcd-0 1/1 Running 0 7h17m 10.104.14.212 4am-node18 <none> <none>
fouram-15-6355-etcd-1 1/1 Running 0 7h17m 10.104.23.187 4am-node27 <none> <none>
fouram-15-6355-etcd-2 1/1 Running 0 7h17m 10.104.21.117 4am-node24 <none> <none>
fouram-15-6355-milvus-datacoord-748cfb8b56-68prg 1/1 Running 0 7h17m 10.104.19.175 4am-node28 <none> <none>
fouram-15-6355-milvus-datanode-766b8767cf-sczqk 1/1 Running 0 7h17m 10.104.13.130 4am-node16 <none> <none>
fouram-15-6355-milvus-indexcoord-5d7f6bf49b-bfbcg 1/1 Running 0 7h17m 10.104.19.179 4am-node28 <none> <none>
fouram-15-6355-milvus-indexnode-778cb9b76c-cgmpx 1/1 Running 0 7h17m 10.104.19.180 4am-node28 <none> <none>
fouram-15-6355-milvus-proxy-688ddb867d-9vhvk 1/1 Running 0 7h17m 10.104.14.203 4am-node18 <none> <none>
fouram-15-6355-milvus-querycoord-7fbc57d7cf-76gqz 1/1 Running 0 7h17m 10.104.14.204 4am-node18 <none> <none>
fouram-15-6355-milvus-querynode-579f7bb7fc-xb6vd 1/1 Running 0 7h17m 10.104.14.205 4am-node18 <none> <none>
fouram-15-6355-milvus-rootcoord-6b9d4bdb8c-57tvc 1/1 Running 0 7h17m 10.104.19.176 4am-node28 <none> <none>
fouram-15-6355-minio-0 1/1 Running 0 7h17m 10.104.14.210 4am-node18 <none> <none>
fouram-15-6355-minio-1 1/1 Running 0 7h17m 10.104.23.180 4am-node27 <none> <none>
fouram-15-6355-minio-2 1/1 Running 0 7h17m 10.104.12.206 4am-node17 <none> <none>
fouram-15-6355-minio-3 1/1 Running 0 7h17m 10.104.18.204 4am-node25 <none> <none>
fouram-15-6355-pulsar-bookie-0 1/1 Running 0 7h17m 10.104.14.214 4am-node18 <none> <none>
fouram-15-6355-pulsar-bookie-1 1/1 Running 0 7h17m 10.104.23.188 4am-node27 <none> <none>
fouram-15-6355-pulsar-bookie-2 1/1 Running 0 7h17m 10.104.13.136 4am-node16 <none> <none>
fouram-15-6355-pulsar-bookie-init-d5jhs 0/1 Completed 0 7h17m 10.104.19.178 4am-node28 <none> <none>
fouram-15-6355-pulsar-broker-0 1/1 Running 0 7h17m 10.104.13.131 4am-node16 <none> <none>
fouram-15-6355-pulsar-proxy-0 1/1 Running 0 7h17m 10.104.13.132 4am-node16 <none> <none>
fouram-15-6355-pulsar-pulsar-init-crzst 0/1 Completed 0 7h17m 10.104.19.177 4am-node28 <none> <none>
fouram-15-6355-pulsar-recovery-0 1/1 Running 0 7h17m 10.104.19.181 4am-node28 <none> <none>
fouram-15-6355-pulsar-zookeeper-0 1/1 Running 0 7h17m 10.104.12.204 4am-node17 <none> <none>
fouram-15-6355-pulsar-zookeeper-1 1/1 Running 0 7h16m 10.104.21.119 4am-node24 <none> <none>
fouram-15-6355-pulsar-zookeeper-2 1/1 Running 0 7h15m 10.104.18.210 4am-node25 <none> <none>
client log:
[2023-07-28 12:41:47,080 - INFO - fouram]: [Base] Start inserting, ids: 99950000 - 99999999, data size: 100,000,000 (base.py:323)
[2023-07-28 12:41:49,792 - INFO - fouram]: [Time] Collection.insert run in 2.7113s (api_request.py:45)
[2023-07-28 12:41:49,795 - INFO - fouram]: [Base] Number of vectors in the collection(fouram_MlEaez5h): 99900000 (base.py:483)
[2023-07-28 12:41:49,867 - INFO - fouram]: [Base] Total time of insert: 4070.0102s, average number of vector bars inserted per second: 24569.963, average time to insert 50000 vectors per time: 2.035s (base.py:394)
[2023-07-28 12:41:49,867 - INFO - fouram]: [Base] Start flush collection fouram_MlEaez5h (base.py:292)
[2023-07-28 12:41:51,389 - INFO - fouram]: [Base] Params of index: [{'float_vector': {'index_type': 'DISKANN', 'metric_type': 'L2', 'params': {}}}] (base.py:456)
[2023-07-28 12:41:51,390 - INFO - fouram]: [Base] Start release collection fouram_MlEaez5h (base.py:303)
[2023-07-28 12:41:51,392 - INFO - fouram]: [Base] Start build index of DISKANN for collection fouram_MlEaez5h, params:{'index_type': 'DISKANN', 'metric_type': 'L2', 'params': {}} (base.py:442)
[2023-07-28 18:15:03,023 - INFO - fouram]: [Time] Index run in 19991.6305s (api_request.py:45)
[2023-07-28 18:15:03,023 - INFO - fouram]: [CommonCases] RT of build index DISKANN: 19991.6305s (common_cases.py:96)
[2023-07-28 18:15:03,025 - INFO - fouram]: [Base] Params of index: [{'float_vector': {'index_type': 'DISKANN', 'metric_type': 'L2', 'params': {}}}] (base.py:456)
[2023-07-28 18:15:03,026 - INFO - fouram]: [CommonCases] Prepare index DISKANN done. (common_cases.py:99)
[2023-07-28 18:15:03,026 - INFO - fouram]: [CommonCases] No scalars need to be indexed. (common_cases.py:107)
[2023-07-28 18:15:03,027 - INFO - fouram]: [Base] Number of vectors in the collection(fouram_MlEaez5h): 100000000 (base.py:483)
[2023-07-28 18:15:03,027 - INFO - fouram]: [Base] Start load collection fouram_MlEaez5h,replica_number:1,kwargs:{} (base.py:298)
[2023-07-28 18:25:03,577 - ERROR - fouram]: RPC error: [get_loading_progress], <MilvusException: (code=52, message=deny to load, insufficient memory, please allocate more resources, collectionName: fouram_MlEaez5h)>, <Time:{'RPC start': '2023-07-28 18:25:03.371498', 'RPC error': '2023-07-28 18:25:03.576887'}> (decorators.py:126)
[2023-07-28 18:25:03,578 - ERROR - fouram]: RPC error: [wait_for_loading_collection], <MilvusException: (code=52, message=deny to load, insufficient memory, please allocate more resources, collectionName: fouram_MlEaez5h)>, <Time:{'RPC start': '2023-07-28 18:15:03.125622', 'RPC error': '2023-07-28 18:25:03.578006'}> (decorators.py:126)
[2023-07-28 18:25:03,578 - ERROR - fouram]: RPC error: [load_collection], <MilvusException: (code=52, message=deny to load, insufficient memory, please allocate more resources, collectionName: fouram_MlEaez5h)>, <Time:{'RPC start': '2023-07-28 18:15:03.028096', 'RPC error': '2023-07-28 18:25:03.578186'}> (decorators.py:126)
[2023-07-28 18:25:03,579 - ERROR - fouram]: (api_response) : <MilvusException: (code=52, message=deny to load, insufficient memory, please allocate more resources, collectionName: fouram_MlEaez5h)> (api_request.py:53)
[2023-07-28 18:25:03,579 - ERROR - fouram]: [CheckFunc] load request check failed, response:<MilvusException: (code=52, message=deny to load, insufficient memory, please allocate more resources, collectionName: fouram_MlEaez5h)>
memory usage:
querynode error log:
The querynode appears to have hit OOM:
There are many small segments, which may also be related to the performance degradation in #25928.
@elstic tested this with 64 GiB of memory and found some problems:
After step 9, the predicted memory usage dropped and came close to the actual usage:
During step 9, many small segments were being loaded, and the DiskANN memory usage prediction was higher than the actual usage; the prediction was about 32 GiB:
We need a way to control the concurrency level in QueryNode: a permitted load request gets stuck when it cannot acquire the IO pool, yet its memory usage still counts toward the memory usage prediction.
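One way to picture the idea is a limiter that reserves predicted memory only after a load slot has been acquired, so requests that are still waiting do not inflate the prediction. The sketch below is illustrative only: `LoadLimiter` and all of its parameters are hypothetical names, and it is not the QueryNode implementation or the change made in any linked PR.

```python
import threading


class LoadLimiter:
    """Hypothetical limiter: reserve predicted memory only once a load slot is held."""

    def __init__(self, max_concurrent_loads, memory_budget_bytes):
        self._slots = threading.BoundedSemaphore(max_concurrent_loads)
        self._lock = threading.Lock()
        self._budget = memory_budget_bytes
        self.reserved = 0  # predicted bytes of loads that are actually running

    def load(self, predicted_bytes, do_load):
        # Waiting here reserves nothing, so queued requests do not count yet.
        with self._slots:
            with self._lock:
                if self.reserved + predicted_bytes > self._budget:
                    raise MemoryError("predicted usage exceeds budget")
                self.reserved += predicted_bytes
            try:
                return do_load()
            finally:
                # Release the reservation whether the load succeeded or failed.
                with self._lock:
                    self.reserved -= predicted_bytes


limiter = LoadLimiter(max_concurrent_loads=2, memory_budget_bytes=100)
print(limiter.load(60, lambda: "loaded"))  # prints "loaded"
print(limiter.reserved)  # prints 0
```

The design choice shown is ordering: acquire the concurrency slot first, then account for memory, so the prediction reflects only in-flight work.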
Working on this
/assign @elstic fixed by #26045
Issue fixed. Verified with image: master-20230802-830f0678