[Bug]: diskann index in milvus_standalone high memory usage
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version: 2.3.12
- Deployment mode(standalone or cluster): standalone
- MQ type(rocksmq, pulsar or kafka): rocksmq
- SDK version(e.g. pymilvus v2.0.0rc2): 2.3.7
- OS(Ubuntu or CentOS): Ubuntu
- CPU/Memory: 8 Core, 32GB
- GPU: -
- Others:
Current Behavior
I built a collection with a DiskANN index on the embedding field, holding about 8M embeddings (1536 dim). When I insert more embeddings into this collection (batch size: 2000 embeddings), memory usage gradually climbs to 100%. Before the inserts, Milvus memory usage was already at 84%. In addition, the Milvus service stays at close to 100% CPU usage.
Is such high memory usage expected with a DiskANN index?
I would be very grateful if you could give any advice. Thank you very much.
Expected Behavior
No response
Steps To Reproduce
No response
Milvus Log
No response
Anything else?
No response
@0215Arthur 84% is a bit high for Milvus in standalone mode, and I guess the indexing tasks were still running. Please check the metrics to confirm that. You could also wait for the index tasks to complete.
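Waiting for the index tasks can also be done from the client side. The following is a minimal sketch, not the server-side metric mentioned above: it assumes pymilvus is installed, that a connection has already been opened with `connections.connect(...)`, and that `utility.index_building_progress` returns a dict with `total_rows` and `indexed_rows`. The helper names and the polling interval are hypothetical.

```python
import time


def index_progress_percent(progress):
    """Turn a progress dict into a percentage of indexed rows."""
    total = progress.get("total_rows", 0)
    indexed = progress.get("indexed_rows", 0)
    # An empty collection has nothing left to index.
    return 100.0 if total == 0 else 100.0 * indexed / total


def wait_for_index(collection_name, poll_seconds=10):
    """Poll until every row of the collection is indexed."""
    # Imported lazily so the helper above works without pymilvus installed.
    from pymilvus import utility

    while True:
        progress = utility.index_building_progress(collection_name)
        pct = index_progress_percent(progress)
        print(f"{collection_name}: {pct:.1f}% of rows indexed")
        if pct >= 100.0:
            return
        time.sleep(poll_seconds)
```

Calling `wait_for_index("my_collection")` (a placeholder collection name) before inserting more data would show whether index building is still in progress.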
/assign @0215Arthur /unassign
Thank you very much. BTW, which metric can show the current status of the index tasks?
Thank you very much.
When I try to load the collection (DiskANN index), the following error occurs:
RPC error: [get_loading_progress], <MilvusException: (code=65535, message=show collection failed: load segment failed, OOM if load, maxSegmentSize = 62.74893283843994 MB, concurrency = 1, memUsage = 30914.96684074402 MB, predictMemUsage = 30978.478965759277 MB, totalMem = 31733.72265625 MB thresholdFactor = 0.900000)>, <Time:{'RPC start': '2024-04-01 15:13:06.132079', 'RPC error': '2024-04-01 15:13:06.142252'}>
RPC error: [wait_for_loading_collection], <MilvusException: (code=65535, message=show collection failed: load segment failed, OOM if load, maxSegmentSize = 62.74893283843994 MB, concurrency = 1, memUsage = 30914.96684074402 MB, predictMemUsage = 30978.478965759277 MB, totalMem = 31733.72265625 MB thresholdFactor = 0.900000)>, <Time:{'RPC start': '2024-04-01 14:59:42.488573', 'RPC error': '2024-04-01 15:13:06.142527'}>
RPC error: [load_collection], <MilvusException: (code=65535, message=show collection failed: load segment failed, OOM if load, maxSegmentSize = 62.74893283843994 MB, concurrency = 1, memUsage = 30914.96684074402 MB, predictMemUsage = 30978.478965759277 MB, totalMem = 31733.72265625 MB thresholdFactor = 0.900000)>, <Time:{'RPC start': '2024-04-01 14:59:41.026619', 'RPC error': '2024-04-01 15:13:06.142668'}>
Is this error caused by exceeding the capacity of this Milvus instance (11M embeddings, 1536 dim)?
Is there any way to solve it?
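The numbers in the error message can be checked by hand. The sketch below assumes the load is rejected when `predictMemUsage` exceeds `totalMem * thresholdFactor`; that rule is inferred from the fields in the message, not taken from the Milvus source, and the function name is hypothetical.

```python
def would_reject_load(predict_mem_mb, total_mem_mb, threshold_factor):
    """Return (rejected, limit_mb) under the assumed rule predict > total * factor."""
    limit_mb = total_mem_mb * threshold_factor
    return predict_mem_mb > limit_mb, limit_mb


# Values copied from the error message above.
predict_mem_mb = 30978.478965759277
total_mem_mb = 31733.72265625
threshold_factor = 0.9

rejected, limit_mb = would_reject_load(predict_mem_mb, total_mem_mb, threshold_factor)
print(f"limit = {limit_mb:.2f} MB, predicted = {predict_mem_mb:.2f} MB, rejected = {rejected}")
```

Under that assumption the limit is roughly 28560 MB, while the node already reports a `memUsage` of about 30915 MB, so even a 62.7 MB segment pushes the prediction over the limit.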
memUsage = 30914.96684074402 MB
For 11M vectors at 1536 dim, I guess this needs 100GB+ of memory for a performance instance, or 32GB+ of memory for DiskANN (that's only for search; indexing may take extra).
You might be able to fit 8M vectors into a 32GB querynode, but I guess 11M won't fit.
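For a sense of scale, the raw vector data alone can be estimated as below. This is a back-of-the-envelope sketch that assumes float32 values (4 bytes each) and counts only the raw vectors, not the index structures or the share that DiskANN actually keeps in memory.

```python
def raw_vector_gib(num_vectors, dim, bytes_per_value=4):
    """Size of the raw vectors in GiB, assuming float32 by default."""
    return num_vectors * dim * bytes_per_value / 2**30


for count in (8_000_000, 11_000_000):
    size_gib = raw_vector_gib(count, 1536)
    print(f"{count:,} vectors x 1536 dim = {size_gib:.1f} GiB of raw float32 data")
```

That comes to roughly 45.8 GiB for 8M vectors and 62.9 GiB for 11M, both well above the 32GB on this machine before any index overhead is counted.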
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.