
[Bug]: v2.4.0 datanode memory usage is too high

Open yesyue opened this issue 9 months ago • 9 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Environment

- Milvus version: v2.4.0
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka):    kafka 
- SDK version(e.g. pymilvus v2.0.0rc2): 2.7
- OS(Ubuntu or CentOS):  CentOS
- CPU/Memory: 544c /4291.6 G at least
- GPU:  0 
- Others: datanode

Current Behavior

Following the sizing tools, we allocated Data Nodes with 2 cores / 8 GB x 2 pods. In actual operation they hit OOM, and after scaling up, memory usage reached 40 GB.

Expected Behavior

Following the sizing tools, we allocated Data Nodes with 2 cores / 8 GB x 2 pods. In actual operation they hit OOM, and after scaling up, memory usage reached 40 GB.

Steps To Reproduce

Following the sizing tools, we allocated Data Nodes with 2 cores / 8 GB x 2 pods. In actual operation they hit OOM, and after scaling up, memory usage reached 40 GB.

Milvus Log

No response

Anything else?

No response

yesyue avatar Apr 29 '24 05:04 yesyue

The title and description of this issue contains Chinese. Please use English to describe your issue.

github-actions[bot] avatar Apr 29 '24 05:04 github-actions[bot]

Referring to the sizing tools, we allocated Data Nodes with 2 cores of 8 GB x 2 pods. However, during actual operation the Data Nodes hit OOM, and after scaling up, the memory usage reached 40 GB.

yesyue avatar Apr 29 '24 05:04 yesyue

datanode log:

datanode.log

yesyue avatar Apr 29 '24 05:04 yesyue

@yesyue please share more info about how you are using Milvus, e.g. what kinds of requests you sent, how many, and how frequently. Please also upload the logs of all the Milvus pods for investigation.

/assign @yesyue /unassign

yanliang567 avatar Apr 29 '24 06:04 yanliang567

100 million entities/day are written to Milvus.
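
For context, a back-of-envelope calculation of the sustained ingest rate implied by that figure (assuming a roughly even write distribution over the day):

```python
# Back-of-envelope ingest rate from the figure in this thread:
# 100 million entities per day, assumed evenly spread.
ENTITIES_PER_DAY = 100_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86_400

rate_per_second = ENTITIES_PER_DAY / SECONDS_PER_DAY
print(f"{rate_per_second:.0f} entities/s")  # ~1157 entities/s sustained
```

Bursty traffic would push the peak rate well above this average, which matters for sizing flush buffers.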

yesyue avatar Apr 29 '24 06:04 yesyue

100 million entities/day are written to Milvus.

After I inserted 10M entities in total, the Milvus Docker container stopped and crashed. I use the IVF_SQ8 index and installed Milvus with GPU. I insert in batches of 10,000 (inserting only once 10,000 entities have accumulated).

After the crash I can't connect again and can't use anything. Any solution?
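
The batching strategy described above (buffer rows and insert only once 10,000 have accumulated) can be sketched as follows. The buffering helper is hypothetical, and the commented-out `collection.insert` call is an assumption about how it would plug into pymilvus:

```python
from typing import Iterable, Iterator, List

BATCH_SIZE = 10_000  # insert only when a full batch has accumulated, as described above

def batches(rows: Iterable[dict], batch_size: int = BATCH_SIZE) -> Iterator[List[dict]]:
    """Yield fixed-size batches; a trailing partial batch is yielded at the
    end so no rows are lost when the stream stops."""
    buf: List[dict] = []
    for row in rows:
        buf.append(row)
        if len(buf) == batch_size:
            yield buf
            buf = []
    if buf:  # flush the remainder
        yield buf

# Usage sketch (the pymilvus call is an assumption, not exercised here):
# for batch in batches(row_stream):
#     collection.insert(batch)  # hypothetical Collection bound elsewhere
```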

tadinhkien99 avatar Apr 29 '24 06:04 tadinhkien99

  1. It seems that flushing cannot keep up with the writes.
  2. How many partitions do you have? If you have many partitions or collections, flush and memory consumption will be larger than the estimate.
  3. There is a bunch of configs to tune, e.g. the concurrent flush number (dataNode.dataSync.maxParallelSyncMgrTasks for 2.4) and the memory used for growing segments.
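
A minimal sketch of the corresponding override in `milvus.yaml`; the value shown is illustrative rather than a recommendation, and only `maxParallelSyncMgrTasks` is taken from the comment above:

```yaml
# Illustrative milvus.yaml override; check your Milvus 2.4 defaults before changing.
dataNode:
  dataSync:
    maxParallelSyncMgrTasks: 16   # cap concurrent flush/sync tasks to bound datanode memory
```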

xiaofan-luan avatar Apr 29 '24 14:04 xiaofan-luan

100 million entities/day are written to Milvus.

After I inserted 10M entities in total, the Milvus Docker container stopped and crashed. I use the IVF_SQ8 index and installed Milvus with GPU. I insert in batches of 10,000 (inserting only once 10,000 entities have accumulated).

After the crash I can't connect again and can't use anything. Any solution?

How much GPU memory do you have? Please open another issue with detailed logs so we can help.

xiaofan-luan avatar Apr 29 '24 14:04 xiaofan-luan

querynode (3).log

yesyue avatar May 04 '24 02:05 yesyue

querynode (3).log

1. Could you offer the datanode log? 2. It would be great if you could capture a datanode pprof, so you can see which part takes up the memory. Most likely the insert buffer is taking the memory, and you can tune the flush parameters.
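
For the pprof step, Milvus exposes Go's pprof endpoints on its metrics port (9091 by default). A sketch of capturing a datanode heap profile follows; the pod name is a placeholder, and the kubectl/curl commands are left commented out since they depend on your deployment:

```shell
# Placeholder pod name -- adjust to your deployment.
DATANODE_POD="my-release-milvus-datanode-0"
LOCAL_PORT=9091

# Forward the metrics port from the datanode pod (cluster deploy assumed):
# kubectl port-forward "pod/${DATANODE_POD}" "${LOCAL_PORT}:9091" &

# Grab a heap profile and inspect the top memory consumers:
# curl -s "http://localhost:${LOCAL_PORT}/debug/pprof/heap" -o datanode-heap.pb.gz
# go tool pprof -top datanode-heap.pb.gz

echo "heap profile endpoint: http://localhost:${LOCAL_PORT}/debug/pprof/heap"
```

If the heap profile shows most allocations in the write buffer path, that supports tuning the flush parameters mentioned above.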

xiaofan-luan avatar May 05 '24 14:05 xiaofan-luan

I've seen you in many issues and we'd like to offer help. Feel free to contact me at [email protected] if necessary.

xiaofan-luan avatar May 05 '24 14:05 xiaofan-luan

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale[bot] avatar Jun 05 '24 01:06 stale[bot]