Volcano Scheduler Memory Leak
What happened:
The memory usage of the Volcano scheduler keeps increasing over time. After a restart, memory usage drops sharply.
Monitoring before restart
Monitoring after restart
Heap memory usage before restart
Heap memory usage after restart
Hi, have you set a memory limit in the container's resource limits, and what log level did you set?
@Monokaix Yes, I set the limit value of the container. I previously set the log level to 4. I also suspected that it was because the log level was set too high. I have now changed it to 0 for verification.
@Monokaix After lowering the log level, the memory is still growing. The log output rate is not high either.
Can you wait until memory hits the limit and check whether memory reclamation happens?
@Monokaix I'll give it a try. I also found a problem: the scheduler has a nodeSelector option, but pods are not filtered by it, so pods running on nodes outside the selected set are still tracked. As a result, the scheduler keeps pod information for every node in the cluster in memory, which wastes memory.
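To make the filtering idea concrete, here is a minimal sketch of dropping pods that run on nodes outside a label selector. The Node and Pod types and the filterPods function are hypothetical, simplified stand-ins; this is not Volcano's actual cache code.

```go
package main

import "fmt"

// Node and Pod are hypothetical, simplified stand-ins for the Kubernetes objects.
type Node struct {
	Name   string
	Labels map[string]string
}

type Pod struct {
	Name     string
	NodeName string
}

// matches reports whether the node carries every label in the selector.
func matches(n Node, selector map[string]string) bool {
	for k, want := range selector {
		if got, ok := n.Labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// filterPods keeps only the pods bound to nodes that match the selector.
func filterPods(pods []Pod, nodes []Node, selector map[string]string) []Pod {
	selected := make(map[string]bool)
	for _, n := range nodes {
		if matches(n, selector) {
			selected[n.Name] = true
		}
	}
	var kept []Pod
	for _, p := range pods {
		if selected[p.NodeName] {
			kept = append(kept, p)
		}
	}
	return kept
}

func main() {
	nodes := []Node{
		{Name: "node-a", Labels: map[string]string{"pool": "batch"}},
		{Name: "node-b", Labels: map[string]string{"pool": "web"}},
	}
	pods := []Pod{
		{Name: "job-1", NodeName: "node-a"},
		{Name: "api-1", NodeName: "node-b"},
	}
	// Only pods on nodes labeled pool=batch would need to be cached.
	for _, p := range filterPods(pods, nodes, map[string]string{"pool": "batch"}) {
		fmt.Println(p.Name)
	}
}
```

With a filter like this, the memory spent on pod information would scale with the selected nodes rather than with the whole cluster.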
That's right, but I don't think it's the main cause of the high memory usage :). Heavy logging and delayed memory reclamation can cause this behavior, and memory can be reclaimed when it is about to hit the memory limit.
@Monokaix Thank you very much. I will lower the limit and observe it for a while.
@Monokaix I reduced the memory limit, but OOM did not trigger GC, and the memory usage continued to increase.
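One possible explanation for this observation: the Go runtime does not read the container memory limit on its own, so the garbage collector only runs more aggressively near a limit if a soft limit is configured, either through the GOMEMLIMIT environment variable or debug.SetMemoryLimit (Go 1.19 and later). The sketch below shows the general mechanism. The 1 GiB container limit, the 90% ratio, and the softLimit helper are assumptions for illustration, and none of this helps if the memory is still referenced, as in a real leak.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// softLimit returns a GC soft limit at percent% of the container memory limit.
// It is a hypothetical helper for this sketch.
func softLimit(containerLimitBytes, percent int64) int64 {
	return containerLimitBytes * percent / 100
}

func main() {
	const containerLimit = 1 << 30 // assumed 1 GiB container memory limit

	// Tell the Go runtime to collect more aggressively as total memory
	// approaches 90% of the assumed container limit.
	debug.SetMemoryLimit(softLimit(containerLimit, 90))

	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("heap in use: %d bytes, obtained from OS: %d bytes, GC cycles: %d\n",
		m.HeapAlloc, m.Sys, m.NumGC)
}
```

If heap usage keeps growing even with a soft limit in place, the objects are most likely still reachable, and a heap profile or cache dump is the more useful next step.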
You can use kill -12 <volcano-scheduler pid> to dump cache info. See https://github.com/volcano-sh/volcano/pull/3088
How many resources does your cluster have, such as nodes and pods? We should not set the memory limit too low: meet the basic memory needs first, then lower the memory limit a bit.
Can https://github.com/volcano-sh/volcano/pull/3435 fix your problem?