
Failed to start ContainerManager err="invalid kernel flag: vm/overcommit_memory, expected value: 1, actual value: 0"

yaroslav-nakonechnikov opened this issue 1 year ago

Image I'm using: ami-0cfbf4d66ba90a43d

What I expected to happen: either setting "vm.overcommit_memory" to "0" should be possible, or the limitation should be documented at https://bottlerocket.dev/en/os/1.19.x/api/settings/kernel/

What actually happened:

Feb 15 10:12:04 ip-100-65-10-211.eu-central-1.compute.internal kubelet[3819]: I0215 10:12:04.940330    3819 kubelet.go:2329] "Starting kubelet main sync loop"
Feb 15 10:12:04 ip-100-65-10-211.eu-central-1.compute.internal kubelet[3819]: E0215 10:12:04.940491    3819 kubelet.go:2353] "Skipping pod synchronization" err="[container runtime status check may not have completed yet, PLEG is not healthy: pleg has yet to be successful]"
Feb 15 10:12:04 ip-100-65-10-211.eu-central-1.compute.internal kubelet[3819]: I0215 10:12:04.958872    3819 cpu_manager.go:214] "Starting CPU manager" policy="none"
Feb 15 10:12:04 ip-100-65-10-211.eu-central-1.compute.internal kubelet[3819]: I0215 10:12:04.958901    3819 cpu_manager.go:215] "Reconciling" reconcilePeriod="10s"
Feb 15 10:12:04 ip-100-65-10-211.eu-central-1.compute.internal kubelet[3819]: I0215 10:12:04.958951    3819 state_mem.go:36] "Initialized new in-memory state store"
Feb 15 10:12:04 ip-100-65-10-211.eu-central-1.compute.internal kubelet[3819]: I0215 10:12:04.959252    3819 state_mem.go:88] "Updated default CPUSet" cpuSet=""
Feb 15 10:12:04 ip-100-65-10-211.eu-central-1.compute.internal kubelet[3819]: I0215 10:12:04.959357    3819 state_mem.go:96] "Updated CPUSet assignments" assignments={}
Feb 15 10:12:04 ip-100-65-10-211.eu-central-1.compute.internal kubelet[3819]: I0215 10:12:04.959373    3819 policy_none.go:49] "None policy: Start"
Feb 15 10:12:04 ip-100-65-10-211.eu-central-1.compute.internal kubelet[3819]: I0215 10:12:04.960282    3819 memory_manager.go:170] "Starting memorymanager" policy="None"
Feb 15 10:12:04 ip-100-65-10-211.eu-central-1.compute.internal kubelet[3819]: I0215 10:12:04.960353    3819 state_mem.go:35] "Initializing new in-memory state store"
Feb 15 10:12:04 ip-100-65-10-211.eu-central-1.compute.internal kubelet[3819]: I0215 10:12:04.960715    3819 state_mem.go:75] "Updated machine memory state"
Feb 15 10:12:04 ip-100-65-10-211.eu-central-1.compute.internal kubelet[3819]: E0215 10:12:04.961639    3819 kubelet.go:1542] "Failed to start ContainerManager" err="invalid kernel flag: vm/overcommit_memory, expected value: 1, actual value: 0"
Feb 15 10:12:05 ip-100-65-10-211.eu-central-1.compute.internal systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Feb 15 10:12:05 ip-100-65-10-211.eu-central-1.compute.internal systemd[1]: kubelet.service: Failed with result 'exit-code'.

How to reproduce the problem:

update user-data with:

[settings.kernel.sysctl]
"vm.overcommit_memory" = "0"

yaroslav-nakonechnikov avatar Feb 15 '24 10:02 yaroslav-nakonechnikov

Hi @yaroslav-nakonechnikov, thanks for letting us know about this. I did some digging and found that the kubelet requires vm.overcommit_memory to be set to 1. The kubelet does allow a more "permissive" mode in which it only warns when this setting, and a few others, don't have the required values. However, in Bottlerocket we configure the more restrictive approach, which causes the kubelet to refuse to start when the kernel tunables don't match the required values (see kubelet configs -> --protect-kernel-defaults). What's your use case that requires vm.overcommit_memory=0?
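To make the behavior above concrete, here is a minimal sketch of the strict-vs-permissive logic (the real kubelet is written in Go; the function name and dict here are illustrative, but the four tunables and their expected values match the ones the kubelet validates, using the same slash-separated flag names seen in the error message):

```python
# Tunables the kubelet expects, with their required values.
# "vm/overcommit_memory" is the one this issue trips over.
EXPECTED_SYSCTLS = {
    "vm/overcommit_memory": 1,
    "vm/panic_on_oom": 0,
    "kernel/panic": 10,
    "kernel/panic_on_oops": 1,
}

def check_kernel_defaults(actual, protect_kernel_defaults):
    """Compare actual sysctl values against the expected ones.

    With protect_kernel_defaults=True (Bottlerocket's configuration),
    any mismatch is fatal; otherwise mismatches are returned so the
    caller can merely log warnings.
    """
    errors = []
    for flag, expected in EXPECTED_SYSCTLS.items():
        value = actual.get(flag)
        if value != expected:
            errors.append(
                f"invalid kernel flag: {flag}, "
                f"expected value: {expected}, actual value: {value}"
            )
    if errors and protect_kernel_defaults:
        # Strict mode: refuse to start, as in the log excerpt above.
        raise RuntimeError("; ".join(errors))
    return errors
```

With vm.overcommit_memory set to 0 in user-data, the strict path raises exactly the kind of error shown in the kubelet log, while the permissive path would only have produced a warning.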

arnaldo2792 avatar Feb 15 '24 20:02 arnaldo2792

@arnaldo2792 thank you!

we are using Splunk, and sometimes it crashes. While investigating the root cause, we came across this: https://docs.splunk.com/Documentation/Splunk/9.2.0/ReleaseNotes/LinuxmemoryovercommittingandSplunkcrashes and wanted to try it.

yes, we understand that the doc is aimed more at on-premise setups, but before submitting a case to support, we wanted to try setting this value. And we found that it somewhat contradicts the documentation

yaroslav-nakonechnikov avatar Feb 16 '24 07:02 yaroslav-nakonechnikov