Failed to start ContainerManager err="invalid kernel flag: vm/overcommit_memory, expected value: 1, actual value: 0"
Image I'm using: ami-0cfbf4d66ba90a43d
What I expected to happen: that it is possible to set "vm.overcommit_memory" to "0", or that the limitation is documented at https://bottlerocket.dev/en/os/1.19.x/api/settings/kernel/
What actually happened:
Feb 15 10:12:04 ip-100-65-10-211.eu-central-1.compute.internal kubelet[3819]: I0215 10:12:04.940330 3819 kubelet.go:2329] "Starting kubelet main sync loop"
Feb 15 10:12:04 ip-100-65-10-211.eu-central-1.compute.internal kubelet[3819]: E0215 10:12:04.940491 3819 kubelet.go:2353] "Skipping pod synchronization" err="[container runtime status check may not have completed yet, PLEG is not healthy: pleg has yet to be successful]"
Feb 15 10:12:04 ip-100-65-10-211.eu-central-1.compute.internal kubelet[3819]: I0215 10:12:04.958872 3819 cpu_manager.go:214] "Starting CPU manager" policy="none"
Feb 15 10:12:04 ip-100-65-10-211.eu-central-1.compute.internal kubelet[3819]: I0215 10:12:04.958901 3819 cpu_manager.go:215] "Reconciling" reconcilePeriod="10s"
Feb 15 10:12:04 ip-100-65-10-211.eu-central-1.compute.internal kubelet[3819]: I0215 10:12:04.958951 3819 state_mem.go:36] "Initialized new in-memory state store"
Feb 15 10:12:04 ip-100-65-10-211.eu-central-1.compute.internal kubelet[3819]: I0215 10:12:04.959252 3819 state_mem.go:88] "Updated default CPUSet" cpuSet=""
Feb 15 10:12:04 ip-100-65-10-211.eu-central-1.compute.internal kubelet[3819]: I0215 10:12:04.959357 3819 state_mem.go:96] "Updated CPUSet assignments" assignments={}
Feb 15 10:12:04 ip-100-65-10-211.eu-central-1.compute.internal kubelet[3819]: I0215 10:12:04.959373 3819 policy_none.go:49] "None policy: Start"
Feb 15 10:12:04 ip-100-65-10-211.eu-central-1.compute.internal kubelet[3819]: I0215 10:12:04.960282 3819 memory_manager.go:170] "Starting memorymanager" policy="None"
Feb 15 10:12:04 ip-100-65-10-211.eu-central-1.compute.internal kubelet[3819]: I0215 10:12:04.960353 3819 state_mem.go:35] "Initializing new in-memory state store"
Feb 15 10:12:04 ip-100-65-10-211.eu-central-1.compute.internal kubelet[3819]: I0215 10:12:04.960715 3819 state_mem.go:75] "Updated machine memory state"
Feb 15 10:12:04 ip-100-65-10-211.eu-central-1.compute.internal kubelet[3819]: E0215 10:12:04.961639 3819 kubelet.go:1542] "Failed to start ContainerManager" err="invalid kernel flag: vm/overcommit_memory, expected value: 1, actual value: 0"
Feb 15 10:12:05 ip-100-65-10-211.eu-central-1.compute.internal systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Feb 15 10:12:05 ip-100-65-10-211.eu-central-1.compute.internal systemd[1]: kubelet.service: Failed with result 'exit-code'.
How to reproduce the problem:
Update the instance user-data with:
[settings.kernel.sysctl]
"vm.overcommit_memory" = "0"
Hi @yaroslav-nakonechnikov, thanks for letting us know about this. I did some digging and found that the kubelet requires vm.overcommit_memory to be set to 1. The kubelet does allow a more "permissive" mode in which it only warns when this setting, and a few others, don't have the required values. However, in Bottlerocket we configure the more restrictive approach, which causes the kubelet to refuse to start when the kernel tunables don't match the required values (see the kubelet configuration reference -> --protect-kernel-defaults). What's your use case for requiring vm.overcommit_memory=0?
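For reference, the restrictive behavior corresponds to the upstream protectKernelDefaults option in KubeletConfiguration; a minimal illustrative snippet (not the file Bottlerocket generates internally, just a sketch of the upstream setting):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# When true, the kubelet errors out at startup instead of adjusting kernel
# tunables (such as vm.overcommit_memory) that don't match its expected values.
protectKernelDefaults: true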
@arnaldo2792 thank you!
We are using Splunk, and it sometimes crashes. While looking for the root cause, we came across https://docs.splunk.com/Documentation/Splunk/9.2.0/ReleaseNotes/LinuxmemoryovercommittingandSplunkcrashes and wanted to try its recommendation.
Yes, we understand that the doc is aimed more at on-premises setups, but before submitting a case to support we wanted to try this setting, and found that it contradicts the Bottlerocket documentation a bit.
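For context, on a general-purpose Linux host (outside Bottlerocket) this tunable would typically be changed as sketched below; the file name is hypothetical, and the value to use depends on the Splunk guidance:

# apply immediately (does not persist across reboots)
sudo sysctl -w vm.overcommit_memory=0
# persist across reboots via a drop-in file (name is illustrative)
echo 'vm.overcommit_memory = 0' | sudo tee /etc/sysctl.d/99-overcommit.conf
# reload settings from all sysctl config files
sudo sysctl --system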