Kubelite not starting after power failure unless cgroups-per-qos=false
Summary
After the host had been shut down abruptly, microk8s (kubelite) would no longer start due to the following error:
Jan 24 19:02:35 bernd microk8s.daemon-kubelite[2373]: E0124 19:02:35.011772 2373 kubelet.go:1542] "Failed to start ContainerManager" err="failed to initialize top level QOS containers: root container [kubepods] doesn't exist"
After having applied the workaround mentioned by @neoaggelos in https://github.com/canonical/microk8s/issues/4301#issuecomment-1810061954, microk8s started.
Now microk8s cannot start without those changes.
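For readers who want to script this kind of change, here is a minimal sketch of how such kubelet flags could be set idempotently. It makes several assumptions that are not taken from this report: the args file is assumed to live at `/var/snap/microk8s/current/args/kubelet`, the `--enforce-node-allocatable=""` line is assumed to be needed alongside `--cgroups-per-qos=false`, and `apply_workaround` is a hypothetical helper name. The linked comment remains the authoritative source for the exact flags.

```python
from pathlib import Path

# Assumed location of the kubelet arguments in a snap-based MicroK8s install.
KUBELET_ARGS = Path("/var/snap/microk8s/current/args/kubelet")

# Flags for the workaround. The second entry is an assumption: kubelet is
# expected to reject node-allocatable enforcement once per-QoS cgroups are off.
WORKAROUND = {
    "--cgroups-per-qos": "false",
    "--enforce-node-allocatable": '""',
}


def apply_workaround(args_file=KUBELET_ARGS, flags=WORKAROUND):
    """Set each flag in the args file, replacing any existing value.

    Returns True if the file was modified, False if it already matched.
    """
    path = Path(args_file)
    original = path.read_text().splitlines()
    # Drop existing lines that set one of the flags, keep everything else.
    kept = [ln for ln in original if ln.split("=", 1)[0].strip() not in flags]
    updated = kept + [f"{name}={value}" for name, value in flags.items()]
    if updated == original:
        return False
    path.write_text("\n".join(updated) + "\n")
    return True
```

Calling `apply_workaround()` needs write access to the args file (typically root), and the change only takes effect after the services are restarted, for example with `microk8s stop` followed by `microk8s start`.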
What Should Happen Instead?
Microk8s should start without having to disable cgroups per qos.
Reproduction Steps
None.
Introspection Report
Before the power outage, the host had been running for a long time without a reboot. During that time, microk8s had been updated from 1.26 through 1.27 and 1.28 to 1.29. So the power cycle might just have exposed an issue that any reboot would have surfaced.
I haven't found anything in the patch notes that suggests a recent change in how cgroups work. The computer hasn't been configured any differently since it was last working. So I'm unsure what would make cgroups misbehave (as suggested in #4301).
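One way to narrow this down might be to check which cgroup version the host mounts and whether a `kubepods` cgroup exists at all. The following is only a rough diagnostic sketch: `cgroup_report` is a hypothetical helper, the `/sys/fs/cgroup` root is the usual mount point but is an assumption here, and the `kubepods` / `kubepods.slice` names assume the cgroupfs and systemd cgroup drivers respectively.

```python
from pathlib import Path


def cgroup_report(root="/sys/fs/cgroup"):
    """Report the cgroup version and any kubepods cgroups found under root."""
    root = Path(root)
    if not root.is_dir():
        return {"version": None, "kubepods": []}

    # cgroup v2 (unified hierarchy) exposes cgroup.controllers at the root.
    is_v2 = (root / "cgroup.controllers").is_file()
    names = ("kubepods", "kubepods.slice")

    if is_v2:
        candidates = [root / name for name in names]
    else:
        # cgroup v1 has one hierarchy per controller (cpu, memory, ...).
        candidates = [
            ctrl / name
            for ctrl in sorted(root.iterdir())
            if ctrl.is_dir()
            for name in names
        ]

    found = [str(p) for p in candidates if p.is_dir()]
    return {"version": 2 if is_v2 else 1, "kubepods": found}
```

An empty `kubepods` list on a node where kubelite is failing would be consistent with the "root container [kubepods] doesn't exist" error, although it would not by itself explain why the cgroup is missing.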
Hi @AlexGustafsson, thank you for raising this. This is an issue we have been seeing with MicroK8s 1.29 recently, see also #4361. I wonder if you are bumping into the same problem.