amazon-eks-ami
amazon-eks-ami copied to clipboard
Added runtime.slice for containerd runtime
Issue #, if available: #1050
Description of changes:
This PR adds the runtime.slice
and makes use of it when using containerd as the runtime as is recommended in the Kubernetes documentation.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Testing Done
See this guide for recommended testing for PRs. Some tests may not apply. Completing tests and providing additional validation steps are not required, but it is recommended and may reduce review time and time to merge.
IMPORTANT - This isn't currently working as expected when patching the changes onto a running node, but I'm unsure as to why. The containerd cgroup is always system.slice
no matter what I do.
@cartermckinnon could you take a look at this and see if there is anything obviously wrong with the way I've configured containerd? I've tried overwriting the main unit file but it still uses the wrong cgroup. I don't have the infrastructure setup to build and test this myself and it might be worth trying it that way?
I've tried overwriting the main unit file but it still uses the wrong cgroup
You mean the containerd
unit file, right? I modified the one on an AL2 instance (/usr/lib/systemd/system/containerd.service
) to use Slice=runtime.slice
and it looks correct to me:
> systemctl status containerd
● containerd.service - containerd container runtime
Loaded: loaded (/usr/lib/systemd/system/containerd.service; disabled; vendor preset: disabled)
Active: active (running) since Tue 2022-10-18 20:45:21 UTC; 5s ago
Docs: https://containerd.io
Process: 1090 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
Main PID: 1122 (containerd)
Tasks: 15
Memory: 56.5M
CGroup: /runtime.slice/containerd.service
└─1122 /usr/bin/containerd
Can we add a dropin unit for containerd
to set this?
@cartermckinnon this PR has the dropin file 00-runtime-slice.conf to do this. I was testing on an active EKS node so that might have been why it wasn't working correctly. Do you have automation in place to test this PR as based on your testing it should work?
@cartermckinnon have you had a chance to look at this again? It'd be good to get it working correctly.
@cartermckinnon would it be possible for you to get this tested or do I need to setup some infrastructure to test it myself?
@cartermckinnon I've now tested this and and would like to get it merged and released ASAP, is there anything else you need from me?
CC @bwagner5
@stevehipwell sorry for the radio silence -- this looks fine to me. I want to read through the k8s doc on this again to make sure I'm understanding the recommendations, but this should get merged by the end of the week.
Thanks for tackling it!
I'm not clear on the benefit of this without enabling the optional enforcement of KubeReserved
/SystemReserved
, I would guess some metrics are more accurate/fine-grained?
@cartermckinnon this is the documented way to set up Kubernetes with Containerd as the CRI. So that is the main benefit along with observability improvements.
It then also means that end users can decide to enforce kube reserved and/or system reserved by adding additional kubelet arguments.
@cartermckinnon I've made the suggested change, this should be good now.
LGTM, thanks @stevehipwell 🚀
@stevehipwell sadly this PR does not seem to work because of a systemd issue. See https://github.com/awslabs/amazon-eks-ami/issues/1436#issuecomment-1729201589