amazon-eks-ami icon indicating copy to clipboard operation
amazon-eks-ami copied to clipboard

Change cgroup driver to systemd

Open reegnz opened this issue 4 years ago • 20 comments

Kubernetes documentation indicates that for stability reasons one should run kubernetes with the systemd cgroup driver if the init system itself is systemd.

https://kubernetes.io/docs/setup/production-environment/container-runtimes/#cgroup-drivers

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

reegnz avatar Jan 06 '21 18:01 reegnz

https://github.com/awslabs/amazon-eks-ami/pull/587 reverted this change earlier, now that eksctl was made compatible with this change in https://github.com/weaveworks/eksctl/pull/2962 it should be OK to merge again.

reegnz avatar Jan 06 '21 18:01 reegnz

This is another run at https://github.com/awslabs/amazon-eks-ami/issues/490

reegnz avatar Jan 08 '21 10:01 reegnz

Hi, now that eksctl has released a change, we're looking into how to make this release possible. There's still a condition where if we release an AMI with this change and a user is using an older version of eksctl, this could still lead to failures. We will be update this issue with more details, once we have a plan to pick this change.

abeer91 avatar Jan 11 '21 19:01 abeer91

@abeer91 I do understand the concern of breaking downstream tooling, so I might look into how that could be handled, although I can't promise anything (thinking about maybe using a path unit to keep the kubelet config in sync with the daemon.json but that is very hacky).

How long do you think we should wait for the newer eksctl to be adopted? Does it warrant a changelog note that this might break some tooling downstream (tools that assume only kubelet config needs to be changed)?

reegnz avatar Jan 12 '21 11:01 reegnz

Hi, upon cluster creation with GPU instances i could not fetch nodes with kubectl get nodes. I see the below error on kubelet logs

failed to run Kubelet: misconfiguration: kubelet cgroup driver: "systemd" is different from docker cgroup driver: "cgroupfs" eksctl version : 0.51.0 eks version: 1.20 kubectl version: v1.21.0 AMI: ami-0b450512a684db582

Will the above fix solves this issue. @abeer91 when can we expect this fix.

jagadeeshi2i avatar May 31 '21 07:05 jagadeeshi2i

@reegnz you might want to add systemd cgroup support to containerd in this PR now it's been released.

@abeer91 if this is still being blocked due to concerns about eksctl how about a containerd only PR (see https://github.com/aws/containers-roadmap/issues/1210#issuecomment-888176554)?

stevehipwell avatar Jul 28 '21 09:07 stevehipwell

note: systemd cgroup support to containerd is not correct. We are in runc v2 not in v1

@reegnz you might want to add systemd cgroup support to containerd in this PR now it's been released.

@abeer91 if this is still being blocked due to concerns about eksctl how about a containerd only PR (see aws/containers-roadmap#1210 (comment))?

josephprem avatar Jul 28 '21 11:07 josephprem

@josephprem ~~I can see that runc cgroup v2 is supported but is it actually enabled in the AMI?~~

I can't find any official documentation with the best being reading https://github.com/containerd/containerd/issues/4203 (specifically https://github.com/containerd/containerd/issues/4203#issuecomment-651532765)

stevehipwell avatar Jul 28 '21 12:07 stevehipwell

@josephprem ~I can see that runc cgroup v2 is supported but is it actually enabled in the AMI?~

I can't find any official documentation with the best being reading containerd/containerd#4203 (specifically containerd/containerd#4203 (comment))

yes it is

cat /etc/containerd/config.toml
version = 2
root = "/var/lib/containerd"
state = "/run/containerd"

[grpc]
address = "/run/dockershim.sock"

[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "runc"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"

[plugins."io.containerd.grpc.v1.cri".cni]
bin_dir = "/opt/cni/bin"
conf_dir = "/etc/cni/net.d"

josephprem avatar Jul 28 '21 13:07 josephprem

@josephprem I re-read the config and that's why I struck the comment through about v2 being enabled. I'm still interested if you have any better documentation than the comment I found and linked above.

stevehipwell avatar Jul 28 '21 14:07 stevehipwell

@josephprem I re-read the config and that's why I struck the comment through about v2 being enabled. I'm still interested if you have any better documentation than the comment I found and linked above.

https://github.com/containerd/cri/blob/master/docs/config.md

josephprem avatar Jul 28 '21 15:07 josephprem

I have added this in my user-data

conf=/etc/eks/containerd/containerd-config.toml 
# EKS bootstrap script moves $conf to /etc/eks/containerd/config.toml 
section="\[plugins.\"io.containerd.grpc.v1.cri\".containerd.runtimes.runc.options\]"
key="SystemdCgroup"
grep -q $section $conf || sed -i "/^runtime_type.*/a $section" $conf
grep -q $key $conf || sed -i "/$section.*/a $key = true" $conf

Followed by enabling --cgroup-driver=systemd in kubelet ( note the service file patched )

sed -i 's/KUBELET_EXTRA_ARGS/KUBELET_EXTRA_ARGS $EXTENDED_KUBELET_ARGS/' /etc/eks/containerd/kubelet-containerd.service
cat << EOF > /etc/systemd/system/kubelet.service.d/9999-extended-kubelet-args.conf
[Service]
Environment='EXTENDED_KUBELET_ARGS=--cgroup-driver=systemd'
EOF
systemctl daemon-reload

josephprem avatar Jul 28 '21 18:07 josephprem

@josephprem I assume that you could modify /etc/kubernetes/kubelet/kubelet-config.json like the changes in this PR instead of setting --cgroup-driver=systemd?

stevehipwell avatar Jul 29 '21 07:07 stevehipwell

/etc/kubernetes/kubelet/kubelet-config.json

I guess so , but my preference for pushing variables is through systemd Drop-Ins

josephprem avatar Jul 29 '21 08:07 josephprem

Hey folks 👋 eksctl here. A couple of things to note on our side:

  • We have changed the way that eksctl bootstraps: we no longer roll our own, and simply call the /etc/eks/bootstrap.sh to delegate to whatever is going on in the AMIs. Therefore eksctl will not need to do anything with cgroup stuff. (See details of this decision here https://github.com/weaveworks/eksctl/pull/3564)
  • HOWEVER: The above switch comes with a breaking change for those using Unmanaged Nodegroups with Custom AMIs, therefore the legacy codepath is still in place for a little while longer. (See notice here https://github.com/weaveworks/eksctl/issues/3563.) On the legacy path we have removed previously added changes which set the cgroup driver. This was because we kept experiencing breakages every time the AMI folks moved something around (this pain was actually one of the reasons why we removed our own bootstrapper). This could mean that if the custom AMI is based on one which sets the docker driver to be systemd, the kubelet will fail because eksctl's legacy bootstrapper will override the kubelet config, setting the driver to cgroupfs.
    • BUT, users can get by this by using the overrideBootstrapCommand to set their own userdata.

Please note that we are no longer making changes to the legacy path.

To address this point:

There's still a condition where if we release an AMI with this change and a user is using an older version of eksctl, this could still lead to failures.

We emphatically do not support older versions of eksctl. If users are running any version of eksctl which is not latest, they are doing it wrong 🙃 .

Callisto13 avatar Aug 02 '21 09:08 Callisto13

We emphatically do not support older versions of eksctl. If users are running any version of eksctl which is not latest, they are doing it wrong 🙃 .

@Callisto13 thanks for emphasizing that!

@abeer91 so now that being said, what is still blocking the merging of this PR?

reegnz avatar Aug 03 '21 09:08 reegnz

@reegnz I think what @Callisto13 was saying is that the latest version of eksctl still supports a legacy flow for compatibility and that is what's currently blocking this? This is just an observation; personally I don't think a higher level project should dictate the direction of a dependency, this is why strong APIs/contracts are essential.

stevehipwell avatar Aug 03 '21 09:08 stevehipwell

@stevehipwell Legacy flow seems to be supported for Custom AMIs, not the official EKS AMI, for the official EKS AMI they use the /etc/eks/bootstrap.sh, no legacy code-path there. We're talking about enabling systemd cgroup driver for the official AMI, not custom AMI-s. The custom AMI problem has been handled on their side.

reegnz avatar Aug 03 '21 09:08 reegnz

@reegnz I think the concern was that someone using legacy flow with a custom AMI built from this AMI would be broken after this PR was merged?

stevehipwell avatar Aug 03 '21 10:08 stevehipwell

I want to verify that there are no outstanding concerns/issues, and I'll try to get this merged this week.

mmerkes avatar Nov 02 '21 20:11 mmerkes

I'm going to close this; we've made systemd the cgroup driver for containerd, and we don't plan to touch docker until we remove it in 1.25.

cartermckinnon avatar Oct 12 '22 09:10 cartermckinnon