cluster-api-provider-aws
Unable to create cluster using amazon-2 ami
/kind bug
What steps did you take and what happened:
Create a cluster using the capa-ami-amazon-2-v1.25.12 image.
The control-plane node fails to start, and the kubelet logs on that node show the following errors:
Aug 01 14:16:50 ip-10-189-0-251.eu-central-1.compute.internal kubelet[4449]: E0801 14:16:50.104552 4449 remote_runtime.go:222] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/k8s.io/381d9c64be09430dd45c3ea4d33c6d7473d0704881ec3fc293d7e69fec81ac57/log.json: no such file or directory): runc did not terminate successfully: exit status 127: unknown"
Aug 01 14:16:50 ip-10-189-0-251.eu-central-1.compute.internal kubelet[4449]: E0801 14:16:50.104604 4449 kuberuntime_sandbox.go:71] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/k8s.io/381d9c64be09430dd45c3ea4d33c6d7473d0704881ec3fc293d7e69fec81ac57/log.json: no such file or directory): runc did not terminate successfully: exit status 127: unknown" pod="kube-system/etcd-ip-10-189-0-251.eu-central-1.compute.internal"
Aug 01 14:16:50 ip-10-189-0-251.eu-central-1.compute.internal kubelet[4449]: E0801 14:16:50.104631 4449 kuberuntime_manager.go:772] "CreatePodSandbox for pod failed" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/k8s.io/381d9c64be09430dd45c3ea4d33c6d7473d0704881ec3fc293d7e69fec81ac57/log.json: no such file or directory): runc did not terminate successfully: exit status 127: unknown" pod="kube-system/etcd-ip-10-189-0-251.eu-central-1.compute.internal"
Aug 01 14:16:50 ip-10-189-0-251.eu-central-1.compute.internal kubelet[4449]: E0801 14:16:50.104708 4449 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"etcd-ip-10-189-0-251.eu-central-1.compute.internal_kube-system(a812507b09a2bba6c5690db77f322d9f)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"etcd-ip-10-189-0-251.eu-central-1.compute.internal_kube-system(a812507b09a2bba6c5690db77f322d9f)\\\": rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/k8s.io/381d9c64be09430dd45c3ea4d33c6d7473d0704881ec3fc293d7e69fec81ac57/log.json: no such file or directory): runc did not terminate successfully: exit status 127: unknown\"" pod="kube-system/etcd-ip-10-189-0-251.eu-central-1.compute.internal" podUID=a812507b09a2bba6c5690db77f322d9f
Running the runc binary directly returns the following error:
[root@ip-10-189-0-251 ~]# /usr/local/sbin/runc --help
/usr/local/sbin/runc: symbol lookup error: /usr/local/sbin/runc: undefined symbol: seccomp_notify_respond
This happens because CAPA uses the cri-containerd-*.tar.gz archive to install containerd and runc. According to the containerd release notes:
https://github.com/containerd/containerd/blob/40f26543bdc27cbe8b058ac082e91c5832bb1c41/releases/v1.6.0.toml#L64-L76
runc, as included in the containerd distribution, is built with dynamic linking to libseccomp.
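To confirm on an affected node that the installed libseccomp really lacks the symbol runc asks for, one option is to query the shared library directly. The following is a minimal sketch and not part of CAPA or image-builder. The helper name `library_exports` is hypothetical, and it assumes Python 3 is available on the node and that the library can be located through the standard loader cache.

```python
import ctypes
import ctypes.util
from typing import Optional


def library_exports(lib_name: str, symbol: str) -> Optional[bool]:
    """Report whether the shared library `lib_name` exports `symbol`.

    Returns None when the library cannot be located or loaded, so the
    caller can tell "library missing" apart from "symbol missing".
    """
    # find_library("seccomp") resolves to something like "libseccomp.so.2".
    path = ctypes.util.find_library(lib_name)
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None
    # Attribute lookup on a CDLL raises AttributeError for unknown symbols,
    # which hasattr() turns into False.
    return hasattr(lib, symbol)
```

Calling `library_exports("seccomp", "seccomp_notify_respond")` on the node should show whether the symbol named in the runc error is available from the installed library.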
CAPA is using the following version of containerd:
[root@ip-10-189-0-251 ~]# /usr/local/bin/containerd --version
containerd github.com/containerd/containerd v1.6.21 3dce8eb055cbb6872793272b4f20ed16117344f8
which, according to the release notes, includes runc v1.1.7.
runc v1.1.7 is linked against libseccomp-2.5.4, but the installed version is:
[root@ip-10-189-0-251 ~]# yum list installed | grep libsec
libseccomp.x86_64 2.4.1-1.amzn2 installed
which is the latest libseccomp version available in the epel7 repo.
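The mismatch above can be expressed as a simple version comparison. This is an illustrative sketch only. It assumes that `seccomp_notify_respond` first appeared in libseccomp 2.5.0 (to my knowledge that is when the notify API was added, but please check the libseccomp release notes), and the helper names are hypothetical.

```python
from typing import Tuple


def parse_version(package_version: str) -> Tuple[int, ...]:
    """Turn an RPM-style version such as '2.4.1-1.amzn2' into (2, 4, 1)."""
    # Drop the release suffix after the first '-', keep the upstream version.
    upstream = package_version.split("-", 1)[0]
    return tuple(int(part) for part in upstream.split("."))


def supports_seccomp_notify(installed: str, minimum: str = "2.5.0") -> bool:
    """True if the installed libseccomp is at least `minimum`.

    The default minimum of 2.5.0 is an assumption, not taken from the issue.
    """
    return parse_version(installed) >= parse_version(minimum)
```

With the version reported by yum, `supports_seccomp_notify("2.4.1-1.amzn2")` returns `False`, while `supports_seccomp_notify("2.5.4")` returns `True`. That is consistent with the symbol lookup error from the bundled runc.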
What did you expect to happen: Users should be able to create a cluster using Amazon Linux 2 images.
Anything else you would like to add:
I was able to fix this issue in my image-builder fork by adding Ansible steps that download a statically linked runc from https://github.com/opencontainers/runc/releases and replace the runc installed by the cri-containerd-*.tar.gz archive.
I can create a pull request in the image-builder repo with the fix if you are OK with this approach.
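For illustration, the replacement step could look roughly like the sketch below. This is not the Ansible code from the fork. It assumes the release asset is named `runc.<arch>` (for example `runc.amd64`) under the runc releases page, uses the `/usr/local/sbin/runc` path seen above as the default destination, and omits checksum verification, which a real image build should add. The function names are hypothetical.

```python
import os
import shutil
import tempfile
import urllib.request

# Assumed base URL for release assets of opencontainers/runc.
RUNC_RELEASES = "https://github.com/opencontainers/runc/releases/download"


def runc_download_url(version: str, arch: str = "amd64") -> str:
    """Build the download URL for a runc release binary (assumed asset naming)."""
    return f"{RUNC_RELEASES}/{version}/runc.{arch}"


def replace_runc(version: str, dest: str = "/usr/local/sbin/runc",
                 arch: str = "amd64") -> None:
    """Download the release binary and swap it in for `dest`."""
    url = runc_download_url(version, arch)
    # Write to a temp file in the same directory so os.replace stays on
    # one filesystem and the swap is a single rename.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(dest), prefix=".runc-")
    try:
        with os.fdopen(fd, "wb") as tmp, urllib.request.urlopen(url) as resp:
            shutil.copyfileobj(resp, tmp)
        os.chmod(tmp_path, 0o755)
        os.replace(tmp_path, dest)
    except BaseException:
        # Clean up the partial download before re-raising.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Run as root during the image build, `replace_runc("v1.1.7")` would overwrite the dynamically linked runc with the upstream release binary of the same version.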
Environment:
- Cluster-api-provider-aws version: registry.k8s.io/cluster-api-aws/cluster-api-aws-controller:v2.2.1
- Kubernetes version: (use kubectl version): v1.25.12
- OS (e.g. from /etc/os-release): amazon linux 2