Incorrect iptables system detected (`nft` instead of `legacy`) on Amazon Linux 2
What happened:
Kind hangs when creating a cluster. More specifically, the kind-control-plane container gets stuck in the select_iptables() function of the entrypoint, while executing the iptables-nft-save command. That command hangs the startup process, and Kind is then unable to docker rm the container after marking the "Starting control-plane" step as failed.
As a workaround, I manually patched the 1.21.1 upstream image with a new entrypoint that removes the iptables mode detection and hard-codes legacy. With that image, repeated runs of Kind work as expected.
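For context, the workaround amounts to replacing the mode autodetection with a fixed choice. A minimal sketch of what such an entrypoint override could look like (illustrative only, not the exact patch; it assumes the Debian-based node image where update-alternatives is available, and the handoff path is a placeholder):

```bash
#!/usr/bin/env bash
# Sketch of a workaround entrypoint: skip iptables mode detection entirely
# and pin the legacy backend before handing off to the original entrypoint.
set -euo pipefail

# Force the legacy variant (paths are the standard Debian alternatives).
update-alternatives --set iptables /usr/sbin/iptables-legacy
update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy

# Continue with the rest of node startup (placeholder path for the original entrypoint).
exec /usr/local/bin/original-entrypoint "$@"
```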
What you expected to happen:
Kind should create the cluster successfully.
How to reproduce it (as minimally and precisely as possible):
Assuming an EC2 instance with corresponding OS/tools installed (see below for versions):
1. Boot the EC2 instance
2. Run kind create cluster
3. If 2) ran to completion, destroy the cluster, reboot the VM and repeat steps 1-2 until the command hangs.
4. Verify that the process getting stuck is iptables-nft-save.
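One way to confirm from the host which process is stuck (illustrative commands, assuming the default kind-control-plane container name):

```bash
# List the processes running inside the control-plane node container
docker top kind-control-plane

# Check specifically for the hanging save command
docker top kind-control-plane | grep iptables-nft-save

# The container's startup output can also show where the entrypoint stalls
docker logs kind-control-plane 2>&1 | tail -n 50
```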
In our CI infrastructure, this happens in approximately 1 out of every 20 executions.
Anything else we need to know?:
This version of the Amazon Linux 2 AMI is a legacy iptables system as it doesn't have nftables. For an unknown reason, though, the iptables wrapper ends up assuming it is an nft system and thus incorrectly executes iptables-nft-save. I'm not sure why, but I'd be happy to help debug further if you think that'd be useful.
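For anyone debugging this, the host-side iptables situation could be inspected with something like the following (generic commands, not something kind runs; exact output depends on the iptables build the AMI ships):

```bash
# iptables >= 1.8 reports the active backend in the version string,
# e.g. "iptables v1.8.x (legacy)" or "(nf_tables)"
iptables --version

# Is any nftables tooling or kernel module present at all?
command -v nft || echo "nft binary not installed"
lsmod | grep -E 'nf_tables|ip_tables' || echo "no netfilter modules listed"

# Rule counts per backend, if these split binaries exist on the host at all
# (older iptables only ships plain iptables-save)
iptables-legacy-save 2>/dev/null | wc -l
iptables-nft-save 2>/dev/null | wc -l
```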
Environment:
- kind version: (use kind version): kind v0.11.1 go1.16.4 linux/amd64
- Kubernetes version: (use kubectl version): Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T16:31:21Z", GoVersion:"go1.16.1", Compiler:"gc", Platform:"linux/amd64"}
- Docker version: (use docker info): Client: Context: default Debug Mode: false Plugins: buildx: Build with BuildKit (Docker Inc., v0.5.1) Server: Containers: 4 Running: 4 Paused: 0 Stopped: 0 Images: 1 Server Version: 20.10.6 Storage Driver: overlay2 Backing Filesystem: extfs Supports d_type: true Native Overlay Diff: true userxattr: false Logging Driver: json-file Cgroup Driver: cgroupfs Cgroup Version: 1 Plugins: Volume: local Network: bridge host ipvlan macvlan null overlay Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog Swarm: inactive Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc Default Runtime: runc Init Binary: docker-init containerd version: 05f951a3781f4f2c1911b05e61c160e9c30eaa8e runc version: 12644e614e25b05da6fd08a38ffa0cfe1903fdec init version: de40ad0 Security Options: seccomp Profile: default Kernel Version: 4.14.246-187.474.amzn2.x86_64 Operating System: Amazon Linux 2 OSType: linux Architecture: x86_64 CPUs: 48 Total Memory: 186.8GiB Name: ip-10-0-2-197.us-west-2.compute.internal ID: VW5K:TENS:DQNZ:5MW7:LJUW:C4EJ:NWSX:GRTY:UIDV:IELE:NRGK:B7W2 Docker Root Dir: /mnt/ephemeral/docker Debug Mode: false Registry: https://index.docker.io/v1/ Labels: Experimental: false Insecure Registries: 127.0.0.0/8 Live Restore Enabled: false Product License: Community Engine
- OS (e.g. from /etc/os-release): NAME="Amazon Linux" VERSION="2" ID="amzn" ID_LIKE="centos rhel fedora" VERSION_ID="2" PRETTY_NAME="Amazon Linux 2" ANSI_COLOR="0;33" CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2" HOME_URL="https://amazonlinux.com/" Amazon Linux 2 Release: 2.0.20211001.1
I think this was fixed by https://github.com/kubernetes-sigs/kind/commit/45c5aa40234752cdb65fd353e553ff13f0945c13
Thanks a lot for the prompt reply! I'm not too familiar with the CI workflows of this repo. I see that the 1.22.2 image has been getting updated with merges to main. Would that image work if I use it with kind v0.11.1, or would we need to wait for v0.11.2 (or v0.12.0)? Thanks! 🙏
> I see that the 1.22.2 image has been getting updated with merges to main
use that image, it will work without upgrading the kind binary
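For reference, pointing an existing kind binary at a specific node image is just (the tag is a placeholder for whichever image is being tested; in practice you would also pin a sha256 digest):

```bash
# Create a cluster from a specific node image with the installed kind binary
kind create cluster --image kindest/node:v1.22.2
```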
Thanks @aojea! We've tested with 1.22.2 and the problem persists. As a workaround, we're using an image obtained from this commit on our fork: https://github.com/vectorizedio/kind/commit/fc74b98229783c265ca4ac8043b395364d72f4a2
@ivotron, if there is a bug with the detection I'd like to have it fixed. Can you please paste the iptables version used on those systems?
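In case it helps, that information could be collected on the affected hosts with something like this (illustrative; rpm -q works because Amazon Linux 2 is rpm-based):

```bash
# Userspace iptables version (and the active backend on iptables >= 1.8)
iptables --version

# Package version as installed on Amazon Linux 2
rpm -q iptables

# Kernel version, since the netfilter backends live in the kernel
uname -r
```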
> In our CI infrastructure, this happens in approximately 1 out of every 20 executions.
This is the interesting part, I think. What happens differently on these runs? Is docker failing to inject iptables rules into the nodes in time?
We also have this detection issue with podman; no great way to deal with it has been proposed yet.
> This version of the Amazon Linux 2 AMI is a legacy iptables system as it doesn't have nftables.
As in, doesn't have nftables at all? Maybe we can detect this case @aojea ?
Any fix we find here is also a fix that Kubernetes itself would need, as the detection logic matches how kube-proxy's image handles this.
> As in, doesn't have nftables at all? Maybe we can detect this case @aojea ?
I've downloaded an Amazon Linux 2 KVM image and it's funny: it doesn't have the nft binaries indeed, but it doesn't have any iptables rules at all :)
Agree, detecting the binary seems more reliable
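As a rough illustration of the binary-detection idea (hypothetical snippet, not what the entrypoint does today; the caveat about PATH raised in the next comment applies):

```bash
# Hypothetical check: if no nft userspace tooling is visible, assume legacy.
# Caveat: the PATH visible to this check is not necessarily the PATH that
# dockerd or the host uses, so this is not a reliable signal on its own.
if ! command -v nft >/dev/null 2>&1 && ! command -v iptables-nft >/dev/null 2>&1; then
  echo "no nft userspace tooling found, assuming legacy iptables"
fi
```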
> Any fix we find here is also a fix that Kubernetes itself would need, as the detection logic matches how kube-proxy's image handles this.
Kubernetes has kubelet inserting iptables rules, which may break the tie. I think that if this logic were broken we would see lots of bug reports in k/k, but we don't; that doesn't mean we shouldn't improve it.
> Agree, detecting the binary seems more reliable
maybe probing for the kernel module inside the entrypoint?
the binary isn't actually necessarily relevant ... PATH for dockerd may not be PATH for kind
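A sketch of the kernel-module probe suggested above (again hypothetical, just to make the idea concrete):

```bash
# Hypothetical probe: look at the kernel rather than at userspace binaries,
# since /proc/modules reflects the shared host kernel regardless of PATH.
if grep -qw nf_tables /proc/modules; then
  echo "nf_tables module is loaded"
else
  # Note: the module could also be built into the kernel or loadable on demand,
  # so absence here is not conclusive either.
  echo "nf_tables module not listed"
fi
```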
> maybe probing for the kernel module inside the entrypoint?
the only difference is the binaries, the kernel structures are the same. TL;DR: https://developers.redhat.com/blog/2020/08/18/iptables-the-two-variants-and-their-relationship-with-nftables#using_iptables_nft
> the only difference is the binaries, the kernel structures are the same
that's not strictly true? e.g. you can not load the iptables modules if you are using nft. they speak to different kernel modules.
alternatively we might just need a wrapper around calling iptables-$version-save with a timeout to prevent hanging, since if the command exits we should see that there are no rules and default to legacy.
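As a sketch, such a wrapper could look like this (illustrative only; as noted below, the entrypoint already bounds the call with a timeout):

```bash
# Bound the save call so a hang cannot block startup indefinitely;
# a timeout or empty output would then fall through to the legacy default.
if ! nft_rules=$(timeout 5 iptables-nft-save 2>/dev/null) || [ -z "$nft_rules" ]; then
  echo "iptables-nft-save timed out or returned no rules, defaulting to legacy"
fi
```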
> that's not strictly true? e.g. you can not load the iptables modules if you are using nft. they speak to different kernel modules
🤔 I'm not sure, I think that the only difference is the userspace. Besides that, this is not the problem here ... I can ask some iptables hackers if needed.
> alternatively we might just need a wrapper around calling iptables-$version-save with a timeout to prevent hanging
The command reported to hang already has a timeout; there was a bug in that behaviour that was fixed. Since it already has a timeout, I can't understand how it can hang. We need to debug this more before assuming anything.
I forgot we already have the timeout ... 🙃
> but it doesn't have any iptables rules at all :)
Docker should be creating some, I'm not sure why that's not the case here. For example the docker embedded DNS is implemented with iptables rules inside the container network namespace.
I don't use EC2 so I won't be debugging this one further for now, but that doesn't make sense.
Also again, we are shipping the same logic now that kube-proxy uses (and we were previously, but we needed to update it to keep in sync), so this system is seemingly broken for running Kubernetes directly on the host as well.
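One way to see which rules Docker actually injected into a node's network namespace (illustrative; the node image should ship both save variants):

```bash
# Dump the nat table as seen by each backend inside the node container;
# Docker's embedded DNS rules, when present, show up here.
docker exec kind-control-plane iptables-legacy-save -t nat
docker exec kind-control-plane iptables-nft-save -t nat
```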
This issue has gotten old and we can't reproduce it.
Will revisit if we see further reports.