Incorrect iptables system detected (`nft` instead of `legacy`) on Amazon Linux 2

Open · ivotron opened this issue 4 years ago · 13 comments

What happened:

Kind hangs when creating a cluster. More specifically, the kind-control-plane container gets stuck in the entrypoint's select_iptables() function, while executing the iptables-nft-save command. That command hangs the startup process, and kind is unable to docker rm the container after marking the "Starting control-plane" step as failed.

As a workaround, I manually patched the 1.21.1 upstream image with a new entrypoint that removes the iptables mode detection and hard-codes legacy. With that patched image, multiple test runs completed and kind works as expected.
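For reference, a minimal sketch of the kind of patch described above, assuming the node image switches iptables variants with update-alternatives; the actual entrypoint and binary paths may differ:

    # Hypothetical replacement for select_iptables(): skip probing iptables-nft-save
    # entirely and force the legacy variant. The /usr/sbin paths are assumptions
    # based on typical Debian-style alternatives; adjust for the real node image.
    select_iptables() {
      local mode=legacy
      echo "INFO: forcing iptables mode: ${mode}" >&2
      update-alternatives --set iptables  "/usr/sbin/iptables-${mode}"  >/dev/null
      update-alternatives --set ip6tables "/usr/sbin/ip6tables-${mode}" >/dev/null
    }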

What you expected to happen:

Kind should create the cluster successfully.

How to reproduce it (as minimally and precisely as possible):

Assuming an EC2 instance with corresponding OS/tools installed (see below for versions):

  1. Boot the EC2 instance
  2. Run kind create cluster
  3. If step 2 ran to completion, destroy the cluster, reboot the VM, and repeat steps 1-2 until the command hangs.
  4. Verify that the stuck process is iptables-nft-save (one way to check is shown below).

In our CI infrastructure, it happens approx. 1 every 20 executions.
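One way to confirm which process is stuck, using only standard docker commands (container name assumes the default cluster name):

    # The node container created by kind with the default cluster name
    docker ps --filter name=kind-control-plane

    # Look for a lingering iptables-nft-save process inside the node
    docker top kind-control-plane | grep iptables

    # The entrypoint output should stop at the iptables mode selection
    docker logs kind-control-plane 2>&1 | tail -n 20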

Anything else we need to know?:

This version of the Amazon Linux 2 AMI is a legacy iptables system as it doesn't have nftables. For an unknown reason, though, the iptables wrapper ends up assuming it is an nft system and therefore executes iptables-nft-save incorrectly. I'm not sure why, but I'd be happy to help debug further if you think that would be useful.
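A few standard commands that can help confirm which variant the host is actually using (nothing kind-specific):

    # Newer iptables prints its backend, e.g. "(legacy)" or "(nf_tables)";
    # if no backend is shown, it is the legacy implementation.
    iptables --version

    # Check whether any nft userspace tools exist at all
    command -v nft          || echo "nft binary not found"
    command -v iptables-nft || echo "iptables-nft binary not found"

    # Check whether the nf_tables kernel module is loaded
    lsmod | grep -w nf_tables || echo "nf_tables not loaded"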

Environment:

  • kind version: (use kind version):

    kind v0.11.1 go1.16.4 linux/amd64
    
  • Kubernetes version: (use kubectl version):

    Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T16:31:21Z", GoVersion:"go1.16.1", Compiler:"gc", Platform:"linux/amd64"}
    
  • Docker version: (use docker info):

    Client:
     Context:    default
     Debug Mode: false
     Plugins:
      buildx: Build with BuildKit (Docker Inc., v0.5.1)
    
    Server:
     Containers: 4
      Running: 4
      Paused: 0
      Stopped: 0
     Images: 1
     Server Version: 20.10.6
     Storage Driver: overlay2
      Backing Filesystem: extfs
      Supports d_type: true
      Native Overlay Diff: true
      userxattr: false
     Logging Driver: json-file
     Cgroup Driver: cgroupfs
     Cgroup Version: 1
     Plugins:
      Volume: local
      Network: bridge host ipvlan macvlan null overlay
      Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
     Swarm: inactive
     Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
     Default Runtime: runc
     Init Binary: docker-init
     containerd version: 05f951a3781f4f2c1911b05e61c160e9c30eaa8e
     runc version: 12644e614e25b05da6fd08a38ffa0cfe1903fdec
     init version: de40ad0
     Security Options:
      seccomp
       Profile: default
     Kernel Version: 4.14.246-187.474.amzn2.x86_64
     Operating System: Amazon Linux 2
     OSType: linux
     Architecture: x86_64
     CPUs: 48
     Total Memory: 186.8GiB
     Name: ip-10-0-2-197.us-west-2.compute.internal
     ID: VW5K:TENS:DQNZ:5MW7:LJUW:C4EJ:NWSX:GRTY:UIDV:IELE:NRGK:B7W2
     Docker Root Dir: /mnt/ephemeral/docker
     Debug Mode: false
     Registry: https://index.docker.io/v1/
     Labels:
     Experimental: false
     Insecure Registries:
      127.0.0.0/8
     Live Restore Enabled: false
     Product License: Community Engine
    
  • OS (e.g. from /etc/os-release):

    NAME="Amazon Linux"
    VERSION="2"
    ID="amzn"
    ID_LIKE="centos rhel fedora"
    VERSION_ID="2"
    PRETTY_NAME="Amazon Linux 2"
    ANSI_COLOR="0;33"
    CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
    HOME_URL="https://amazonlinux.com/"
    

    Amazon Linux 2 Release: 2.0.20211001.1

ivotron avatar Oct 25 '21 20:10 ivotron

I think this was fixed by https://github.com/kubernetes-sigs/kind/commit/45c5aa40234752cdb65fd353e553ff13f0945c13.

aojea avatar Oct 25 '21 21:10 aojea

Thanks a lot for the prompt reply! I'm not too familiar with the CI workflows of this repo. I see that the 1.22.2 image has been getting updated with merges to main. Would that image work with kind v0.11.1, or would we need to wait for v0.11.2 (or v0.12.0)? Thanks! 🙏

ivotron avatar Oct 26 '21 04:10 ivotron

I see that the 1.22.2 image

Use that image; it will work without upgrading the kind binary.

aojea avatar Oct 26 '21 07:10 aojea

Thanks @aojea! We've tested with 1.22.2 and the problem persists. As a workaround, we're using an image built from this commit on our fork: https://github.com/vectorizedio/kind/commit/fc74b98229783c265ca4ac8043b395364d72f4a2

ivotron avatar Oct 28 '21 14:10 ivotron

@ivotron if there is a bug in the detection I'd like to have it fixed; can you please paste the iptables version used on those systems?

aojea avatar Oct 30 '21 16:10 aojea

In our CI infrastructure, it happens approx. 1 every 20 executions.

This is the interesting part, I think. What happens differently on these runs? Is docker failing to inject iptables rules into the nodes in time?

We also have this detection issue with podman; no great way to deal with it has been proposed yet.

This version of the Amazon Linux 2 AMI is a legacy iptables system as it doesn't have nftables.

As in, doesn't have nftables at all? Maybe we can detect this case @aojea ?

Any fix we find here is also a fix that Kubernetes itself would need, as the detection logic matches how kube-proxy's image handles this.

BenTheElder avatar Nov 02 '21 20:11 BenTheElder
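For context, the shared detection logic is a rule-count heuristic in the spirit of kubernetes-sigs/iptables-wrappers: whichever backend already holds more rules wins. A simplified sketch of that idea, not the exact entrypoint code (which also checks ip6tables and wraps the nft probe in a timeout, as noted later in this thread):

    # Count rules visible to each backend and pick the busier one,
    # defaulting to legacy on a tie.
    num_legacy=$(iptables-legacy-save 2>/dev/null | grep -c '^-')
    num_nft=$(iptables-nft-save 2>/dev/null | grep -c '^-')

    if [ "${num_legacy}" -ge "${num_nft}" ]; then
      mode=legacy
    else
      mode=nft
    fi
    echo "selected iptables mode: ${mode}"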

As in, doesn't have nftables at all? Maybe we can detect this case @aojea ?

I've downloaded an Amazon Linux 2 KVM image and, funnily enough, it indeed doesn't have the nft binaries, but it also doesn't have any iptables rules at all :)

Agree, detecting the binary seems more reliable

Any fix we find here is also a fix that Kubernetes itself would need, as the detection logic matches how kube-proxy's image handles this.

Kubernetes has the kubelet inserting iptables rules, which may break the tie. I think that if this logic were broken we should be seeing lots of bug reports in k/k, but we don't. That doesn't mean we shouldn't improve it, though.

aojea avatar Nov 02 '21 22:11 aojea
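A sketch of the binary-detection idea, as a check run on the host before creating the cluster (the next comment points out why this is hard to do from inside the entrypoint: the host's PATH is not what kind sees):

    # Hypothetical host-side pre-check: if the host has no nft userspace tools
    # at all, it is almost certainly a legacy-only system.
    if ! command -v nft >/dev/null 2>&1 && ! command -v iptables-nft >/dev/null 2>&1; then
      echo "no nft binaries on this host; assuming legacy iptables"
    fi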

Agree, detecting the binary seems more reliable

maybe probing for the kernel module inside the entrypoint?

the binary isn't actually necessarily relevant ... PATH for dockerd may not be PATH for kind

BenTheElder avatar Nov 02 '21 22:11 BenTheElder
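A sketch of what probing the kernel module from inside the entrypoint could look like; whether this actually distinguishes the two modes is exactly what is debated below, so treat it as an illustration only:

    # Check for the nf_tables kernel module from inside the node container.
    # /proc and /sys reflect the host kernel, so this does not depend on PATH.
    if grep -qw nf_tables /proc/modules || [ -d /sys/module/nf_tables ]; then
      echo "nf_tables appears to be available in the kernel"
    else
      echo "nf_tables not loaded; likely a legacy-only system"
    fi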

maybe probing for the kernel module inside the entrypoint?

the only difference is the binaries; the kernel structures are the same. TL;DR: https://developers.redhat.com/blog/2020/08/18/iptables-the-two-variants-and-their-relationship-with-nftables#using_iptables_nft

aojea avatar Nov 02 '21 23:11 aojea

the only difference is the binaries; the kernel structures are the same. TL;DR

that's not strictly true? e.g. you can not load the iptables modules if you are using nft. they speak to different kernel modules.

alternatively we might just need a wrapper around calling iptables-$version-save with a timeout to prevent hanging, since if the command exits we should see that there are no rules and default to legacy.

BenTheElder avatar Nov 02 '21 23:11 BenTheElder
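The timeout idea is easy to sketch, and as the next comment notes, the entrypoint already does something along these lines, so this is just an illustration:

    # Probe nft rules but never let a wedged iptables-nft-save block startup;
    # a timeout or error is treated the same as "no nft rules".
    if ! nft_rules=$(timeout 5 iptables-nft-save 2>/dev/null); then
      nft_rules=""
    fi
    num_nft=$(printf '%s\n' "${nft_rules}" | grep -c '^-' || true)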

that's not strictly true? e.g. you can not load the iptables modules if you are using nft. they speak to different kernel modules

🤔 I'm not sure; I think the only difference is the userspace. Besides, that is not the problem here ... I can ask some iptables hackers if needed.

alternatively we might just need a wrapper around calling iptables-$version-save with a timeout to prevent hanging

The command reported to hang already has a timeout; it had a bug that was fixed with that behaviour. Since it already has a timeout, I cannot understand how it can hang, so we need to debug this more before assuming anything.

aojea avatar Nov 03 '21 07:11 aojea

I forgot we already have the timeout ... 🙃

BenTheElder avatar Nov 03 '21 07:11 BenTheElder

but it doesn't have any iptables rules at all :)

Docker should be creating some; I'm not sure why that's not the case here. For example, the Docker embedded DNS is implemented with iptables rules inside the container network namespace.

I don't use EC2 so I won't be debugging this one further for now, but that doesn't make sense.

Also, again: we are now shipping the same logic that kube-proxy uses (and we were previously, but we needed to update it to keep in sync), so this system is seemingly broken for running Kubernetes directly on the host as well.

BenTheElder avatar Nov 18 '21 19:11 BenTheElder
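For anyone hitting this, one way to check whether Docker actually injected its embedded-DNS rules into the node's network namespace; the rules normally live in the nat table, and which save variant shows them depends on the host's iptables backend:

    # Run against the node container; the nft probe is wrapped in a timeout
    # since that is the command reported to hang in this issue.
    docker exec kind-control-plane iptables-legacy-save -t nat 2>/dev/null | grep -i docker
    docker exec kind-control-plane timeout 5 iptables-nft-save -t nat 2>/dev/null | grep -i docker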

This issue has gotten old and we can't reproduce it.

Will revisit if we see further reports.

BenTheElder avatar Apr 18 '23 04:04 BenTheElder