
IPAMD fails to start

Open grumpymatt opened this issue 2 years ago • 24 comments

What happened: IPAMD fails to start with an iptables error. The aws-node pods fail to start, which prevents worker nodes from becoming Ready. This started occurring after updating to Rocky Linux 8.5, which is based on RHEL 8.5.

/var/log/aws-routed-eni/ipamd.log

{"level":"error","ts":"2022-02-04T14:38:08.239Z","caller":"networkutils/network.go:385","msg":"ipt.NewChain error for chain [AWS-SNAT-CHAIN-0]: running [/usr/sbin/iptables -t nat -N AWS-SNAT-CHAIN-0 --wait]: exit status 3: iptables v1.8.4 (legacy): can't initialize iptables table `nat': Table does not exist (do you need to insmod?)\nPerhaps iptables or your kernel needs to be upgraded.\n"}

Pod logs (kubectl logs -n kube-system aws-node-9tqb6):

{"level":"info","ts":"2022-02-04T15:11:48.035Z","caller":"entrypoint.sh","msg":"Validating env variables ..."}
{"level":"info","ts":"2022-02-04T15:11:48.036Z","caller":"entrypoint.sh","msg":"Install CNI binaries.."}
{"level":"info","ts":"2022-02-04T15:11:48.062Z","caller":"entrypoint.sh","msg":"Starting IPAM daemon in the background ... "}
{"level":"info","ts":"2022-02-04T15:11:48.071Z","caller":"entrypoint.sh","msg":"Checking for IPAM connectivity ... "}
{"level":"info","ts":"2022-02-04T15:11:50.092Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-02-04T15:11:52.103Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-02-04T15:11:54.115Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-02-04T15:11:56.124Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}

Attach logs

What you expected to happen: Expected ipamd to start normally.

How to reproduce it (as minimally and precisely as possible): Deploy an EKS cluster with an AMI based on Rocky Linux 8.5. In theory, any RHEL 8.5-based AMI could have this problem.

Anything else we need to know?: Running the iptables command from the ipamd log as root on the worker node works fine.

Environment:

  • Server Version: version.Info{Major:"1", Minor:"19+", GitVersion:"v1.19.15-eks-9c63c4", GitCommit:"9c63c4037a56f9cad887ee76d55142abd4155179", GitTreeState:"clean", BuildDate:"2021-10-20T00:21:03Z", GoVersion:"go1.15.15", Compiler:"gc", Platform:"linux/amd64"}
  • CNI: 1.10.1
  • OS (e.g: cat /etc/os-release): NAME="Rocky Linux" VERSION="8.5 (Green Obsidian)" ID="rocky" ID_LIKE="rhel centos fedora" VERSION_ID="8.5" PLATFORM_ID="platform:el8" PRETTY_NAME="Rocky Linux 8.5 (Green Obsidian)" ANSI_COLOR="0;32" CPE_NAME="cpe:/o:rocky:rocky:8:GA" HOME_URL="https://rockylinux.org/" BUG_REPORT_URL="https://bugs.rockylinux.org/" ROCKY_SUPPORT_PRODUCT="Rocky Linux" ROCKY_SUPPORT_PRODUCT_VERSION="8"
  • Kernel (e.g. uname -a): Linux ip-10-2--xx-xxx.ec2.xxxxxxxx.com 4.18.0-348.12.2.el8_5.x86_64 #1 SMP Wed Jan 19 17:53:40 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

grumpymatt avatar Feb 04 '22 15:02 grumpymatt

We found that loading the ip_tables, iptable_nat, and iptable_mangle kernel modules fixes the issue: modprobe -a ip_tables iptable_nat iptable_mangle

Still trying to figure out why these modules were loaded by default in 8.4 but not in 8.5. Also still not sure why the same iptables commands work without these modules directly on the worker instance but not in the container.
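
For reference, a minimal sketch of loading the modules immediately and persisting them across reboots on RHEL 8-family nodes (the modules-load.d file name is arbitrary):

# Load each module now; modprobe without -a treats extra names as module parameters
sudo modprobe ip_tables
sudo modprobe iptable_nat
sudo modprobe iptable_mangle

# Persist across reboots: systemd loads everything listed under /etc/modules-load.d/
cat <<'EOF' | sudo tee /etc/modules-load.d/iptables.conf
ip_tables
iptable_nat
iptable_mangle
EOF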

grumpymatt avatar Feb 04 '22 18:02 grumpymatt

We do install iptables by default in aws-node container images. It would be good to check the changelog between 8.4 and 8.5 for any insights into the observed behavior.

achevuru avatar Feb 09 '22 20:02 achevuru

@grumpymatt I have been getting the same issue while setting up EKS on RHEL 8.5, and after loading the kernel modules it does work fine. The strange thing is that I tried the same on RHEL 8.0 worker nodes and still got the same issue. It works fine on RHEL 7.x, though.

vishal0nhce avatar Feb 15 '22 07:02 vishal0nhce

@grumpymatt Since the issue is clearly tied to missing iptables modules, I think we can close this issue. Let us know if there is any other concern.

@vishal0nhce Yeah, the iptables module is required for the VPC CNI, and I'm not sure why it is missing in RHEL 8.5. I don't see any specific callout for RHEL 8.5 around this.

achevuru avatar Feb 15 '22 22:02 achevuru

We found an alternative way of fixing it by updating iptables inside the CNI container image.

FROM 602401143452.dkr.ecr.us-east-1.amazonaws.com/amazon-k8s-cni:v1.10.1
RUN yum install -y iptables-nft
RUN cd /usr/sbin && rm iptables && ln -s xtables-nft-multi iptables

My concern is that the direction of RHEL and downstream distros seems to be away from iptables-legacy and toward iptables-nft. Are there any plans to address this in the CNI container image?
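
For anyone hitting this, a quick way to confirm which backend is in play on the host versus inside the aws-node container (iptables 1.8+ prints the backend in its version string; this assumes the container image ships an iptables binary, which this thread indicates it does):

# On the worker node: "(legacy)" or "(nf_tables)" appears after the version
iptables -V

# Inside the running aws-node container, for comparison
kubectl -n kube-system exec ds/aws-node -c aws-node -- iptables -V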

grumpymatt avatar Feb 15 '22 22:02 grumpymatt

Interesting. So RHEL 8 doesn't support iptables-legacy anymore? That explains the issue. I think iptables legacy mode is still the default (at least for now) for most distributions; in particular, Amazon Linux 2 images use iptables-legacy by default as well. We will track AL2 images for our default CNI builds. We'll check and update if there is something we can do to address this scenario.

achevuru avatar Feb 16 '22 01:02 achevuru

We are seeing a similar situation where IPAM-D won't start successfully and the aws-node pod restarts at least once. We are running EKS 1.20.

$ cat /etc/os-release
NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"

bilby91 avatar Apr 04 '22 18:04 bilby91

@bilby91 - Can you please check whether kube-proxy is taking time to start? kube-proxy should set up rules on startup for aws-node to reach the API server.
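
A quick way to verify this (a sketch; assumes the standard EKS kube-proxy daemonset labels and iptables proxy mode):

# Confirm kube-proxy pods are running and check their recent logs
kubectl -n kube-system get pods -l k8s-app=kube-proxy -o wide
kubectl -n kube-system logs -l k8s-app=kube-proxy --tail=20

# On the affected node, confirm the NAT rules for the API server service exist
sudo iptables-save -t nat | grep 'default/kubernetes'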

jayanthvn avatar Apr 18 '22 23:04 jayanthvn

Similar error with IPAMD failing to start on the latest version, v1.11.0. kube-proxy is already running successfully; the only change was the VPC CNI image update from 1.9.0 to 1.11.0. Any clue what's wrong with the latest version? TIA

{"level":"info","ts":"2022-04-21T19:44:43.569Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}

dhavaln-able avatar Apr 21 '22 19:04 dhavaln-able

Similar error with IPAMD failing to start on the latest version v1.11.0. kube-proxy is already running successfully. {"level":"info","ts":"2022-04-27T10:07:56.670Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}

kathy-lee avatar Apr 27 '22 10:04 kathy-lee

I was seeing this error. In my case, a developer had manually created VPC endpoints for a few services, including STS, resulting in traffic to those services being blackholed, so ipamd could not create a session to collect the information it needed.
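
If you suspect the same thing, a sketch of how to check (VPC_ID is a placeholder; run the STS call from an affected node):

# List interface endpoints in the cluster VPC and their state
aws ec2 describe-vpc-endpoints \
  --filters Name=vpc-id,Values="$VPC_ID" \
  --query 'VpcEndpoints[].{Service:ServiceName,State:State}'

# From a worker node, a quick STS reachability/credentials sanity check
aws sts get-caller-identity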

js-timbirkett avatar May 06 '22 06:05 js-timbirkett

I am also facing the same issue while trying to upgrade the cluster from 1.19 to 1.20 in EKS. I can't pinpoint the exact problem.

sahil100122 avatar May 06 '22 09:05 sahil100122

@dhavaln-able and @kathy-lee - So with v1.11.0, is aws-node continuously crashing, or does it come up after a few restarts?

@sahil100122 - You mean that while upgrading, kube-proxy is up and running but ipamd is not starting at all?

jayanthvn avatar May 06 '22 18:05 jayanthvn

I found that Flatcar Container Linux runs into a related issue: the iptables command in Flatcar version 3033.2.0 uses the nftables kernel backend instead of the legacy iptables backend, which means pods on a secondary ENI cannot reach Kubernetes-internal ClusterIP services.

Thanks to @grumpymatt for the workaround; after building a customized amazon-k8s-cni container image the same way, the AWS VPC CNI now works for me on Flatcar versions greater than 3033.2.0.

smalltown avatar May 08 '22 08:05 smalltown

Had the same issue while upgrading, but after looking at the troubleshooting guide and patching the daemonset with the following, aws-node came up as expected and without issues.

# New env vars introduced with 1.10.x
- op: add
  path: "/spec/template/spec/initContainers/0/env/-"
  value: {"name": "ENABLE_IPv6", "value": "false"}
- op: add
  path: "/spec/template/spec/containers/0/env/-"
  value: {"name": "ENABLE_IPv4", "value": "true"}
- op: add
  path: "/spec/template/spec/containers/0/env/-"
  value: {"name": "ENABLE_IPv6", "value": "false"}

rhenry-brex avatar Jun 01 '22 00:06 rhenry-brex

I also faced the above issue, but in my case I was using a custom kube-proxy image. When I reverted to the default kube-proxy image and restarted the aws-node pods, everything worked fine.

Why doesn't aws-node/ipamd report any error related to communication if the issue is with kube-proxy 🤔

varunpalekar avatar Jul 18 '22 22:07 varunpalekar

Had a similar issue yesterday. AWS Systems Manager applied a patch to all of our nodes, which required a reboot of the instances. All instances came up healthy, but on three out of five the network was not working, basically making the cluster unusable. Investigation led me to issues like this one and the related AWS Knowledge Center entry.

Recycling all nodes resolved the issue. I did not try just terminating the aws-node pods. Interestingly, only one out of three clusters was affected, so it's probably difficult to reproduce.

What I also noticed: why is aws-node mounting /var/run/dockershim.sock even though we use containerd?

  • AWS Node Image: 602401143452.dkr.ecr.eu-central-1.amazonaws.com/amazon-k8s-cni:v1.10.1-eksbuild.1
  • Default kube-proxy, default aws-node, etc.

trallnag avatar Jul 19 '22 11:07 trallnag

Hey all 👋🏼 please be aware that this failure mode also happens when the IPs in a subnet are exhausted.

I just faced this and noticed I had misconfigured my worker groups to use a small subnet (/26) instead of the bigger one I intended to use (/18).
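
An easy way to check for this (subnet IDs are placeholders):

# Show remaining free IPs per node subnet
aws ec2 describe-subnets \
  --subnet-ids subnet-0123456789abcdef0 subnet-0fedcba9876543210 \
  --query 'Subnets[].{Id:SubnetId,Cidr:CidrBlock,FreeIPs:AvailableIpAddressCount}' \
  --output table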

inge4pres avatar Jul 20 '22 18:07 inge4pres

Also: check that you have the right security group attached to your nodes.
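
A quick check along those lines (the instance ID is a placeholder; compare the result against the EKS security group requirements):

# List the security groups attached to a worker node's ENIs
aws ec2 describe-instances \
  --instance-ids i-0123456789abcdef0 \
  --query 'Reservations[].Instances[].NetworkInterfaces[].Groups[]'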

TaiSHiNet avatar Aug 12 '22 19:08 TaiSHiNet

For those coming here after upgrading EKS, try re-applying the VPC CNI manifest, for example: kubectl apply -f https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/v1.11.3/config/master/aws-k8s-cni.yaml
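
To confirm which CNI image the daemonset is actually running before and after re-applying the manifest (a sketch; assumes the default aws-node daemonset name):

# Print the image currently set on the aws-node daemonset
kubectl -n kube-system get daemonset aws-node \
  -o jsonpath='{.spec.template.spec.containers[0].image}'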

esidate avatar Sep 09 '22 18:09 esidate

For me, the issue was the IAM policy AmazonEKS_CNI_Policy-2022092909143815010000000b. My policy only allowed IPv6, like below:

{
    "Statement": [
        {
            "Action": [
                "ec2:DescribeTags",
                "ec2:DescribeNetworkInterfaces",
                "ec2:DescribeInstances",
                "ec2:DescribeInstanceTypes",
                "ec2:AssignIpv6Addresses"
            ],
            "Effect": "Allow",
            "Resource": "*",
            "Sid": "IPV6"
        },
        {
            "Action": "ec2:CreateTags",
            "Effect": "Allow",
            "Resource": "arn:aws:ec2:*:*:network-interface/*",
            "Sid": "CreateTags"
        }
    ],
    "Version": "2012-10-17"
}

I changed the policy like below:

{
    "Statement": [
        {
            "Action": [
                "ec2:UnassignPrivateIpAddresses",
                "ec2:ModifyNetworkInterfaceAttribute",
                "ec2:DetachNetworkInterface",
                "ec2:DescribeTags",
                "ec2:DescribeNetworkInterfaces",
                "ec2:DescribeInstances",
                "ec2:DescribeInstanceTypes",
                "ec2:DeleteNetworkInterface",
                "ec2:CreateNetworkInterface",
                "ec2:AttachNetworkInterface",
                "ec2:AssignPrivateIpAddresses"
            ],
            "Effect": "Allow",
            "Resource": "*",
            "Sid": "IPV4"
        },
        {
            "Action": [
                "ec2:DescribeTags",
                "ec2:DescribeNetworkInterfaces",
                "ec2:DescribeInstances",
                "ec2:DescribeInstanceTypes",
                "ec2:AssignIpv6Addresses"
            ],
            "Effect": "Allow",
            "Resource": "*",
            "Sid": "IPV6"
        },
        {
            "Action": "ec2:CreateTags",
            "Effect": "Allow",
            "Resource": "arn:aws:ec2:*:*:network-interface/*",
            "Sid": "CreateTags"
        }
    ],
    "Version": "2012-10-17"
}

and it works! 😅
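
If you want to verify which policy is actually attached to the role used by aws-node, a sketch (the role name and policy ARN are placeholders; use your IRSA role or node instance role):

# List the policies attached to the role used by aws-node
aws iam list-attached-role-policies --role-name AmazonEKSNodeRole

# Inspect the attached CNI policy (shows its default version ID)
aws iam get-policy --policy-arn arn:aws:iam::123456789012:policy/AmazonEKS_CNI_Policy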

jacobhjkim avatar Sep 29 '22 12:09 jacobhjkim

I've had the same problem these two weeks, has someone found a solution?

zhengyongtao avatar Oct 02 '22 15:10 zhengyongtao

I've had the same problem these two weeks, has someone found a solution?

Can you please share the last few lines of ipamd logs before aws node restarts?

jayanthvn avatar Oct 02 '22 18:10 jayanthvn

I've had the same problem these two weeks, has someone found a solution?

Can you please share the last few lines of ipamd logs before aws node restarts?

ipamd log:

{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1418","msg":"Trying to add 192.168.43.0"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1542","msg":"Adding 192.168.43.0/32 to DS for eni-00023922abf62516c"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1542","msg":"IP already in DS"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1418","msg":"Trying to add 192.168.60.1"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1542","msg":"Adding 192.168.60.1/32 to DS for eni-00023922abf62516c"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1542","msg":"IP already in DS"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1418","msg":"Trying to add 192.168.47.2"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1542","msg":"Adding 192.168.47.2/32 to DS for eni-00023922abf62516c"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1542","msg":"IP already in DS"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1418","msg":"Trying to add 192.168.46.131"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1542","msg":"Adding 192.168.46.131/32 to DS for eni-00023922abf62516c"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1542","msg":"IP already in DS"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1418","msg":"Trying to add 192.168.61.196"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1542","msg":"Adding 192.168.61.196/32 to DS for eni-00023922abf62516c"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1542","msg":"IP already in DS"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1418","msg":"Trying to add 192.168.49.6"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1542","msg":"Adding 192.168.49.6/32 to DS for eni-00023922abf62516c"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1542","msg":"IP already in DS"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1418","msg":"Trying to add 192.168.41.135"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1542","msg":"Adding 192.168.41.135/32 to DS for eni-00023922abf62516c"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1542","msg":"IP already in DS"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1418","msg":"Trying to add 192.168.38.218"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1542","msg":"Adding 192.168.38.218/32 to DS for eni-00023922abf62516c"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1542","msg":"IP already in DS"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1418","msg":"Trying to add 192.168.39.157"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1542","msg":"Adding 192.168.39.157/32 to DS for eni-00023922abf62516c"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1542","msg":"IP already in DS"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1418","msg":"Trying to add 192.168.59.213"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1542","msg":"Adding 192.168.59.213/32 to DS for eni-00023922abf62516c"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1542","msg":"IP already in DS"}
{"level":"debug","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:653","msg":"Reconcile existing ENI eni-00023922abf62516c IP prefixes"}
{"level":"debug","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1351","msg":"Found prefix pool count 0 for eni eni-00023922abf62516c\n"}
{"level":"debug","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:653","msg":"Successfully Reconciled ENI/IP pool"}
{"level":"debug","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1396","msg":"IP pool stats: Total IPs/Prefixes = 87/0, AssignedIPs/CooldownIPs: 31/0, c.maxIPsPerENI = 29"}
command terminated with exit code 137

aws-node:

# kubectl logs -f aws-node-zdp6x --tail 30 -n kube-system  
{"level":"info","ts":"2022-10-02T14:56:07.820Z","caller":"entrypoint.sh","msg":"Validating env variables ..."}
{"level":"info","ts":"2022-10-02T14:56:07.821Z","caller":"entrypoint.sh","msg":"Install CNI binaries.."}
{"level":"info","ts":"2022-10-02T14:56:07.833Z","caller":"entrypoint.sh","msg":"Starting IPAM daemon in the background ... "}
{"level":"info","ts":"2022-10-02T14:56:07.834Z","caller":"entrypoint.sh","msg":"Checking for IPAM connectivity ... "}
{"level":"info","ts":"2022-10-02T14:56:09.841Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-10-02T14:56:11.847Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-10-02T14:56:13.853Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-10-02T14:56:15.860Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-10-02T14:56:17.866Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-10-02T14:56:19.872Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-10-02T14:56:21.878Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-10-02T14:56:23.884Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-10-02T14:56:25.890Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-10-02T14:56:27.897Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-10-02T14:56:29.903Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-10-02T14:56:31.909Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-10-02T14:56:33.916Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-10-02T14:56:35.922Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-10-02T14:56:37.928Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-10-02T14:56:39.934Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-10-02T14:56:41.940Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-10-02T14:56:43.947Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-10-02T14:56:45.953Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-10-02T14:56:47.959Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-10-02T14:56:49.966Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}

Event screenshots:


I use cluster-autoscaler for auto-scaling and the Kubernetes version is 1.22. I also followed the troubleshooting guide https://github.com/aws/amazon-vpc-cni-k8s/blob/master/docs/troubleshooting.md#known-issues and applied the suggestion kubectl apply -f https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/v1.11.4/config/master/aws-k8s-cni.yaml

Interestingly, this failure usually only occurs on a certain node, and when I terminate that node's instance and let it be replaced automatically, it starts working again.

But after running for a while, it restarts again.
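
Before recycling a node like that, it can help to grab the CNI diagnostics bundle from it (a sketch; this is the log collector referenced in the project's troubleshooting guide, and the path may differ on custom AMIs):

# Collect ipamd/CNI logs and network state from the affected node
sudo bash /opt/cni/bin/aws-cni-support.sh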

zhengyongtao avatar Oct 03 '22 00:10 zhengyongtao

I am having the same issue applying the v1.11.4 update. For those trying this with v1.11.3 or v1.11.4: make sure to substitute EKS_CLUSTER_NAME and VPC_ID with proper values; at least in my case, it didn't work otherwise.

For those coming here after upgrading EKS try re-applying the VPC CNI manifest file, for example: kubectl apply -f https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/v1.11.3/config/master/aws-k8s-cni.yaml

vd-rd avatar Oct 18 '22 19:10 vd-rd

I've had the same problem recently, has someone found a solution?

ermiaqasemi avatar Oct 27 '22 22:10 ermiaqasemi

@ermiaqasemi Following this tutorial, I had attached the AmazonEKS_CNI_Policy to the aws-node service account, and I was getting the error.

I decided to try simply attaching it to the AmazonEKSNodeRole, which apparently is the less recommended way to do it, but it works.
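
For anyone comparing the two approaches, a quick way to see whether aws-node is picking up an IRSA role or falling back to the node instance role (a sketch; the annotation is only present when IRSA is configured):

# Empty output means no IRSA role annotation, i.e. the node instance role is used
kubectl -n kube-system get serviceaccount aws-node \
  -o jsonpath='{.metadata.annotations.eks\.amazonaws\.com/role-arn}'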

itay-grudev avatar Oct 30 '22 12:10 itay-grudev

@itay-grudev Thanks for sharing, but in my case I don't think it's related to the node IAM roles, since the role is already attached to the nodes!

ermiaqasemi avatar Oct 31 '22 15:10 ermiaqasemi

@ermiaqasemi Did you also try a lower EKS version? Some people specifically reported problems with v1.11.4. I created a new cluster with 1.10.4, which is finally working.

itay-grudev avatar Oct 31 '22 16:10 itay-grudev

By EKS version you meant the CNI version, right? My EKS version is 1.22 and my CNI version is 1.10.2-eksbuild.1. I didn't have this problem with EKS 1.21. @itay-grudev

ermiaqasemi avatar Oct 31 '22 17:10 ermiaqasemi