[Bug] kube-proxy image version 1.27 causing the kube-proxy to fail
What were you trying to accomplish?
Trying to deploy a new cluster, version 1.27, using eksctl. I am running the command: eksctl create cluster...
What happened?
I get the following error and the nodes for the cluster never come up. Looking at the logs inside the node, I see this error:
ErrImagePull: rpc error: code = Unknown desc = failed to pull and unpack image "602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/kube-proxy:v1.27.1-minimal-eksbuild.1": failed to resolve reference "602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/kube-proxy:v1.27.1-minimal-eksbuild.1": pulling from host 602401143452.dkr.ecr.us-east-1.amazonaws.com failed with status code [manifests v1.27.1-minimal-eksbuild.1]
How to reproduce it?
I am using a YAML file to deploy this, so I am not sure how you would reproduce it. According to the AWS documentation here: https://docs.aws.amazon.com/eks/latest/userguide/managing-kube-proxy.html the image is meant to be eksbuild.2, not eksbuild.1. The eksctl code here: https://github.com/eksctl-io/eksctl/blob/c27d2e80f50aceb78c35c60b713f8e9267611dde/pkg/addons/default/kube_proxy.go#L150C1-L151 only references eksbuild.1, not eksbuild.2.
Logs
ErrImagePull: rpc error: code = Unknown desc = failed to pull and unpack image "602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/kube-proxy:v1.27.1-minimal-eksbuild.1": failed to resolve reference "602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/kube-proxy:v1.27.1-minimal-eksbuild.1": pulling from host 602401143452.dkr.ecr.us-east-1.amazonaws.com failed with status code [manifests v1.27.1-minimal-eksbuild.1]
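For anyone comparing this log against the tag in the AWS documentation, a small helper can split the image reference in the error into its parts. This is only a sketch: `parse_ecr_image` and `EcrImage` are hypothetical names, and the pattern assumes a private ECR reference with a 12-digit account ID, as in the log above.

```python
import re
from typing import NamedTuple, Optional


class EcrImage(NamedTuple):
    """Parts of a private ECR image reference (hypothetical helper type)."""
    account: str
    region: str
    repository: str
    tag: str


# Assumes the form <12-digit account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>
_ECR_REF = re.compile(
    r"(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com"
    r"/(?P<repository>[A-Za-z0-9._/-]+):(?P<tag>[A-Za-z0-9._-]+)"
)


def parse_ecr_image(message: str) -> Optional[EcrImage]:
    """Return the first ECR image reference found in a log message, or None."""
    match = _ECR_REF.search(message)
    if match is None:
        return None
    return EcrImage(**match.groupdict())
```

Passing the ErrImagePull line to `parse_ecr_image` gives the region and tag separately, which makes it easier to check the tag against the one listed in the documentation.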
Anything else we need to know?
Versions 1.27
$ eksctl info
Hello artemisia480 :wave: Thank you for opening an issue in eksctl project. The team will review the issue and aim to respond within 1-5 business days. Meanwhile, please read about the Contribution and Code of Conduct guidelines here. You can find out more information about eksctl on our website.
Thanks @artemisia480 same problem here.
i am running the command: eksctl create cluster...
@artemisia480 did you run any commands after eksctl create cluster, or did you try to update the image?
and if you look at the eksctl code here: https://github.com/eksctl-io/eksctl/blob/c27d2e80f50aceb78c35c60b713f8e9267611dde/pkg/addons/default/kube_proxy.go#L150C1-L151 it is only calling eksbuild.1 and not 2.
That codepath is not used in eksctl create cluster.
I'm unable to reproduce this. I got the same image tag (602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/kube-proxy:v1.27.1-minimal-eksbuild.1) on a new cluster and it was pulled successfully.
Can you share your config file?
@cPu1, the code doesn't use it? Are you sure? The AWS documentation says to use eksbuild.2, and this clearly pulls eksbuild.1. Here is my YAML file:
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: ami-testing-cluster2
  version: "1.27"
  region: us-east-1
vpc:
  clusterEndpoints:
    publicAccess: true
    privateAccess: false
managedNodeGroups:
  - name: ami-testing2
    ami: <custom ami>
    amiFamily: AmazonLinux2
    instanceType: m6i.large
    volumeSize: 20
    disableIMDSv1: false
    ssh:
      allow: true
      publicKeyPath: ~/.ssh/id_rsa.pub
    overrideBootstrapCommand: |
      #!/bin/bash
      eks_register.sh ami-testing-cluster2
    iam:
      withAddonPolicies:
        externalDNS: true
        ebs: true
        autoScaler: true
        cloudWatch: false
      attachPolicyARNs:
        - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
        - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
Could this be an issue in a specific region? @artemisia480 do you have any clusters in other regions to confirm this?
@a-hilaly not sure why it would be region specific? But I can test a different region just to see.
@artemisia480 not really sure, but if it's a pull issue, maybe the image is not available in every region. Or are we using ECR Public here? I'll try to replicate the same bug locally and update here.
@artemisia480 I haven't been able to reproduce your issue through 4/5 creations in different regions... Maybe this is an issue with the custom AMI?
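To repeat the per-region check, it can help to generate the regional image reference for each region you want to try pulling from. The sketch below is only an assumption-laden helper: `kube_proxy_image` is a hypothetical name, the account ID and repository are copied from the error log in this issue, and the same account ID is assumed for every region, which does not hold for all AWS regions.

```python
from typing import List

# Taken from the error log in this issue; other regions may use a
# different registry account ID, so treat this default as an assumption.
DEFAULT_ACCOUNT = "602401143452"


def kube_proxy_image(region: str, tag: str, account: str = DEFAULT_ACCOUNT) -> str:
    """Build the regional ECR reference for the kube-proxy image."""
    return f"{account}.dkr.ecr.{region}.amazonaws.com/eks/kube-proxy:{tag}"


def images_for_regions(regions: List[str], tag: str) -> List[str]:
    """Return one image reference per region, in the order given."""
    return [kube_proxy_image(region, tag) for region in regions]
```

Each resulting reference can then be used in whatever pull test you run on a node in that region.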
@a-hilaly thanks for testing that! I am starting to think it is the custom AMI after all, though I am not sure what the cause is. I had the following flags in the AMI for 1.26, which I have now removed for 1.27:
KUBELET_EKS_ARGS=--node-ip=192.168.22.222
--pod-infra-container-image=602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/pause-amd64:3.1
--cloud-provider aws
--config /etc/kubernetes/kubelet.json
--kubeconfig /etc/kubernetes/kubeconfig
--container-runtime remote
--container-runtime-endpoint unix:///var/run/containerd/containerd.sock
I also added the flag: --seccomp-default=unconfined.
But having no luck.
Do you run any extra commands after creating the cluster? any daemonset updates?
@artemisia480 I got a similar error when I added a containerd node group to an EKS 1.23 cluster. The containerd nodes could not pull an ECR image and reported the pull failed error, but the dockerd nodes in the same cluster could pull the exact same image. My test cluster was in a VPC that did not have an ECR endpoint, in case that is relevant.
There seems to be something extra that containerd nodes need. @a-hilaly any idea what that might be?
Pulling image "XXXXXX.dkr.ecr.ap-southeast-2.amazonaws.com/mycontainer:1.0.1" Warning Failed 8s (x3 over 47s) kubelet Failed to pull image "XXXXXX.dkr.ecr.ap-southeast-2.amazonaws.com/mycontainer:1.0.1": rpc error: code = NotFound desc = failed to pull and unpack image "XXXXXXX.dkr.ecr.ap-southeast-2.amazonaws.com/mycontainer:1.0.1": failed to copy: httpReadSeeker: failed open: could not fetch content descriptor sha256:d713dedd5b37c3ffea46d23c7933cc173c7755c789eab3bc60ea374cb5af740f (application/vnd.docker.distribution.manifest.v1+json) from remote: not found
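One detail worth pulling out of that event is the media type of the descriptor containerd failed to fetch. The media type shown, application/vnd.docker.distribution.manifest.v1+json, is the Docker image manifest schema 1 type. Whether that is the cause of the failure is not established in this thread; the sketch below only extracts the digest and media type and flags schema 1. `descriptor_from_error` is a hypothetical helper name, and the pattern assumes the message wording shown above.

```python
import re
from typing import Optional, Tuple

# Media types used by Docker image manifest schema 1 (unsigned and signed).
SCHEMA1_MEDIA_TYPES = {
    "application/vnd.docker.distribution.manifest.v1+json",
    "application/vnd.docker.distribution.manifest.v1+prettyjws",
}

# Assumes the wording "content descriptor <digest> (<media type>)".
_DESCRIPTOR = re.compile(r"content descriptor (sha256:[0-9a-f]+) \(([^)]+)\)")


def descriptor_from_error(message: str) -> Optional[Tuple[str, str, bool]]:
    """Return (digest, media_type, is_schema1) from a pull error, or None."""
    match = _DESCRIPTOR.search(message)
    if match is None:
        return None
    digest, media_type = match.groups()
    return digest, media_type, media_type in SCHEMA1_MEDIA_TYPES
```

If the flag comes back true, comparing how the image was built and pushed against an image that pulls fine on the same containerd node may narrow things down.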