[Bug] Karpenter v0.32.4 does not work when deployed via eksctl
Summary: Karpenter deployment is successful but it fails to create new nodes
What were you trying to accomplish?
An EKS cluster was created with eksctl version 0.167.0 from the following manifest:
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: do-eks-yaml-karpenter
  version: "1.28"
  region: us-west-2
  tags:
    karpenter.sh/discovery: do-eks-yaml-karpenter
iam:
  withOIDC: true
addons:
  - name: aws-ebs-csi-driver
    version: v1.26.0-eksbuild.1
    wellKnownPolicies:
      ebsCSIController: true
karpenter:
  version: 'v0.32.4'
  createServiceAccount: true
  # defaultInstanceProfile: 'KarpenterInstanceProfile'
  withSpotInterruptionQueue: true
managedNodeGroups:
  - name: c5-xl-do-eks-karpenter-ng
    instanceType: c5.xlarge
    instancePrefix: c5-xl
    privateNetworking: true
    minSize: 0
    desiredCapacity: 2
    maxSize: 10
    volumeSize: 300
    iam:
      withAddonPolicies:
        cloudWatch: true
        ebs: true
eksctl create cluster -f ./eks-karpenter.yaml
The cluster creation finishes successfully. See logs below.
Apply the NodePool and EC2NodeClass, then create a deployment that requires a GPU. The pod enters the Pending state. Karpenter is expected to add a GPU node to the cluster.
What happened?
No nodes get added to the cluster. The Karpenter pods are in the Running state, but the Karpenter pod logs show errors:
[karpenter-84bf6fff97-v5v2k] {"level":"ERROR","time":"2024-01-06T09:15:56.457Z","logger":"controller","message":"Reconciler error","commit":"fdf67d0","controller":"nodeclass","controllerGroup":"karpenter.k8s.aws","controllerKind":"EC2NodeClass","EC2NodeClass":{"name":"default"},"namespace":"","name":"default","reconcileID":"fe2de351-d378-4d82-aff7-556160f4d128","error":"creating instance profile, getting instance profile \"do-eks-yaml-karpenter_4067990795380418201\", AccessDenied: User: arn:aws:sts::<account_id>:assumed-role/eksctl-do-eks-yaml-karpenter-iamservice-role/1704531887056119458 is not authorized to perform: iam:GetInstanceProfile on resource: instance profile do-eks-yaml-karpenter_4067990795380418201 because no identity-based policy allows the iam:GetInstanceProfile action\n\tstatus code: 403, request id: f3a80d84-31cc-44ad-a6a4-91b4d3e56de3"}
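The error above names the IAM action that the controller role is denied. As a rough sketch, the denied actions could be pulled out of the log stream with a small filter. The function name `denied_iam_actions` is hypothetical, and the pattern assumes the AccessDenied wording shown in the log line above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read Karpenter log lines on stdin and print each
# distinct IAM action that appears in an AccessDenied message.
# Assumes the message wording "is not authorized to perform: <action>".
denied_iam_actions() {
  grep 'AccessDenied' \
    | sed -n 's/.*is not authorized to perform: \([A-Za-z0-9]*:[A-Za-z0-9*]*\).*/\1/p' \
    | sort -u
}

# Example usage against a live cluster (not executed here). POD is a
# placeholder for the name of a Karpenter pod:
#   kubectl -n karpenter logs "$POD" | denied_iam_actions
```

Run against the log above, this would list only the actions the role lacks, which is easier to act on than the full reconciler error.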
How to reproduce it?
- Create the cluster using the manifest shared above and the command:
eksctl create cluster -f ./eks-karpenter.yaml
- Create the NodePool and EC2NodeClass by cloning the project https://github.com/aws-samples/aws-do-eks and executing the script Container-Root/eks/deployment/karpenter/provisioner-deploy-v1beta1.sh
- Create a deployment with requests and limits of 1 nvidia.com/gpu by running the script https://github.com/aws-samples/aws-do-eks/blob/main/Container-Root/eks/deployment/horizontal-pod-autoscaler/hpa-example/run.sh
- Tail the Karpenter pod logs:
kubectl -n karpenter logs -f $(kubectl -n karpenter get pod | grep karpenter | head -n 1 | cut -d ' ' -f 1)
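Because the controller also emits frequent DEBUG entries, it may help to keep only the ERROR-level lines while tailing. A minimal sketch follows; the function name `karpenter_errors` is hypothetical, and the match assumes the JSON log format shown in this issue.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pass through only the log lines whose JSON "level"
# field is ERROR. Assumes the compact JSON format shown in this issue.
karpenter_errors() {
  grep '"level":"ERROR"'
}

# Example usage (not executed here). POD is a placeholder for the name of
# a Karpenter pod:
#   kubectl -n karpenter logs -f "$POD" | karpenter_errors
```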
Logs
Cluster creation log:
eksctl create cluster -f /aws-do-eks/Container-Root/eks/conf/eksctl/yaml/eks-karpenter.yaml
2024-01-06 05:42:46 [ℹ] eksctl version 0.167.0
2024-01-06 05:42:46 [ℹ] using region us-west-2
2024-01-06 05:42:47 [ℹ] setting availability zones to [us-west-2b us-west-2a us-west-2c]
2024-01-06 05:42:47 [ℹ] subnets for us-west-2b - public:192.168.0.0/19 private:192.168.96.0/19
2024-01-06 05:42:47 [ℹ] subnets for us-west-2a - public:192.168.32.0/19 private:192.168.128.0/19
2024-01-06 05:42:47 [ℹ] subnets for us-west-2c - public:192.168.64.0/19 private:192.168.160.0/19
2024-01-06 05:42:47 [ℹ] nodegroup "c5-xl-do-eks-karpenter-ng" will use "" [AmazonLinux2/1.28]
2024-01-06 05:42:47 [ℹ] using Kubernetes version 1.28
2024-01-06 05:42:47 [ℹ] creating EKS cluster "do-eks-yaml-karpenter" in "us-west-2" region with managed nodes
2024-01-06 05:42:47 [ℹ] 1 nodegroup (c5-xl-do-eks-karpenter-ng) was included (based on the include/exclude rules)
2024-01-06 05:42:47 [ℹ] will create a CloudFormation stack for cluster itself and 0 nodegroup stack(s)
2024-01-06 05:42:47 [ℹ] will create a CloudFormation stack for cluster itself and 1 managed nodegroup stack(s)
2024-01-06 05:42:47 [ℹ] if you encounter any issues, check CloudFormation console or try 'eksctl utils describe-stacks --region=us-west-2 --cluster=do-eks-yaml-karpenter'
2024-01-06 05:42:47 [ℹ] Kubernetes API endpoint access will use default of {publicAccess=true, privateAccess=false} for cluster "do-eks-yaml-karpenter" in "us-west-2"
2024-01-06 05:42:47 [ℹ] CloudWatch logging will not be enabled for cluster "do-eks-yaml-karpenter" in "us-west-2"
2024-01-06 05:42:47 [ℹ] you can enable it with 'eksctl utils update-cluster-logging --enable-types={SPECIFY-YOUR-LOG-TYPES-HERE (e.g. all)} --region=us-west-2 --cluster=do-eks-yaml-karpenter'
2024-01-06 05:42:47 [ℹ]
2 sequential tasks: { create cluster control plane "do-eks-yaml-karpenter",
2 sequential sub-tasks: {
5 sequential sub-tasks: {
wait for control plane to become ready,
associate IAM OIDC provider,
2 sequential sub-tasks: {
create IAM role for serviceaccount "kube-system/aws-node",
create serviceaccount "kube-system/aws-node",
},
restart daemonset "kube-system/aws-node",
1 task: { create addons },
},
create managed nodegroup "c5-xl-do-eks-karpenter-ng",
}
}
2024-01-06 05:42:47 [ℹ] building cluster stack "eksctl-do-eks-yaml-karpenter-cluster"
2024-01-06 05:42:47 [ℹ] deploying stack "eksctl-do-eks-yaml-karpenter-cluster"
2024-01-06 05:43:17 [ℹ] waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-cluster"
2024-01-06 05:43:47 [ℹ] waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-cluster"
2024-01-06 05:44:47 [ℹ] waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-cluster"
2024-01-06 05:45:47 [ℹ] waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-cluster"
2024-01-06 05:46:47 [ℹ] waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-cluster"
2024-01-06 05:47:47 [ℹ] waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-cluster"
2024-01-06 05:48:47 [ℹ] waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-cluster"
2024-01-06 05:49:47 [ℹ] waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-cluster"
2024-01-06 05:50:47 [ℹ] waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-cluster"
2024-01-06 05:51:47 [ℹ] waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-cluster"
2024-01-06 05:52:47 [ℹ] waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-cluster"
2024-01-06 05:54:48 [ℹ] building iamserviceaccount stack "eksctl-do-eks-yaml-karpenter-addon-iamserviceaccount-kube-system-aws-node"
2024-01-06 05:54:48 [ℹ] deploying stack "eksctl-do-eks-yaml-karpenter-addon-iamserviceaccount-kube-system-aws-node"
2024-01-06 05:54:48 [ℹ] waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-addon-iamserviceaccount-kube-system-aws-node"
2024-01-06 05:55:19 [ℹ] waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-addon-iamserviceaccount-kube-system-aws-node"
2024-01-06 05:55:19 [ℹ] serviceaccount "kube-system/aws-node" already exists
2024-01-06 05:55:19 [ℹ] updated serviceaccount "kube-system/aws-node"
2024-01-06 05:55:19 [ℹ] daemonset "kube-system/aws-node" restarted
2024-01-06 05:55:19 [ℹ] building managed nodegroup stack "eksctl-do-eks-yaml-karpenter-nodegroup-c5-xl-do-eks-karpenter-ng"
2024-01-06 05:55:19 [ℹ] deploying stack "eksctl-do-eks-yaml-karpenter-nodegroup-c5-xl-do-eks-karpenter-ng"
2024-01-06 05:55:19 [ℹ] waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-nodegroup-c5-xl-do-eks-karpenter-ng"
2024-01-06 05:55:49 [ℹ] waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-nodegroup-c5-xl-do-eks-karpenter-ng"
2024-01-06 05:56:45 [ℹ] waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-nodegroup-c5-xl-do-eks-karpenter-ng"
2024-01-06 05:58:03 [ℹ] waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-nodegroup-c5-xl-do-eks-karpenter-ng"
2024-01-06 05:58:03 [ℹ] waiting for the control plane to become ready
2024-01-06 05:58:04 [✔] saved kubeconfig as "/root/.kube/config"
2024-01-06 05:58:04 [ℹ] no tasks
2024-01-06 05:58:04 [✔] all EKS cluster resources for "do-eks-yaml-karpenter" have been created
2024-01-06 05:58:04 [ℹ] creating role using provided well known policies
2024-01-06 05:58:05 [ℹ] deploying stack "eksctl-do-eks-yaml-karpenter-addon-aws-ebs-csi-driver"
2024-01-06 05:58:05 [ℹ] waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-addon-aws-ebs-csi-driver"
2024-01-06 05:58:35 [ℹ] waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-addon-aws-ebs-csi-driver"
2024-01-06 05:59:34 [ℹ] waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-addon-aws-ebs-csi-driver"
2024-01-06 05:59:34 [ℹ] creating addon
2024-01-06 06:00:23 [ℹ] addon "aws-ebs-csi-driver" active
2024-01-06 06:00:24 [ℹ] 1 task: { create karpenter for stack "do-eks-yaml-karpenter" }
2024-01-06 06:00:24 [ℹ] building nodegroup stack "eksctl-do-eks-yaml-karpenter-karpenter"
2024-01-06 06:00:24 [ℹ] deploying stack "eksctl-do-eks-yaml-karpenter-karpenter"
2024-01-06 06:00:24 [ℹ] waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-karpenter"
2024-01-06 06:00:54 [ℹ] waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-karpenter"
2024-01-06 06:01:44 [ℹ] waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-karpenter"
2024-01-06 06:02:16 [ℹ] waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-karpenter"
2024-01-06 06:02:52 [ℹ] waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-karpenter"
2024-01-06 06:04:04 [ℹ] waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-karpenter"
2024-01-06 06:04:04 [ℹ] 1 task: { create IAM role for serviceaccount "karpenter/karpenter" }
2024-01-06 06:04:04 [ℹ] 1 task: { create IAM role for serviceaccount "karpenter/karpenter" }
2024-01-06 06:04:04 [ℹ] building iamserviceaccount stack "eksctl-do-eks-yaml-karpenter-addon-iamserviceaccount-karpenter-karpenter"
2024-01-06 06:04:04 [ℹ] deploying stack "eksctl-do-eks-yaml-karpenter-addon-iamserviceaccount-karpenter-karpenter"
2024-01-06 06:04:04 [ℹ] waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-addon-iamserviceaccount-karpenter-karpenter"
2024-01-06 06:04:34 [ℹ] waiting for CloudFormation stack "eksctl-do-eks-yaml-karpenter-addon-iamserviceaccount-karpenter-karpenter"
2024-01-06 06:04:34 [ℹ] adding identity "arn:aws:iam::159553542841:role/eksctl-KarpenterNodeRole-do-eks-yaml-karpenter" to auth ConfigMap
2024-01-06 06:04:34 [ℹ] adding Karpenter to cluster do-eks-yaml-karpenter
E0106 06:04:35.661520 1614 memcache.go:206] couldn't get resource list for karpenter.k8s.aws/v1alpha1: the server could not find the requested resource
E0106 06:04:35.732564 1614 memcache.go:206] couldn't get resource list for karpenter.k8s.aws/v1beta1: the server could not find the requested resource
E0106 06:04:35.821871 1614 memcache.go:206] couldn't get resource list for karpenter.sh/v1beta1: the server could not find the requested resource
2024-01-06 06:04:50 [ℹ] kubectl command should work with "/root/.kube/config", try 'kubectl get nodes'
2024-01-06 06:04:50 [✔] EKS cluster "do-eks-yaml-karpenter" in "us-west-2" region is ready
Sat Jan 6 06:04:50 UTC 2024
Done creating cluster using /aws-do-eks/Container-Root/eks/conf/eksctl/yaml/eks-karpenter.yaml
/aws-do-eks/Container-Root/eks
Karpenter pod log:
[karpenter-84bf6fff97-v5v2k] {"level":"DEBUG","time":"2024-01-06T09:51:07.220Z","logger":"controller.disruption","message":"waiting on cluster sync","commit":"fdf67d0"}
[karpenter-84bf6fff97-v5v2k] {"level":"DEBUG","time":"2024-01-06T09:51:08.221Z","logger":"controller.disruption","message":"waiting on cluster sync","commit":"fdf67d0"}
[karpenter-84bf6fff97-v5v2k] {"level":"DEBUG","time":"2024-01-06T09:51:09.221Z","logger":"controller.disruption","message":"waiting on cluster sync","commit":"fdf67d0"}
[karpenter-84bf6fff97-v5v2k] {"level":"DEBUG","time":"2024-01-06T09:51:10.222Z","logger":"controller.disruption","message":"waiting on cluster sync","commit":"fdf67d0"}
[karpenter-84bf6fff97-v5v2k] {"level":"DEBUG","time":"2024-01-06T09:51:11.222Z","logger":"controller.disruption","message":"waiting on cluster sync","commit":"fdf67d0"}
[karpenter-84bf6fff97-v5v2k] {"level":"DEBUG","time":"2024-01-06T09:51:12.222Z","logger":"controller.disruption","message":"waiting on cluster sync","commit":"fdf67d0"}
[karpenter-84bf6fff97-v5v2k] {"level":"DEBUG","time":"2024-01-06T09:51:13.223Z","logger":"controller.disruption","message":"waiting on cluster sync","commit":"fdf67d0"}
[karpenter-84bf6fff97-v5v2k] {"level":"DEBUG","time":"2024-01-06T09:51:14.224Z","logger":"controller.disruption","message":"waiting on cluster sync","commit":"fdf67d0"}
[karpenter-84bf6fff97-v5v2k] {"level":"ERROR","time":"2024-01-06T09:51:14.749Z","logger":"controller","message":"Reconciler error","commit":"fdf67d0","controller":"nodeclass","controllerGroup":"karpenter.k8s.aws","controllerKind":"EC2NodeClass","EC2NodeClass":{"name":"default"},"namespace":"","name":"default","reconcileID":"4020d4ad-afda-4687-9f05-6ed98b8506f3","error":"creating instance profile, getting instance profile \"do-eks-yaml-karpenter_4067990795380418201\", AccessDenied: User: arn:aws:sts::159553542841:assumed-role/eksctl-do-eks-yaml-karpenter-iamservice-role/1704531887056119458 is not authorized to perform: iam:GetInstanceProfile on resource: instance profile do-eks-yaml-karpenter_4067990795380418201 because no identity-based policy allows the iam:GetInstanceProfile action\n\tstatus code: 403, request id: 63aaf857-2b29-45d2-a60e-e6e1fe9889a2"}
Anything else we need to know?
To build the container image for the deployment, execute the following commands from the cloned project directory:
cd Container-Root/eks/deployment/horizontal-pod-autoscaler/hpa-example
./build.sh
./push.sh
Older versions of Karpenter (e.g. 0.29.0) used with Provisioner and AWSNodeTemplate work as expected. In this case the v1alpha5 API is used: https://github.com/aws-samples/aws-do-eks/blob/main/Container-Root/eks/deployment/karpenter/provisioner-deploy-v1alpha5.sh
Karpenter also works as expected when the cluster is created without Karpenter and Karpenter v0.32.4 is then deployed by following the instructions here: https://karpenter.sh/v0.32/getting-started/getting-started-with-karpenter/#4-install-karpenter
It appears that eksctl lacks support for the Karpenter versions that use the v1beta1 API.
Versions
$ eksctl info
eksctl version: 0.167.0
kubectl version: v1.28.2
OS: linux
https://github.com/eksctl-io/eksctl/blob/9575570b554610129382d0a181645a3806cea98f/pkg/cfn/builder/karpenter.go#L151-L169
Looks like the controller policy is missing the AllowPassingInstanceRole defined here:
https://github.com/aws/karpenter-provider-aws/blob/daeb5da355fce14f718f51c4956ca8f9319103dd/website/content/en/docs/getting-started/getting-started-with-karpenter/cloudformation.yaml#L186-L196
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
Hi @yuxiang-zhang. Can you please assign this issue to me?
I fixed this issue in my cluster by manually adding these permissions to the eksctl-KarpenterControllerPolicy-CLUSTERNAME policy:
- iam:GetInstanceProfile
- iam:CreateInstanceProfile
- iam:TagInstanceProfile
- iam:AddRoleToInstanceProfile
These permissions are apparently missing when Karpenter is configured by eksctl.
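For illustration, the four permissions listed above could be expressed as an IAM policy document like the one below. This is only a sketch under assumptions: the function name, the `Sid`, the policy name, and the `ROLE_NAME` placeholder are hypothetical, and `"Resource": "*"` is chosen for brevity rather than taken from this thread. The usage note attaches the document as a separate inline role policy, which is a different route from editing the eksctl-managed policy as described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an IAM policy document that allows the four
# instance-profile actions listed in the comment above.
# Assumption: Resource "*" is used for brevity; scope it down for real use.
karpenter_instance_profile_policy() {
  cat <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowInstanceProfileActions",
      "Effect": "Allow",
      "Action": [
        "iam:GetInstanceProfile",
        "iam:CreateInstanceProfile",
        "iam:TagInstanceProfile",
        "iam:AddRoleToInstanceProfile"
      ],
      "Resource": "*"
    }
  ]
}
EOF
}

# Example usage (not executed here): attach the document as an inline policy
# on the Karpenter controller role. ROLE_NAME and the policy name are
# placeholders, not values confirmed by this thread.
#   karpenter_instance_profile_policy > /tmp/karpenter-instance-profile.json
#   aws iam put-role-policy \
#     --role-name "$ROLE_NAME" \
#     --policy-name KarpenterInstanceProfileAccess \
#     --policy-document file:///tmp/karpenter-instance-profile.json
```

The role name that appears in the AccessDenied message in the pod log is the one the controller assumes, so that is likely the role to target.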
Thanks @pstast for the policies. I have yet to raise a PR for the issue.
This issue was closed because it has been stalled for 5 days with no activity.
I ran into this issue with 0.36.2 as well, and @pstast's recommended fix resolved it for me.
The issue is still present with Karpenter 0.37.0 and eksctl 0.183.0.