[Help] Unable to upgrade a managed nodegroup
Hello!
I can’t upgrade a managed nodegroup using eksctl.
The following document was used for the procedure: https://docs.aws.amazon.com/eks/latest/userguide/update-managed-node-group.html#mng-update
Steps to reproduce:
Create a cluster using the following manifest:
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: yby-test
  region: eu-central-1
  version: "1.28"
managedNodeGroups:
  - name: mng-medium
    instanceType: t3a.medium
    desiredCapacity: 2
    minSize: 1
    maxSize: 2
    volumeSize: 10
    iam:
      withAddonPolicies:
        ebs: true
It gets created successfully
Next, I upgrade the control plane's Kubernetes version using the following command:
eksctl upgrade cluster --name yby-test --region eu-central-1 --approve
Everything works fine:
2024-09-27 15:05:31 [ℹ] will upgrade cluster "yby-test" control plane from current version "1.28" to "1.29"
2024-09-27 15:14:54 [✔] cluster "yby-test" control plane has been upgraded to version "1.29"
2024-09-27 15:14:54 [ℹ] you will need to follow the upgrade procedure for all of nodegroups and add-ons
2024-09-27 15:14:55 [ℹ] re-building cluster stack "eksctl-yby-test-cluster"
2024-09-27 15:14:55 [✔] all resources in cluster stack "eksctl-yby-test-cluster" are up-to-date
2024-09-27 15:14:55 [ℹ] checking security group configuration for all nodegroups
2024-09-27 15:14:55 [ℹ] all nodegroups have up-to-date cloudformation templates
Then I try to upgrade the nodegroup to the target version using:
eksctl upgrade nodegroup --cluster yby-test --region eu-central-1 --name mng-medium --kubernetes-version=1.29
Here is the log:
2024-09-27 15:17:45 [ℹ] will upgrade nodes to release version: 1.29.8-20240917
2024-09-27 15:17:45 [ℹ] upgrading nodegroup version
2024-09-27 15:17:45 [ℹ] updating nodegroup stack
2024-09-27 15:17:46 [ℹ] waiting for CloudFormation changeset "eksctl-update-nodegroup-1727443065" for stack "eksctl-yby-test-nodegroup-mng-medium"
2024-09-27 15:18:16 [ℹ] waiting for CloudFormation changeset "eksctl-update-nodegroup-1727443065" for stack "eksctl-yby-test-nodegroup-mng-medium"
2024-09-27 15:18:16 [ℹ] waiting for CloudFormation stack "eksctl-yby-test-nodegroup-mng-medium"
2024-09-27 15:18:46 [ℹ] waiting for CloudFormation stack "eksctl-yby-test-nodegroup-mng-medium"
2024-09-27 15:19:28 [ℹ] waiting for CloudFormation stack "eksctl-yby-test-nodegroup-mng-medium"
2024-09-27 15:20:28 [ℹ] waiting for CloudFormation stack "eksctl-yby-test-nodegroup-mng-medium"
2024-09-27 15:21:36 [ℹ] waiting for CloudFormation stack "eksctl-yby-test-nodegroup-mng-medium"
2024-09-27 15:23:25 [ℹ] waiting for CloudFormation stack "eksctl-yby-test-nodegroup-mng-medium"
2024-09-27 15:24:58 [ℹ] waiting for CloudFormation stack "eksctl-yby-test-nodegroup-mng-medium"
2024-09-27 15:25:34 [ℹ] waiting for CloudFormation stack "eksctl-yby-test-nodegroup-mng-medium"
2024-09-27 15:26:44 [ℹ] waiting for CloudFormation stack "eksctl-yby-test-nodegroup-mng-medium"
2024-09-27 15:28:41 [ℹ] waiting for CloudFormation stack "eksctl-yby-test-nodegroup-mng-medium"
2024-09-27 15:30:12 [ℹ] waiting for CloudFormation stack "eksctl-yby-test-nodegroup-mng-medium"
2024-09-27 15:31:48 [ℹ] waiting for CloudFormation stack "eksctl-yby-test-nodegroup-mng-medium"
2024-09-27 15:33:39 [ℹ] waiting for CloudFormation stack "eksctl-yby-test-nodegroup-mng-medium"
Error: error updating nodegroup stack: waiter state transitioned to Failure
If I check the CloudFormation console, I see the following event:
ManagedNodeGroup
Resource handler returned message: "Requested release version 1.29.8-20240917 is not valid for kubernetes version 1.28. (Service: Eks, Status Code: 400, Request ID: 00c5f96d-c686-42a6-98e8-06abde8621d6)" (RequestToken: 38c80d12-e9e8-12b7-ab49-6c7cf2a65b6c, HandlerErrorCode: InvalidRequest)
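For reference, the same failure event can also be pulled with the AWS CLI instead of the console; a rough sketch, using the stack name from the log above (the query string is just an illustration):
# show the most recent events of the failed nodegroup stack
aws cloudformation describe-stack-events \
  --stack-name eksctl-yby-test-nodegroup-mng-medium \
  --region eu-central-1 --max-items 5 \
  --query 'StackEvents[].[LogicalResourceId,ResourceStatus,ResourceStatusReason]'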
If I try to upgrade the node group using the AWS web console, everything works fine, but without any changes in the CloudFormation logs. Therefore I suppose the console does not go through CloudFormation.
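If the console indeed calls the EKS UpdateNodegroupVersion API directly rather than going through the eksctl-managed CloudFormation stack (an assumption, not confirmed here), a rough CLI equivalent using the names from this report would be:
# assumption: this mirrors what the console does; names are taken from the report above
aws eks update-nodegroup-version \
  --cluster-name yby-test \
  --nodegroup-name mng-medium \
  --kubernetes-version 1.29 \
  --region eu-central-1
# the call returns an update ID that can be polled with `aws eks describe-update`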
eksctl version
0.190.0-dev+3fccc8ed8.2024-09-04T12:58:57Z
What help do you need?
Please point out whether I misunderstood the documentation or whether this is a bug. Maybe there are other actions that have to be taken.
Tell me if I should provide more information or tests.
Thanks in advance.
-- Eugene Bykov
I am also facing this issue. When I checked CloudFormation, it mentioned: Resource handler returned message: "Volume of size 10GB is smaller than snapshot 'snap-0145xxxxxx10a66e4', expect size >= 20GB"
But I can do that (less than 20 GB) in another account of mine. They are both in the same region. The only difference I can think of is the Kubernetes cluster version: I can create a node with a 10 GB volume in 1.29, but not in 1.30.
The eksctl version I am using is 0.184
Any update on this? I'm facing the same issue: Resource handler returned message: "Requested release version 1.31.0-20241024 is not valid for kubernetes version 1.30. (Service: Eks, Status Code: 400, Request ID: 15e2fb73-4134-4763-94d4-6b1ffc6d04b3)" (RequestToken: 1565436d-5bbc-7be1-7081-7a0631cf5842, HandlerErrorCode: InvalidRequest)
After I successfully upgraded the control plane to 1.31, I cannot upgrade the managed node group to 1.31.
Requested release version 1.31.0-20241024 is not valid for kubernetes version 1.30
Are you sure your control plane is already updated?
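If it helps, a quick sketch to confirm that from the CLI (the cluster name is a placeholder):
# confirm the control plane version before upgrading the nodegroup
aws eks describe-cluster --name <cluster-name> --query 'cluster.version' --output text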
And I found the reason why I cannot upgrade my managed nodegroup.
I am using two different AMIs and OSes. AmazonLinux2 is able to reduce the disk size to 8 GB, but AmazonLinux2023 cannot. That is what I currently know... I am not sure whether it is documented.
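For anyone hitting the volume-size variant of this error, the root-volume size an AMI's snapshot expects can be checked before choosing volumeSize; a sketch (the AMI ID is a placeholder):
# shows the root-volume size baked into the AMI's snapshot (AMI ID is a placeholder)
aws ec2 describe-images --image-ids ami-0123456789abcdef0 \
  --query 'Images[].BlockDeviceMappings[].Ebs.VolumeSize'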
I also have the exact same problem when trying to upgrade the managed node group.
Resource handler returned message: "Requested release version 1.31.3-20250103 is not valid for kubernetes version 1.30. (Service: Eks, Status Code: 400, Request ID: 4aa1ba6a-840a-40ec-9934-0809b7c92538)" (RequestToken: a328d53f-60e2-b3a1-ba8f-5e6daa9d59ef, HandlerErrorCode: InvalidRequest)
The control plane was already upgraded from version 1.30 to 1.31 successfully.
$ eksctl upgrade cluster --approve --name eks-analytics
2025-01-08 15:25:55 [ℹ] will upgrade cluster "eks-analytics" control plane from current version "1.30" to "1.31"
2025-01-08 15:35:50 [✔] cluster "eks-analytics" control plane has been upgraded to version "1.31"
2025-01-08 15:35:50 [ℹ] you will need to follow the upgrade procedure for all of nodegroups and add-ons
2025-01-08 15:35:51 [ℹ] re-building cluster stack "eksctl-eks-analytics-cluster"
2025-01-08 15:35:51 [✔] all resources in cluster stack "eksctl-eks-analytics-cluster" are up-to-date
2025-01-08 15:35:52 [ℹ] checking security group configuration for all nodegroups
2025-01-08 15:35:52 [ℹ] all nodegroups have up-to-date cloudformation templates
It shows the new version in the AWS console as well as via the command:
$ eksctl get cluster --name eks-analytics --output json | jq -r '.[].Version'
1.31
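The managed nodegroup's own reported versions can be checked the same way; a sketch using the names from this cluster (the JMESPath query is mine):
$ aws eks describe-nodegroup --cluster-name eks-analytics \
    --nodegroup-name eks-analytics-ng-1 \
    --query 'nodegroup.[version,releaseVersion]' --output text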
I also upgraded all the podidentityassociations successfully.
2025-01-08 15:35:55 [ℹ]
2 parallel tasks: { update pod identity association kube-system/aws-load-balancer-controller, update pod identity association cert-manager/cert-manager
}
2025-01-08 15:35:56 [ℹ] updating IAM resources stack "eksctl-eks-analytics-podidentityrole-cert-manager-cert-manager" for pod identity association "cert-manager/cert-manager"
2025-01-08 15:35:56 [ℹ] updating IAM resources stack "eksctl-eks-analytics-podidentityrole-kube-system-aws-load-balancer-controller" for pod identity association "kube-system/aws-load-balancer-controller"
2025-01-08 15:35:56 [ℹ] waiting for CloudFormation changeset "eksctl-kube-system-aws-load-balancer-controller-update-1736321756" for stack "eksctl-eks-analytics-podidentityrole-kube-system-aws-load-balancer-controller"
2025-01-08 15:35:56 [ℹ] nothing to update
2025-01-08 15:35:56 [ℹ] IAM resources for kube-system/aws-load-balancer-controller (pod identity association ID: kube-system/aws-load-balancer-controller) are already up-to-date
2025-01-08 15:35:56 [ℹ] waiting for CloudFormation changeset "eksctl-cert-manager-cert-manager-update-1736321756" for stack "eksctl-eks-analytics-podidentityrole-cert-manager-cert-manager"
2025-01-08 15:35:56 [ℹ] nothing to update
2025-01-08 15:35:56 [ℹ] IAM resources for cert-manager/cert-manager (pod identity association ID: cert-manager/cert-manager) are already up-to-date
2025-01-08 15:35:56 [ℹ] all tasks were completed successfully
And the addons
2025-01-08 15:35:59 [ℹ] Kubernetes version "1.31" in use by cluster "eks-analytics"
2025-01-08 15:35:59 [ℹ] updating addon
2025-01-08 15:38:02 [ℹ] addon "aws-ebs-csi-driver" active
2025-01-08 15:38:02 [ℹ] updating addon
2025-01-08 15:38:13 [ℹ] addon "coredns" active
2025-01-08 15:38:13 [ℹ] updating addon
2025-01-08 15:38:24 [ℹ] addon "eks-pod-identity-agent" active
2025-01-08 15:38:24 [ℹ] new version provided v1.31.3-eksbuild.2
2025-01-08 15:38:24 [ℹ] updating addon
2025-01-08 15:39:07 [ℹ] addon "kube-proxy" active
2025-01-08 15:39:08 [ℹ] updating addon
2025-01-08 15:39:18 [ℹ] addon "vpc-cni" active
At first I just tried the following to upgrade the node group; it finished without error but left the node group at version 1.30:
$ eksctl upgrade nodegroup --cluster eks-analytics --name eks-analytics-ng-1 --wait
2025-01-08 15:40:00 [ℹ] setting ForceUpdateEnabled value to false
2025-01-08 15:40:00 [ℹ] updating nodegroup stack
2025-01-08 15:40:01 [ℹ] waiting for CloudFormation changeset "eksctl-update-nodegroup-1736322000" for stack "eksctl-eks-analytics-nodegroup-eks-analytics-ng-1"
2025-01-08 15:40:31 [ℹ] waiting for CloudFormation changeset "eksctl-update-nodegroup-1736322000" for stack "eksctl-eks-analytics-nodegroup-eks-analytics-ng-1"
2025-01-08 15:40:31 [ℹ] waiting for CloudFormation stack "eksctl-eks-analytics-nodegroup-eks-analytics-ng-1"
2025-01-08 15:41:01 [ℹ] waiting for CloudFormation stack "eksctl-eks-analytics-nodegroup-eks-analytics-ng-1"
2025-01-08 15:41:02 [ℹ] nodegroup "eks-analytics-ng-1" is already up-to-date
2025-01-08 15:41:02 [ℹ] will upgrade nodes to Kubernetes version: 1.30
2025-01-08 15:41:02 [ℹ] upgrading nodegroup version
2025-01-08 15:41:02 [ℹ] updating nodegroup stack
2025-01-08 15:41:02 [ℹ] waiting for CloudFormation changeset "eksctl-update-nodegroup-1736322062" for stack "eksctl-eks-analytics-nodegroup-eks-analytics-ng-1"
2025-01-08 15:41:32 [ℹ] waiting for CloudFormation changeset "eksctl-update-nodegroup-1736322062" for stack "eksctl-eks-analytics-nodegroup-eks-analytics-ng-1"
2025-01-08 15:41:33 [ℹ] waiting for CloudFormation stack "eksctl-eks-analytics-nodegroup-eks-analytics-ng-1"
2025-01-08 15:42:03 [ℹ] waiting for CloudFormation stack "eksctl-eks-analytics-nodegroup-eks-analytics-ng-1"
2025-01-08 15:42:03 [ℹ] nodegroup successfully upgraded
But I noticed it left things at version 1.30. So then I tried the following, which resulted in the error above within CloudFormation:
$ eksctl upgrade nodegroup --cluster eks-analytics --kubernetes-version 1.31 --name eks-analytics-ng-1 --wait
2025-01-08 15:57:09 [ℹ] will upgrade nodes to release version: 1.31.3-20250103
2025-01-08 15:57:09 [ℹ] upgrading nodegroup version
2025-01-08 15:57:09 [ℹ] updating nodegroup stack
2025-01-08 15:57:09 [ℹ] waiting for CloudFormation changeset "eksctl-update-nodegroup-1736323029" for stack "eksctl-eks-analytics-nodegroup-eks-analytics-ng-1"
2025-01-08 15:57:39 [ℹ] waiting for CloudFormation changeset "eksctl-update-nodegroup-1736323029" for stack "eksctl-eks-analytics-nodegroup-eks-analytics-ng-1"
2025-01-08 15:57:40 [ℹ] waiting for CloudFormation stack "eksctl-eks-analytics-nodegroup-eks-analytics-ng-1"
2025-01-08 15:58:10 [ℹ] waiting for CloudFormation stack "eksctl-eks-analytics-nodegroup-eks-analytics-ng-1"
2025-01-08 15:59:03 [ℹ] waiting for CloudFormation stack "eksctl-eks-analytics-nodegroup-eks-analytics-ng-1"
2025-01-08 15:59:52 [ℹ] waiting for CloudFormation stack "eksctl-eks-analytics-nodegroup-eks-analytics-ng-1"
2025-01-08 16:01:02 [ℹ] waiting for CloudFormation stack "eksctl-eks-analytics-nodegroup-eks-analytics-ng-1"
2025-01-08 16:01:52 [ℹ] waiting for CloudFormation stack "eksctl-eks-analytics-nodegroup-eks-analytics-ng-1"
2025-01-08 16:03:18 [ℹ] waiting for CloudFormation stack "eksctl-eks-analytics-nodegroup-eks-analytics-ng-1"
Error: error updating nodegroup stack: waiter state transitioned to Failure
Got the same issue with upgrading node groups from 1.29 to 1.30
Control plane was updated successfully
$ eksctl version
0.201.0
We had the same problem on some clusters, and we noticed that the CloudFormation template generated for upgrading the nodegroup puts the wrong "Version" number in the "AWS::EKS::Nodegroup" resource. Here we've upgraded from 1.30 to 1.31, but the version is still 1.30 in the CFN template. We got the message "Requested release version 1.31.4-20250123 is not valid for kubernetes version 1.30.".
"ManagedNodeGroup": {
"Type": "AWS::EKS::Nodegroup",
"Properties": {
"AmiType": "AL2_x86_64",
"ClusterName": "testcluster",
"ForceUpdateEnabled": true,
"InstanceTypes": [
"c5a.large",
"c5.large",
"c6i.large"
],
"Labels": {
"alpha.eksctl.io/cluster-name": "testcluster",
"alpha.eksctl.io/nodegroup-name": "ng-eks-1",
},
"LaunchTemplate": {
"Id": {
"Ref": "LaunchTemplate"
}
},
"NodeRole": {
"Fn::GetAtt": [
"NodeInstanceRole",
"Arn"
]
},
"NodegroupName": "ng-eks-1",
"ReleaseVersion": "1.31.4-20250123",
"ScalingConfig": {
"DesiredSize": 2,
"MaxSize": 4,
"MinSize": 2
},
"Subnets": [
"subnet-0800de77f4fd29000",
"subnet-0500ceeab196d00c"
],
"Tags": {
"alpha.eksctl.io/nodegroup-name": "ng-eks-1",
"alpha.eksctl.io/nodegroup-type": "managed",
"k8s.io/cluster-autoscaler/testcluster": "owned",
"k8s.io/cluster-autoscaler/enabled": "true",
},
"Taints": [
{
"Effect": "NO_EXECUTE",
"Key": "node.cilium.io/agent-not-ready",
"Value": "true"
}
],
"UpdateConfig": {
"MaxUnavailable": 2
},
"Version": "1.30"
}
},
As a workaround, we manually changed the CloudFormation template to replace version 1.30 with 1.31 and updated the CFN stack to make it work. I haven't yet managed to find out where this version is retrieved from. I hope this will help unblock those who are in this situation.
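For anyone reproducing that workaround from the CLI, a rough sketch might look like the following (the stack name follows eksctl's usual naming for the example above, and the jq path assumes the template layout shown; adapt both to your cluster):
# fetch the current template of the nodegroup stack (stack name is illustrative)
aws cloudformation get-template \
  --stack-name eksctl-testcluster-nodegroup-ng-eks-1 \
  --query TemplateBody > nodegroup-template.json
# bump the hard-coded Version on the ManagedNodeGroup resource (assumed jq path)
jq '.Resources.ManagedNodeGroup.Properties.Version = "1.31"' \
  nodegroup-template.json > nodegroup-template-patched.json
# update the stack; eksctl nodegroup stacks contain IAM resources, so CAPABILITY_IAM is needed
aws cloudformation update-stack \
  --stack-name eksctl-testcluster-nodegroup-ng-eks-1 \
  --template-body file://nodegroup-template-patched.json \
  --capabilities CAPABILITY_IAM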
It works properly if you create and upgrade your cluster from a config file:
eksctl create cluster -f <cluster_config>.yaml
eksctl upgrade cluster -f <cluster_config>.yaml
And it doesn't work if you create the cluster from a config file but upgrade using the --name arg :)
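A sketch of that config-file flow (the file name and version numbers are placeholders; the version bump can just as well be done by hand):
# bump the target version in the ClusterConfig, then upgrade from the same file
sed -i 's/version: "1.30"/version: "1.31"/' cluster-config.yaml
eksctl upgrade cluster -f cluster-config.yaml --approve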
+1 on running into this problem now.
eksctl version
0.204.0
eksctl upgrade cluster -f eks.yaml --approve
2025-02-14 12:37:25 [!] NOTE: cluster VPC (subnets, routing & NAT Gateway) configuration changes are not yet implemented
2025-02-14 12:37:26 [ℹ] will upgrade cluster "eks" control plane from current version "1.31" to "1.32"
2025-02-14 12:45:22 [✔] cluster "eks" control plane has been upgraded to version "1.32"
2025-02-14 12:45:22 [ℹ] you will need to follow the upgrade procedure for all of nodegroups and add-ons
2025-02-14 12:45:22 [ℹ] re-building cluster stack "eksctl-eks-cluster"
2025-02-14 12:45:22 [✔] all resources in cluster stack "eksctl-eks-cluster" are up-to-date
2025-02-14 12:45:22 [ℹ] checking security group configuration for all nodegroups
2025-02-14 12:45:22 [ℹ] all nodegroups have up-to-date cloudformation templates
eksctl upgrade nodegroup --name=ng-1-workers --cluster=eks --kubernetes-version=1.32
Requested release version 1.32.0-20250203 is not valid for kubernetes version 1.31.
I can also confirm that manually changing the version in the cloudformation template got me past this issue.
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
This issue was closed because it has been stalled for 5 days with no activity.
Had the same issue upgrading a handful of environments that were multiple versions behind the latest version in EKS. I found that updating the "Version" field in the CloudFormation template as described above allowed me to upgrade all of my node groups to the next version, aligning with the control plane (1.30 -> 1.31). However, eksctl still failed on the next pass to upgrade to the following release.
I had one environment that upgraded across all versions without issue and compared the CloudFormation files for that environment to the files from the failing environments. What I found was that the "good" environment had no "Version" field at all in its CloudFormation file. So, for my last set of environment upgrades, I first updated the ManagedNodeGroup resource by setting the ReleaseVersion to a version that aligned with the control plane and removing the Version field entirely. After those updates applied successfully, I was able to use eksctl to perform subsequent upgrades.
TL;DR: Remove the "Version" field from the ManagedNodeGroup resource in the CloudFormation templates for the first round of upgrades. eksctl will likely work on later upgrades.
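Assuming the template layout from the snippet earlier in the thread, and a template fetched the same way as in the sketch above, that "drop Version, pin ReleaseVersion" step could look like this (the release version is just an example):
# remove the hard-coded Version and pin ReleaseVersion to match the control plane
jq 'del(.Resources.ManagedNodeGroup.Properties.Version)
    | .Resources.ManagedNodeGroup.Properties.ReleaseVersion = "1.31.4-20250123"' \
  nodegroup-template.json > nodegroup-template-patched.json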