terraform-aws-eks
Reconciliation of cluster_version and ami_release_version during node-group updates
Description
This issue is mainly related to the submodule eks-managed-node-group.
We use ami_type = "BOTTLEROCKET_x86_64" coupled with cluster_version and ami_release_version variables.
The ami_release_version is configured for us in a TFE Variable Set applied to our TFE workspaces. This way we can control the version en masse.
cluster_version comes from a data call to the EKS cluster, so we retrieve its actual running version.
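That data call can be sketched as follows (a minimal illustration; the cluster name `my-cluster` and the local name are assumptions for this sketch, not our actual workspace code):

```hcl
# Hedged sketch: read the control plane's actual running version.
# The cluster name "my-cluster" is assumed for illustration.
data "aws_eks_cluster" "this" {
  name = "my-cluster"
}

locals {
  # e.g. "1.28" before the control-plane upgrade, "1.29" after
  cluster_version = data.aws_eks_cluster.this.version
}
```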
Let's consider the initial values:
ami_release_version = 1.20.5-a3e8bda1
cluster_version = 1.28
If the control plane is upgraded to 1.29 and I run a new plan and apply for the node-group configuration, the node groups will be updated to cluster_version = 1.29, but ami_release_version will become 1.21.1-82691b51 (the latest, as of today).
I have to run a new plan and apply to bring the nodes back to the target ami_release_version:
ami_release_version = 1.20.5-a3e8bda1
cluster_version = 1.29
- [x] I have searched the open/closed issues and my issue is not listed.
⚠️ Note
Before you submit an issue, please perform the following first:
- Remove the local `.terraform` directory (ONLY if state is stored remotely, which hopefully you are following that best practice!): `rm -rf .terraform/`
- Re-initialize the project root to pull down modules: `terraform init`
- Re-attempt your `terraform plan` or `apply` and check if the issue still persists
Versions
- Module version [Required]: 20.24.0
- Terraform version: 1.7.5
- Provider version(s): 5.65.0
Reproduction Code [Required]
```hcl
provider "aws" {
  region = "us-east-1"
}

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "20.24.0"

  cluster_name    = "my-cluster"
  cluster_version = var.cluster_version

  cluster_endpoint_private_access = true
  cluster_endpoint_public_access  = false

  create_cloudwatch_log_group   = false
  create_cluster_security_group = true
  create_iam_role               = true
  create_node_security_group    = true
  enable_irsa                   = true

  node_security_group_enable_recommended_rules = true

  eks_managed_node_group_defaults = {
    vpc_security_group_ids = []
  }

  subnet_ids = var.subnet_ids
  vpc_id     = var.vpc_id
}

module "eks_managed_node_groups" {
  source  = "terraform-aws-modules/eks/aws//modules/eks-managed-node-group"
  version = "20.24.0"

  cluster_name    = module.eks.cluster_name
  name            = join("", [module.eks.cluster_name, "-S-NG-001"])
  use_name_prefix = false

  vpc_security_group_ids = [module.eks.node_security_group_id]

  create_iam_role            = true
  iam_role_attach_cni_policy = true

  subnet_ids = var.subnet_ids

  min_size     = 2
  max_size     = 2
  desired_size = 2

  create_launch_template          = true
  launch_template_name            = join("", [module.eks.cluster_name, "-S-NG-001"])
  launch_template_use_name_prefix = false

  ami_type            = "BOTTLEROCKET_x86_64"
  ami_release_version = data.aws_ssm_parameter.image_version[0].value
  cluster_version     = var.cluster_version

  cluster_auth_base64  = module.eks.cluster_certificate_authority_data
  cluster_endpoint     = module.eks.cluster_endpoint
  cluster_service_cidr = module.eks.cluster_service_cidr

  capacity_type  = "SPOT"
  instance_types = ["m5.xlarge"]
}

data "aws_ssm_parameter" "image_version" {
  count = var.ami_release_version != null ? 1 : 0
  name  = "/aws/service/bottlerocket/aws-k8s-${module.eks.cluster_version}/x86_64/${var.ami_release_version}/image_version"
}

variable "ami_release_version" {
  type    = string
  default = "1.20.5"
}

variable "subnet_ids" {
  type = list(string)
}

variable "vpc_id" {
  type = string
}

variable "cluster_version" {
  type    = string
  default = "1.28"
}
```
Steps to reproduce the behavior:
- Use the above HCL to build the resources; set `vpc_id` and `subnet_ids` according to your environment.
- After resources are built, update the `cluster_version` variable to `1.29` and apply.
- The control plane will be upgraded from `1.28` to `1.29`.
- The node group will be updated to use a `1.29` AMI, but with a `release_version` of `1.21.1-82691b51` instead of `1.20.5-a3e8bda1`.
Expected behavior
When both cluster_version and ami_release_version change, they should be reconciled in a single plan and apply.
Actual behavior
Two plan-and-apply cycles are required to bring the nodes to a specific cluster_version and ami_release_version.
The first plan brings cluster_version to the target version but moves ami_release_version to the latest available version.
The second plan downgrades ami_release_version back to the desired value.
Terminal Output Screenshot(s)
Update history tab: (screenshot not reproduced here)
Additional context
Unfortunately, without a reproduction we will only be able to speculate.
I've updated the issue to include the IaC for reproduction
Running:

```shell
aws eks update-nodegroup-version --cluster-name my-cluster --nodegroup-name my-cluster-S-NG-001 --kubernetes-version "1.30" --release-version "1.20.5-a3e8bda1"
```

will upgrade the cluster as expected; the release version won't be bumped to 1.21.1-82691b51.
Why are you doing this:

```hcl
ami_release_version = data.aws_ssm_parameter.image_version[0].value
...
}

data "aws_ssm_parameter" "image_version" {
  count = var.ami_release_version != null ? 1 : 0
  name  = "/aws/service/bottlerocket/aws-k8s-${module.eks.cluster_version}/x86_64/${var.ami_release_version}/image_version"
}
```
instead of this:

```hcl
ami_release_version = var.ami_release_version
...
}
```
Personal preference.
I like it simple: 1.20.5 instead of 1.20.5-a3e8bda1.
I'm open to flip it if that causes the issue.
I don't follow - you are inputting the value of 1.20.5-a3e8bda1 via the ami_release_version variable, only to look it up from the SSM parameter and get the exact same value back. If you already know the release version, just use it as a string and pass it to the input
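That suggestion could be sketched in HCL like this (the literal release string is the one discussed in this thread; whether to keep the SSM indirection remains the author's call):

```hcl
# Sketch of the maintainer's suggestion: skip the SSM indirection and
# pin the fully qualified release string directly on the node group.
module "eks_managed_node_groups" {
  source  = "terraform-aws-modules/eks/aws//modules/eks-managed-node-group"
  version = "20.24.0"

  # value taken from the thread; update when bumping Bottlerocket
  ami_release_version = "1.20.5-a3e8bda1"

  # ... remaining arguments as in the reproduction code above
}
```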
I am inputting the value of 1.20.5 via the ami_release_version variable, and then the SSM parameter resolves it to the extended format, which I then use in the eks-managed-node-group module.
```shell
aws ssm get-parameter --name "/aws/service/bottlerocket/aws-k8s-1.30/x86_64/1.20.5/image_version" --region us-east-1 --query "Parameter.Value" --output text
```
There are two paths published in SSM to retrieve the image_version:
- `/aws/service/bottlerocket/aws-k8s-1.30/x86_64/1.20.5/image_version`
- `/aws/service/bottlerocket/aws-k8s-1.30/x86_64/1.20.5-a3e8bda1/image_version`
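The two lookups can be sketched in HCL (a hedged illustration; per this thread, the short-version path resolves to the fully qualified release, and the data-source names here are invented for the sketch):

```hcl
# Both published paths expose an image_version value; the short-version
# path is the one the reproduction code uses to resolve "1.20.5" into
# the fully qualified "1.20.5-a3e8bda1" form.
data "aws_ssm_parameter" "short" {
  name = "/aws/service/bottlerocket/aws-k8s-1.30/x86_64/1.20.5/image_version"
}

data "aws_ssm_parameter" "full" {
  name = "/aws/service/bottlerocket/aws-k8s-1.30/x86_64/1.20.5-a3e8bda1/image_version"
}
```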
That's not what your reproduction details provided above show.
I wasn't sufficiently clear. Sorry about that.
The values you've just pointed out are the ones supplied to the eks-managed-node-group child module: the resolved ones, so to speak.
The reproduction code, which I added as an edit to the opened issue, shows that I'm passing the short form of the version:
```hcl
variable "ami_release_version" {
  type    = string
  default = "1.20.5"
}
```
Can this be acknowledged as a bug?
This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove the stale label or comment, or this issue will be closed in 10 days.
This issue was automatically closed because it remained stale for 10 days.
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.