terraform-aws-eks
Infinite Plan Update on eks_managed_node_group for launch_template version -> $Default
Description
Please provide a clear and concise description of the issue you are encountering, and a reproduction of your configuration (see the examples/* directory for references that you can copy+paste and tailor to match your configs if you are unable to copy your exact configuration). The reproduction MUST be executable by running terraform init && terraform apply without any further changes.
If your request is for a new feature, please use the Feature request template.
- [x] ✋ I have searched the open/closed issues and my issue is not listed.
⚠️ Note
Before you submit an issue, please perform the following first:
- Remove the local `.terraform` directory (! ONLY if state is stored remotely, which hopefully you are following that best practice!): `rm -rf .terraform/`
- Re-initialize the project root to pull down modules: `terraform init`
- Re-attempt your `terraform plan` or `apply` and check if the issue still persists
Versions
- Module version [Required]: v20.24.0
- Terraform version: Terraform v1.9.5 on darwin_arm64
- Provider version(s):
- provider registry.terraform.io/alekc/kubectl v2.0.4
- provider registry.terraform.io/gavinbunney/kubectl v1.14.0
- provider registry.terraform.io/hashicorp/aws v5.64.0
- provider registry.terraform.io/hashicorp/cloudinit v2.3.4
- provider registry.terraform.io/hashicorp/helm v2.14.1
- provider registry.terraform.io/hashicorp/kubernetes v2.31.0
- provider registry.terraform.io/hashicorp/null v3.2.2
- provider registry.terraform.io/hashicorp/time v0.12.0
- provider registry.terraform.io/hashicorp/tls v4.0.5
- provider registry.terraform.io/terraform-aws-modules/http v2.4.1
Reproduction Code [Required]
node-groups.tf:
module "general_worker_nodes" {
source = "terraform-aws-modules/eks/aws//modules/eks-managed-node-group"
version = "v20.24.0"
cluster_name = var.eks_cluster_name
cluster_primary_security_group_id = module.eks.cluster_primary_security_group_id
cluster_version = var.eks_cluster_version
cluster_service_cidr = module.eks.cluster_service_cidr
create_iam_role = false
create_launch_template = false
iam_role_arn = aws_iam_role.general_worker_nodes.arn
launch_template_id = aws_launch_template.general_worker_nodes.id
name = local.short_node_group_name_prefix
subnet_ids = data.terraform_remote_state.vpc.outputs.private_subnets
use_custom_launch_template = true
update_launch_template_default_version = false
vpc_security_group_ids = [data.terraform_remote_state.vpc.outputs.internal_subnet_id]
max_size = var.eks_nodegroups["general"].max_size
min_size = var.eks_nodegroups["general"].min_size
desired_size = var.eks_nodegroups["general"].desired_size
instance_types = var.eks_nodegroups["general"].instance_types
ami_type = var.eks_nodegroups["general"].ami_type
capacity_type = var.eks_nodegroups["general"].capacity_type
labels = {
"nodegroup" = "general",
"environment" = data.terraform_remote_state.vpc.outputs.vpc_name_short
}
pre_bootstrap_user_data = <<-EOT
#!/bin/bash
mkdir -m 0600 -p ~/.ssh
touch ~ec2-user/.ssh/authorized_keys
cat >> ~ec2-user/.ssh/authorized_keys <<EOF
${data.terraform_remote_state.vpc.outputs.vpc_ssh_key}
EOF
EOT
tags = {
"Name" = "${var.eks_cluster_name}-Gen-EKS-Worker-Nodes"
"efs.csi.aws.com/cluster" = "true"
"kubernetes.io/cluster/${var.eks_cluster_name}" = "owned"
"aws-node-termination-handler/managed" = "true"
}
}
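For completeness: the block above sets `create_iam_role = false` and references `aws_iam_role.general_worker_nodes`, which is not included in the issue. A minimal sketch of such a node role, assuming the standard EKS managed node policies (the role name and policy list are illustrative, not taken from the reporter's configuration):

```hcl
# Assumed node IAM role (not part of the original reproduction); the name and
# the exact policy set are illustrative.
resource "aws_iam_role" "general_worker_nodes" {
  name = "general-worker-nodes"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "ec2.amazonaws.com" }
    }]
  })
}

# Standard managed policies for EKS worker nodes.
resource "aws_iam_role_policy_attachment" "general_worker_nodes" {
  for_each = toset([
    "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy",
    "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy",
    "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly",
  ])

  role       = aws_iam_role.general_worker_nodes.name
  policy_arn = each.value
}
```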
launch-templates.tf:
resource "aws_launch_template" "general_worker_nodes" {
update_default_version = true
key_name = var.eks_nodegroups["general"].ssh_key_name
vpc_security_group_ids = [data.terraform_remote_state.vpc.outputs.internal_subnet_id]
ebs_optimized = true
block_device_mappings {
device_name = "/dev/xvda"
ebs {
volume_size = var.eks_nodegroups["general"].disk_size
encrypted = true
}
}
metadata_options {
http_endpoint = "enabled"
http_tokens = "required"
http_put_response_hop_limit = 2
}
monitoring {
enabled = true
}
tag_specifications {
resource_type = "instance"
tags = {
"Name" = "${var.eks_cluster_name}-General-EKS-Worker-Nodes"
"efs.csi.aws.com/cluster" = "true"
"kubernetes.io/cluster/${var.eks_cluster_name}" = "owned"
"aws-node-termination-handler/managed" = "true"
}
}
tag_specifications {
resource_type = "volume"
tags = {
"Name" = "${var.eks_cluster_name}-General-EKS-Worker-Nodes"
"kubernetes.io/cluster/${var.eks_cluster_name}" = "owned"
}
}
tag_specifications {
resource_type = "network-interface"
tags = {
"Name" = "${var.eks_cluster_name}-General-EKS-Worker-Nodes"
"kubernetes.io/cluster/${var.eks_cluster_name}" = "owned"
}
}
}
auto.tfvars:
```hcl
aws_region          = "us-east-2"
eks_cluster_name    = "development"
eks_cluster_version = "1.30"

eks_nodegroups = {
  general = {
    instance_types = [
      "c6a.xlarge",
      "m6a.xlarge",
      "m6a.2xlarge",
      "c5.xlarge",
      "m5.xlarge",
      "c4.xlarge",
      "m4.xlarge"
    ]
    ami_type                   = "AL2_x86_64"
    capacity_type              = "SPOT"
    desired_size               = 8
    disk_size                  = 128
    enabled                    = true
    max_size                   = 36
    max_unavailable_percentage = 25
    min_size                   = 4
    nodeselector               = "general"
    ssh_key_name               = "MyCompany Staging"
  }
}
```
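The variable declarations are not included in the issue either; a minimal `variables.tf` sketch that would accept the `auto.tfvars` above (the types are inferred from the values and are an assumption):

```hcl
# Assumed variable declarations, inferred from auto.tfvars above; not part of
# the original reproduction.
variable "aws_region" {
  type = string
}

variable "eks_cluster_name" {
  type = string
}

variable "eks_cluster_version" {
  type = string
}

variable "eks_nodegroups" {
  type = map(object({
    instance_types             = list(string)
    ami_type                   = string
    capacity_type              = string
    desired_size               = number
    disk_size                  = number
    enabled                    = bool
    max_size                   = number
    max_unavailable_percentage = number
    min_size                   = number
    nodeselector               = string
    ssh_key_name               = string
  }))
}
```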
Steps to reproduce the behavior:
- Workspaces: Yes.
- Cleared cache: Yes.
- Steps that led up to the issue:

```
terraform workspace select development-us-east-2-<redacted>
terraform init -upgrade
terraform apply
```
Expected behavior
Once applied, subsequent plans should not attempt to update the launch_template version from its current value to $Default.
Actual behavior
On every run, the plan still wants to update the launch template's version:
```
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # module.findigs-eks.module.general_worker_nodes.aws_eks_node_group.this[0] will be updated in-place
  ~ resource "aws_eks_node_group" "this" {
        id   = "development:development-Gen-EKS-Worker-Nodes-20240807005426470100000001"
        tags = {
            "Name"                                 = "development-Gen-EKS-Worker-Nodes"
            "aws-node-termination-handler/managed" = "true"
            "efs.csi.aws.com/cluster"              = "true"
            "kubernetes.io/cluster/development"    = "owned"
        }
        # (16 unchanged attributes hidden)

      ~ launch_template {
            id      = "lt-0f09c225dd95a124d"
            name    = "terraform-20220519232718013800000003"
          ~ version = "11" -> "$Default"
        }

        # (3 unchanged blocks hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.
```
That is a lot of interesting configuration; may I ask why you are approaching it from this perspective? Meaning:
- Why use the node group sub-module independent of the overall EKS module?
- Why use a custom launch template outside of the module when the module already supports a custom launch template that is "safer" for EKS?
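For readers following along, a rough sketch of the second suggestion, i.e. letting the eks-managed-node-group sub-module create and manage the custom launch template itself rather than wiring in an external `aws_launch_template`. The input names follow the sub-module's documented variables; the values are carried over from the reproduction above and have not been tested. To my understanding, the module also propagates its tags onto the instance, volume, and network-interface tag specifications of the template it creates, which covers the ENI/EBS tagging concern mentioned below.

```hcl
module "general_worker_nodes" {
  source  = "terraform-aws-modules/eks/aws//modules/eks-managed-node-group"
  version = "v20.24.0"

  # ... cluster_*, sizing, labels, user data, and tags as in the reproduction ...

  # Let the module create and own the launch template.
  create_launch_template     = true
  use_custom_launch_template = true

  key_name      = var.eks_nodegroups["general"].ssh_key_name
  ebs_optimized = true

  block_device_mappings = {
    xvda = {
      device_name = "/dev/xvda"
      ebs = {
        volume_size = var.eks_nodegroups["general"].disk_size
        encrypted   = true
      }
    }
  }

  metadata_options = {
    http_endpoint               = "enabled"
    http_tokens                 = "required"
    http_put_response_hop_limit = 2
  }
}
```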
Hi Bryant, thank you for the response.
I can't definitively speak to why I ended up on this combination; loosely, I think it had to do with not getting the correct disk_size and with wanting the tags applied to all attached entities (ENIs, EBS volumes, etc.).
IIRC, the only way I was able to get that combination working was the configuration above, though perhaps some other iteration would work; long-term, I think I'm moving towards Karpenter anyway.
After many hours of debugging, I found that if I explicitly set launch_template_version to the current integer value, the infinite plan goes away. Still, I feel there's an opportunity here to add logic so that the module does not fall back to $Default unnecessarily.
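For anyone hitting the same loop, a minimal sketch of the workaround described above. `launch_template_version` is the sub-module input that was set explicitly; pinning it to the template's `default_version` attribute rather than hard-coding the integer is my own assumption, to keep the value in sync without manual bumps:

```hcl
module "general_worker_nodes" {
  # ... all other arguments as in the reproduction above ...

  launch_template_id = aws_launch_template.general_worker_nodes.id

  # Pin to an explicit revision so the node group never falls back to the
  # "$Default" alias. Hard-coding the current revision (e.g. "11") also works,
  # per the comment above; tracking default_version avoids manual updates.
  launch_template_version = aws_launch_template.general_worker_nodes.default_version
}
```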
My initial hypothesis was that the latest (or specified) launch template revision wasn't marked as Default; however, it was.
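To check that hypothesis, a small illustrative addition (not part of the original configuration) that surfaces which revisions the template's default and latest versions currently point at:

```hcl
# Hypothetical debugging outputs: compare these against the version shown in
# the node group's plan diff.
output "general_worker_lt_default_version" {
  value = aws_launch_template.general_worker_nodes.default_version
}

output "general_worker_lt_latest_version" {
  value = aws_launch_template.general_worker_nodes.latest_version
}
```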
This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove the stale label or comment, or this issue will be closed in 10 days.
This issue was automatically closed because it remained stale for 10 days.
Just what the doctor ordered...not!
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.