terraform-aws-eks

reconciliation of cluster_version and ami_release_version during node-group updates

Open AndreiBanaruTakeda opened this issue 5 months ago • 11 comments

Description

This issue is mainly related to the submodule eks-managed-node-group.

We use ami_type = "BOTTLEROCKET_x86_64" together with the cluster_version and ami_release_version variables.

The ami_release_version is configured for us in a TFE Variable Set applied to our TFE workspaces; this way we can control the version at scale. cluster_version comes from a data source call against the EKS cluster, so we retrieve its actual running version.
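
For context, the cluster_version lookup is roughly the following sketch (the data source name and cluster reference are illustrative, not our exact code):

data "aws_eks_cluster" "this" {
  name = "my-cluster" # illustrative; in practice the name comes from our workspace variables
}

# passed to the node-group submodule as:
# cluster_version = data.aws_eks_cluster.this.version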

Let's consider the initial values:

ami_release_version = 1.20.5-a3e8bda1
cluster_version = 1.28

If the control plane is upgraded to 1.29 and I then run a new plan and apply for the node-group configuration, the node groups are updated to cluster_version = 1.29, but the ami_release_version becomes 1.21.1-82691b51 (the latest available, as of today).

I have to run a second plan and apply to bring the nodes back to the target ami_release_version:

ami_release_version = 1.20.5-a3e8bda1
cluster_version = 1.29
  • [x] ✋ I have searched the open/closed issues and my issue is not listed.

⚠️ Note

Before you submit an issue, please perform the following first:

  1. Remove the local .terraform directory (ONLY if state is stored remotely, which is hopefully the best practice you are following): rm -rf .terraform/
  2. Re-initialize the project root to pull down modules: terraform init
  3. Re-attempt your terraform plan or apply and check if the issue still persists

Versions

  • Module version [Required]: 20.24.0
  • Terraform version: 1.7.5
  • Provider version(s): 5.65.0

Reproduction Code [Required]

provider "aws" {
  region  = "us-east-1"
}

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "20.24.0"

  cluster_name    = "my-cluster"
  cluster_version = var.cluster_version


  cluster_endpoint_private_access              = true
  cluster_endpoint_public_access               = false
  create_cloudwatch_log_group                  = false
  create_cluster_security_group                = true
  create_iam_role                              = true
  create_node_security_group                   = true
  enable_irsa                                  = true
  node_security_group_enable_recommended_rules = true

  eks_managed_node_group_defaults = {
    vpc_security_group_ids = []
  }

  subnet_ids = var.subnet_ids
  vpc_id     = var.vpc_id
}

module "eks_managed_node_groups" {
  source  = "terraform-aws-modules/eks/aws//modules/eks-managed-node-group"
  version = "20.24.0"

  cluster_name    = module.eks.cluster_name
  name            = join("", [module.eks.cluster_name, "-S-NG-001"])
  use_name_prefix = false

  vpc_security_group_ids = [module.eks.node_security_group_id]

  create_iam_role            = true
  iam_role_attach_cni_policy = true

  subnet_ids = var.subnet_ids

  min_size     = 2
  max_size     = 2
  desired_size = 2

  create_launch_template          = true
  launch_template_name            = join("", [module.eks.cluster_name, "-S-NG-001"])
  launch_template_use_name_prefix = false

  ami_type             = "BOTTLEROCKET_x86_64"
  ami_release_version  = data.aws_ssm_parameter.image_version[0].value
  cluster_version      = var.cluster_version
  cluster_auth_base64  = module.eks.cluster_certificate_authority_data
  cluster_endpoint     = module.eks.cluster_endpoint
  cluster_service_cidr = module.eks.cluster_service_cidr

  capacity_type  = "SPOT"
  instance_types = ["m5.xlarge"]
}

data "aws_ssm_parameter" "image_version" {
  count = var.ami_release_version != null ? 1 : 0
  name  = "/aws/service/bottlerocket/aws-k8s-${module.eks.cluster_version}/x86_64/${var.ami_release_version}/image_version"
}

variable "ami_release_version" {
  type    = string
  default = "1.20.5"
}

variable "subnet_ids" {
  type    = list(string)
}

variable "vpc_id" {
  type    = string
}

variable "cluster_version" {
  type    = string
  default = "1.28"
}

Steps to reproduce the behavior:

  1. use the above HCL to build the resources; set vpc_id and subnet_ids according to your environment
  2. after the resources are built, update the cluster_version variable to 1.29 and apply (see the tfvars sketch below)
  3. the control plane will be upgraded from 1.28 to 1.29
  4. the node group will be updated to use a 1.29 AMI, but with a release_version of 1.21.1-82691b51 instead of 1.20.5-a3e8bda1
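
For step 2, an illustrative tfvars override (file name and layout are just an example) could look like:

# terraform.tfvars -- step 2
cluster_version     = "1.29"
ami_release_version = "1.20.5" # unchanged; the node group should stay pinned to this release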

Expected behavior

When both the cluster_version and ami_release_version variables change, they should be reconciled in a single plan and apply.

Actual behavior

Two plan and apply cycles are required to bring the nodes to a specific cluster_version and ami_release_version.

The first plan brings cluster_version to the target version but moves ami_release_version to the latest available version.

The second plan then downgrades ami_release_version back to the desired value.
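
For context, the node group ultimately sets both attributes on the underlying aws_eks_node_group resource, so I would expect them to be reconciled in a single pass (values below are illustrative):

resource "aws_eks_node_group" "example" {
  # ...
  version         = "1.29"            # Kubernetes version requested for the node group
  release_version = "1.20.5-a3e8bda1" # Bottlerocket AMI release version to pin
}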

Terminal Output Screenshot(s)

Update history tab: (screenshot of the node group update history)

Additional context

AndreiBanaruTakeda, Sep 03 '24 13:09