
17.24.0 => 18.20.5 upgrade causes destruction of cluster iam_role

Open jdomantay opened this issue 2 years ago • 7 comments

Description

Hello, I'm trying to upgrade my EKS module from 17.24.0 to 18.20.5, but I'm encountering issues where the plan tries to destroy resources, and if I follow through with the apply it causes a cycle error in Terraform.

  • [✅] ✋ I have searched the open/closed issues and my issue is not listed.

Versions

  • Module version [Required]: Current: 17.24.0 Target: 18.20.5
  • Terraform version: >= 0.12

  • Provider version(s): kubernetes 2.10.0, aws 3.75.1

17.24.0 Code

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "17.24.0"

  cluster_name    = "sf-${var.region}-${var.Environment}"
  cluster_version = "1.19"
  vpc_id          = "vpc-026e5737d5686b491"
  subnets         = data.aws_subnet_ids.private.ids
  map_roles       = local.map_roles
  map_users       = local.map_users 
}

18.20.5 Code

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "18.20.5"

  cluster_name    = "${var.region}-${var.Environment}"
  cluster_version = "1.19"

  vpc_id          = "vpc-xxxxxxxx"
  subnet_ids      = data.aws_subnet_ids.private.ids
  
  aws_auth_roles = local.map_roles
  aws_auth_users = local.map_users
  create_aws_auth_configmap = true
  manage_aws_auth_configmap = true

  ################################################################################
  # Required Values to prevent cluster destruction
  ################################################################################

  ################################################################################
  # plan.txt settings
  ################################################################################

  /*
  create_cloudwatch_log_group        = false
  cluster_enabled_log_types          = []
  prefix_separator                   = ""
  iam_role_name                      = "${var.region}-${var.Environment}"
  cluster_security_group_name        = "${var.region}-${var.Environment}"
  cluster_security_group_description = "EKS cluster security group."
  */

  ################################################################################
  # plan1.txt settings
  ################################################################################

  /*
  create_cloudwatch_log_group        = false
  cluster_enabled_log_types          = []
  prefix_separator                   = ""
  iam_role_name                      = "${var.region}-${var.Environment}"
  cluster_security_group_name        = "${var.region}-${var.Environment}"
  cluster_security_group_description = "EKS cluster security group."
  iam_role_arn                       = "arn:aws:iam::xxxxxxxx:role/us-west-2-xxxxxxxxx"
  */

  ################################################################################
  # plan2.txt settings
  ################################################################################

  create_cloudwatch_log_group        = false
  cluster_enabled_log_types          = []
  prefix_separator                   = ""
  iam_role_name                      = "${var.region}-${var.Environment}"
  cluster_security_group_name        = "${var.region}-${var.Environment}"
  cluster_security_group_description = "EKS cluster security group."
  iam_role_arn                       = "arn:aws:iam::xxxxxxxx:role/us-west-2-xxxxxxxxx"
  create_iam_role                    = false

  ################################################################################
  # Required Values to prevent cluster destruction
  ################################################################################

}

Locals.tf


locals {
  map_roles = [
    {
      rolearn  = "arn:aws:iam::${var.account}:role/${var.region}-${var.Environment}-admin"
      username = "${var.region}-${var.Environment}-admin"
      groups   = ["system:masters"]
    },
    {
      rolearn  = "arn:aws:iam::${var.account}:role/${var.region}-${var.Environment}-edit"
      username = "${var.region}-${var.Environment}-edit"
      groups   = ["xxxx"]
    },
    {
      rolearn  = "arn:aws:iam::${var.account}:role/${var.region}-${var.Environment}-read"
      username = "${var.region}-${var.Environment}-read"
      groups   = ["xxxx"]
    }
  ]

  map_users = [
    {
      userarn  = "arn:aws:iam::${var.account}:user/xxxx"
      username = "xxxx"
      groups   = ["system:masters"]
    },
    {
      userarn  = "arn:aws:iam::xxxxxxxxx:user/xxxx"
      username = "xxxx"
      groups   = ["system:masters"]
    }
  ]
}
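
For completeness: with create_aws_auth_configmap / manage_aws_auth_configmap set, the module also needs a configured kubernetes provider to write the ConfigMap. A minimal sketch of that wiring is below; the output names and the exec api_version are my assumptions based on the v18 module and a recent aws CLI, not copied from the actual config.

# Sketch only: kubernetes provider wired to the EKS module outputs.
# cluster_endpoint, cluster_certificate_authority_data and cluster_id are
# assumed v18 output names; client.authentication.k8s.io/v1beta1 assumes a
# reasonably recent aws CLI.
provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    # In v18, cluster_id holds the cluster name.
    args        = ["eks", "get-token", "--cluster-name", module.eks.cluster_id]
  }
}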

Steps to reproduce the behavior:

  1. Update the module configuration for the upgrade (17.24.0 => 18.20.5), as shown above.
  2. terraform init -upgrade; terraform plan

Expected behavior

  1. No destruction of the cluster or its IAM role.

Actual behavior

  1. The first and second configurations (under the plan.txt and plan1.txt comments) force replacement of the cluster IAM role, which in turn triggers recreation of the cluster.
  2. The third configuration (under the plan2.txt comment) successfully migrates to the new module, but it destroys the cluster IAM role (a possible state-level workaround is sketched below).
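
A state-level workaround that might keep the old role with the plan2.txt settings is to drop it from state before applying, so Terraform forgets it rather than destroys it. A minimal sketch; the v17 resource address is an assumption to confirm against terraform state list:

# Sketch only: make Terraform forget the module-managed IAM role so that
# create_iam_role = false + iam_role_arn can take over without destroying it.
# 'module.eks.aws_iam_role.cluster[0]' is the assumed v17 address.
terraform state list | grep aws_iam_role
terraform state rm 'module.eks.aws_iam_role.cluster[0]'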

Attachments: plan.txt, plan1.txt, plan2.txt

jdomantay, May 13 '22 07:05

There is an update coming to the migration docs - check out the WIP here https://github.com/clowdhaus/eks-v17-v18-migrate and let me know if this helps clarify how to handle this

bryantbiggs, May 13 '22 15:05

Has the above migration doc been moved to the EKS module repo? There are steps to move the Terraform state for the node pool (using terraform state mv). I see others instead make Terraform forget the old pool (terraform state rm). I wonder which is better for avoiding downtime. Thanks @bryantbiggs
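
For context, the two approaches I'm weighing look roughly like this; the node group addresses below are hypothetical, and the real ones would come from terraform state list:

# Option A: move the existing managed node group to its new address so it is
# not recreated (both addresses below are hypothetical examples).
terraform state mv \
  'module.eks.module.node_groups.aws_eks_node_group.workers["default"]' \
  'module.eks.module.eks_managed_node_group["default"].aws_eks_node_group.this[0]'

# Option B: make Terraform forget the old node group; it keeps serving traffic
# unmanaged while the new module creates a replacement, and is deleted manually
# once workloads have drained to the new group.
terraform state rm 'module.eks.module.node_groups.aws_eks_node_group.workers["default"]'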

xueshanf, Jun 01 '22 18:06

This issue has been automatically marked as stale because it has been open for 30 days with no activity. Remove the stale label or add a comment, or this issue will be closed in 10 days.

github-actions[bot], Jul 02 '22 00:07

Bump so this doesn't go stale.

drmaples, Jul 04 '22 16:07

I can confirm that I am seeing this same issue even without a migration from 17 -> 18.

The issue appears to be that the aws_auth_configmap_data isn't being fully encoded as YAML. In the example below, the output test appears to be valid, whereas test2 (which is how the configmap resource in this module is defined) is kind of funky:

locals {
  aws_auth_configmap_data = {
    mapRoles    = yamlencode(local.aws_auth_roles)
    mapUsers    = yamlencode(local.aws_auth_users)
    mapAccounts = yamlencode(local.aws_auth_accounts)
  }
  aws_auth_roles = [
    {
      rolearn  = "arn:aws:iam::000000000000:role/AdminRole"
      username = "admin"
      groups   = ["system:masters"]
    },
  ]
  aws_auth_users = [
    {
      userarn  = "arn:aws:iam::000000000000:user/steve"
      username = "steve"
      groups   = ["system:masters"]
    },
    {
      userarn  = "arn:aws:iam::000000000000:user/bob"
      username = "bob"
      groups   = ["system:masters"]
    },
  ]
  aws_auth_accounts = [
    "000000000000"
  ]
}

output "test" {
  value = yamlencode(local.aws_auth_configmap_data)
}

output "test2" {
  value = local.aws_auth_configmap_data
}
➜  t git:(develop) ✗  terraform plan

Changes to Outputs:
  + test  = <<-EOT
        "mapAccounts": |
          - "000000000000"
        "mapRoles": |
          - "groups":
            - "system:masters"
            "rolearn": "arn:aws:iam::000000000000:role/AdminRole"
            "username": "admin"
        "mapUsers": |
          - "groups":
            - "system:masters"
            "userarn": "arn:aws:iam::000000000000:user/steve"
            "username": "steve"
          - "groups":
            - "system:masters"
            "userarn": "arn:aws:iam::000000000000:user/bob"
            "username": "bob"
    EOT
  + test2 = {
      + mapAccounts = <<-EOT
            - "000000000000"
        EOT
      + mapRoles    = <<-EOT
            - "groups":
              - "system:masters"
              "rolearn": "arn:aws:iam::000000000000:role/AdminRole"
              "username": "admin"
        EOT
      + mapUsers    = <<-EOT
            - "groups":
              - "system:masters"
              "userarn": "arn:aws:iam::000000000000:user/steve"
              "username": "steve"
            - "groups":
              - "system:masters"
              "userarn": "arn:aws:iam::000000000000:user/bob"
              "username": "bob"
        EOT
    }

sfozz, Jul 07 '22 23:07

OK figured this out...

I'd defined a variable for the roles and users as:

variable "aws_map_users" {
  type = list(object({}))
}

which meant that every element was converted to an empty object (object({}) has no attributes to keep), so the list of users was rendered as:

- {}
- {}

Changing the var to

variable "aws_map_users" {
  type = list(any)
}

fixes this
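
An alternative that keeps type checking, instead of loosening to list(any), is to declare the full object type. A minimal sketch, assuming the attribute names used in this thread:

# Sketch only: fully typed variable so Terraform keeps (and validates) every
# attribute instead of silently converting each element to an empty object.
variable "aws_map_users" {
  type = list(object({
    userarn  = string
    username = string
    groups   = list(string)
  }))
  default = []
}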

sfozz, Jul 08 '22 02:07

This issue has been automatically marked as stale because it has been open for 30 days with no activity. Remove the stale label or add a comment, or this issue will be closed in 10 days.

github-actions[bot], Aug 08 '22 00:08

This issue was automatically closed because it remained stale for 10 days.

github-actions[bot], Aug 18 '22 00:08

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions[bot], Nov 09 '22 02:11