
Cluster_addons do not come up

Open rbarrett-impinj opened this issue 2 years ago β€’ 1 comments

Description

EKS cluster_addons take a very long time to create. Creating them through the console takes about 5-10 minutes, but with this particular Terraform module they never seem to finish at all.

  • [x] βœ‹ I have searched the open/closed issues and my issue is not listed.

⚠️ Note

Before you submit an issue, please perform the following first:

  1. Remove the local .terraform directory (ONLY if state is stored remotely, which is hopefully the best practice you are following!): rm -rf .terraform/ - COMPLETED
  2. Re-initialize the project root to pull down modules: terraform init - COMPLETED
  3. Re-attempt your terraform plan or apply and check if the issue still persists - COMPLETED

Versions

  • Module version [Required]: 18.20.5

  • Terraform version:

terraform {
  required_version = ">= 1.3.7"

  required_providers {
    aws = ">= 4.12.0"
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "2.11.0"
    }
  }
}

Reproduction Code [Required]

Steps to reproduce the behavior:

main.tf

################################################
#          KMS CLUSTER ENCRYPTION KEY          #
################################################
module "kms" {
  source  = "terraform-aws-modules/kms/aws"
  version = "1.1.0"

  aliases               = ["eks/${var.cluster_name}_test"]
  description           = "${var.cluster_name} cluster encryption key"
  enable_default_policy = true
  key_owners            = [data.aws_caller_identity.current.arn]

  tags = local.tags
}

##################################
#       KUBERNETES CLUSTER       #
##################################
module "primary" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 19.13.1"

  cluster_name                    = var.cluster_name
  cluster_version                 = var.cluster_version
  cluster_endpoint_private_access = var.cluster_endpoint_private_access
  cluster_endpoint_public_access  = var.cluster_endpoint_public_access

  create_kms_key = false
  cluster_encryption_config = {
    resources        = ["secrets"]
    provider_key_arn = module.kms.key_arn
  }

  create_cni_ipv6_iam_policy = var.create_cni_ipv6_iam_policy
  manage_aws_auth_configmap  = true
  aws_auth_roles             = var.aws_auth_roles

  vpc_id     = var.vpc_id
  subnet_ids = var.subnet_ids

  eks_managed_node_group_defaults = {
    ami_type       = var.ami_type
    disk_size      = var.disk_size
    instance_types = var.instance_types

    iam_role_attach_cni_policy = var.iam_role_attach_cni_policy
  }

  eks_managed_node_groups = {
    primary = {
      min_size     = 1
      max_size     = 5
      desired_size = 1

      capacity_type  = "ON_DEMAND"
      bootstrap_extra_args = "--kubelet-extra-args '--node-labels=geeiq/node-type=worker'"
    }
    secondary = {
      min_size     = 1
      max_size     = 5
      desired_size = 1

      capacity_type = "SPOT"
      bootstrap_extra_args = "--kubelet-extra-args '--node-labels=geeiq/node-type=worker'"
    }
  }

  cluster_addons = {
    aws-ebs-csi-driver = {
      most_recent       = true
      resolve_conflicts = "OVERWRITE"

      timeouts = {
        create = "20m"
        delete = "20m"
        update = "20m"
      }
    }
    aws-guardduty-agent = {
      most_recent       = true
      resolve_conflicts = "OVERWRITE"

      timeouts = {
        create = "20m"
        delete = "20m"
        update = "20m"
      }
    }
    coredns = {
      most_recent       = true
      resolve_conflicts = "OVERWRITE"

      timeouts = {
        create = "20m"
        delete = "20m"
        update = "20m"
      }
    }
    kube-proxy = {
      most_recent       = true
      resolve_conflicts = "OVERWRITE"

      timeouts = {
        create = "20m"
        delete = "20m"
        update = "20m"
      }
    }
    vpc-cni = {
      most_recent       = true
      resolve_conflicts = "OVERWRITE"

      timeouts = {
        create = "20m"
        delete = "20m"
        update = "20m"
      }
    }
  }

  tags = {
    repo = "https://github.com/impinj-di/terraform-aws-eks-primary"
  }
}

We consume this as a module within our repository, about two directories deep:

terraform {
  required_version = ">= 1.1.5"

  required_providers {
    aws = ">= 4.12.0"
    helm = {
      source  = "hashicorp/helm"
      version = "2.9.0"
    }
  }
}

data "aws_eks_cluster_auth" "primary" {
  name = module.primary.cluster_name
}

data "aws_eks_cluster" "primary" {
  name = module.primary.cluster_name
}

terraform {
  backend "s3" {
    bucket  = "impinj-canary-terraform"
    key     = "terraform-aws-eks-primary.tfstate"
    region  = "us-west-2"
    encrypt = true
  }
}

data "aws_iam_account_alias" "current" {}

data "terraform_remote_state" "route53" {
  backend = "s3"
  config = {
    bucket = "impinj-canary-terraform"
    key    = "route53.tfstate"
    region = "us-west-2"
  }
}

data "terraform_remote_state" "s3" {
  backend = "s3"
  config = {
    bucket = "impinj-canary-terraform"
    key    = "s3.tfstate"
    region = "us-west-2"
  }
}

data "terraform_remote_state" "subnets" {
  backend = "s3"
  config = {
    bucket = "impinj-canary-terraform"
    key    = "vpc-shared.tfstate"
    region = "us-west-2"
  }
}

data "aws_vpc" "shared" {
  filter {
    name   = "tag:Name"
    values = ["shared"]
  }
}

provider "aws" {
  alias  = "sec"
  region = "us-west-2"
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.primary.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.primary.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.primary.token
}

##################################
#       KUBERNETES CLUSTER       #
##################################
module "primary" {
  source = "../../"

  cluster_version = "1.26"

  # these must be set to 'true' on initial deployment and then set
  # to false so that destroy works properly
  create_cni_ipv6_iam_policy = var.create_cni_ipv6_iam_policy
  iam_role_attach_cni_policy = var.iam_role_attach_cni_policy

  vpc_id     = data.aws_vpc.shared.id
  subnet_ids = data.terraform_remote_state.subnets.outputs.private_subnets.*.id

  instance_types = ["t2.xlarge"]
  disk_size      = 20

  aws_auth_roles               = local.aws_auth_roles
  cert_manager_hosted_zone     = data.terraform_remote_state.route53.outputs.account_zone_id
  db_import_bucket_arn         = data.terraform_remote_state.s3.outputs.impinjcanary_test_arn
  external_dns_hosted_zone     = data.terraform_remote_state.route53.outputs.account_zone_id
  rf_probe_reporter_bucket_arn = data.terraform_remote_state.s3.outputs.impinjcanary_test_arn
}

Expected behavior

cluster_addons come up.

Actual behavior

cluster_addons do not come up.

Terminal Output Screenshot(s)

β”‚ Error: unexpected EKS Add-On (primary:vpc-cni) state returned during creation: timeout while waiting for state to become 'ACTIVE' (last state: 'CREATING', timeout: 20m0s)
β”‚ [WARNING] Running terraform apply again will remove the kubernetes add-on and attempt to create it again effectively purging previous add-on configuration
β”‚ 
β”‚   with module.primary.module.primary.aws_eks_addon.this["vpc-cni"],
β”‚   on .terraform/modules/primary.primary/main.tf line 382, in resource "aws_eks_addon" "this":
β”‚  382: resource "aws_eks_addon" "this" {
β”‚ 
β•΅
β•·
β”‚ Error: unexpected EKS Add-On (primary:aws-guardduty-agent) state returned during creation: timeout while waiting for state to become 'ACTIVE' (last state: 'CREATING', timeout: 20m0s)
β”‚ [WARNING] Running terraform apply again will remove the kubernetes add-on and attempt to create it again effectively purging previous add-on configuration
β”‚ 
β”‚   with module.primary.module.primary.aws_eks_addon.this["aws-guardduty-agent"],
β”‚   on .terraform/modules/primary.primary/main.tf line 382, in resource "aws_eks_addon" "this":
β”‚  382: resource "aws_eks_addon" "this" {
β”‚ 
β•΅
β•·
β”‚ Error: unexpected EKS Add-On (primary:aws-ebs-csi-driver) state returned during creation: timeout while waiting for state to become 'ACTIVE' (last state: 'CREATING', timeout: 20m0s)
β”‚ [WARNING] Running terraform apply again will remove the kubernetes add-on and attempt to create it again effectively purging previous add-on configuration
β”‚ 
β”‚   with module.primary.module.primary.aws_eks_addon.this["aws-ebs-csi-driver"],
β”‚   on .terraform/modules/primary.primary/main.tf line 382, in resource "aws_eks_addon" "this":
β”‚  382: resource "aws_eks_addon" "this" {
β”‚ 
β•΅
β•·
β”‚ Error: unexpected EKS Add-On (primary:coredns) state returned during creation: timeout while waiting for state to become 'ACTIVE' (last state: 'CREATING', timeout: 20m0s)
β”‚ [WARNING] Running terraform apply again will remove the kubernetes add-on and attempt to create it again effectively purging previous add-on configuration
β”‚ 
β”‚   with module.primary.module.primary.aws_eks_addon.this["coredns"],
β”‚   on .terraform/modules/primary.primary/main.tf line 382, in resource "aws_eks_addon" "this":
β”‚  382: resource "aws_eks_addon" "this" {

Additional context

rbarrett-impinj avatar May 22 '23 20:05 rbarrett-impinj

It looks like your cluster has no registered nodes in the "Ready" state. I had the same issue with the coredns addon. The problem runs deeper: the addons depend on the node group configuration modules, but those finish faster than the nodes actually register with EKS, which causes the addon health check to fail. In my case, a newly created EKS cluster with a self-managed node group takes about 5 minutes to register nodes, but by then coredns is already deployed and has gone into a degraded status, and Terraform then waits about 15 more minutes just to see the addon turn green. On the other hand, the coredns pods start within 5-10 seconds as soon as the first EKS node becomes Ready.

Long story short: it would be great if the module included a check, in the part that creates self-managed node groups, to make sure the created nodes have successfully registered before installing addons.
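For what it's worth, the v19 module appears to expose knobs for exactly this sequencing problem: `before_compute` on an addon entry and a module-level `dataplane_wait_duration`. A sketch, assuming those inputs behave as described in the module docs (verify against the version you are pinned to):

```hcl
module "primary" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 19.13.1"

  # ... cluster, node group, and networking config as above ...

  # Wait after the compute resources are created before installing the
  # remaining addons, giving nodes time to register as Ready.
  dataplane_wait_duration = "300s"

  cluster_addons = {
    # vpc-cni must be in place before nodes can join, so create it
    # before the compute resources instead of after them.
    vpc-cni = {
      most_recent    = true
      before_compute = true
    }
    # coredns schedules onto nodes, so it benefits from the wait above.
    coredns = {
      most_recent = true
    }
  }
}
```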

artem-kosenko avatar May 26 '23 10:05 artem-kosenko

Yeah, I had to take out the aws-guardduty-agent addon, and then it worked, so I think this module only supports a limited set of addons rather than all EKS addons.
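One likely reason the aws-guardduty-agent addon specifically hangs in CREATING: it has an account-level prerequisite. GuardDuty EKS Runtime Monitoring must be enabled for the account, otherwise the agent pods never become healthy. A sketch of enabling it, assuming a recent AWS provider that has the `aws_guardduty_detector_feature` resource (feature names may differ by provider version):

```hcl
resource "aws_guardduty_detector" "this" {
  enable = true
}

# Enable EKS Runtime Monitoring and let GuardDuty manage the agent addon.
resource "aws_guardduty_detector_feature" "eks_runtime" {
  detector_id = aws_guardduty_detector.this.id
  name        = "EKS_RUNTIME_MONITORING"
  status      = "ENABLED"

  additional_configuration {
    name   = "EKS_ADDON_MANAGEMENT"
    status = "ENABLED"
  }
}
```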

rbarrett-impinj avatar Jun 05 '23 18:06 rbarrett-impinj

addons are simply passed through to the underlying https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/eks_addon resource

you will need to ensure that any pre-requisites are in place for the respective addons (either EKS addon or partner addons from the AWS Marketplace) https://docs.aws.amazon.com/eks/latest/userguide/eks-add-ons.html
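As a concrete example of such a prerequisite: the aws-ebs-csi-driver addon needs an IAM role (IRSA) for its controller service account, or its pods cannot attach volumes and the addon will not report healthy. A sketch using the companion IRSA submodule (inputs per that module's docs; verify against the versions you actually use):

```hcl
module "ebs_csi_irsa" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
  version = "~> 5.20"

  role_name             = "ebs-csi-controller"
  attach_ebs_csi_policy = true

  oidc_providers = {
    main = {
      provider_arn               = module.primary.oidc_provider_arn
      namespace_service_accounts = ["kube-system:ebs-csi-controller-sa"]
    }
  }
}

# Then point the addon at the role:
#   cluster_addons = {
#     aws-ebs-csi-driver = {
#       most_recent              = true
#       service_account_role_arn = module.ebs_csi_irsa.iam_role_arn
#     }
#   }
```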

bryantbiggs avatar Jun 05 '23 18:06 bryantbiggs

closing for now since this does not appear to be a module issue

bryantbiggs avatar Jun 07 '23 00:06 bryantbiggs

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions[bot] avatar Jul 07 '23 02:07 github-actions[bot]