terraform-aws-eks
Cluster_addons do not come up
Description
EKS cluster_addons take forever to actually create; if I create them in the console it takes about 5-10 minutes, but with this particular Terraform module they seem to never fully come up.
- [x] I have searched the open/closed issues and my issue is not listed.

⚠️ Note
Before you submit an issue, please perform the following first:
- Remove the local `.terraform` directory (ONLY if state is stored remotely, which hopefully you are following that best practice!): `rm -rf .terraform/` - COMPLETED
- Re-initialize the project root to pull down modules: `terraform init` - COMPLETED
- Re-attempt your terraform plan or apply and check if the issue still persists - COMPLETED
Versions
- Module version [Required]: 18.20.5
- Terraform version:

terraform {
  required_version = ">= 1.3.7"

  required_providers {
    aws = ">= 4.12.0"
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "2.11.0"
    }
  }
}
Reproduction Code [Required]
Steps to reproduce the behavior:
main.tf
################################################
# KMS CLUSTER ENCRYPTION KEY #
################################################
module "kms" {
  source  = "terraform-aws-modules/kms/aws"
  version = "1.1.0"

  aliases               = ["eks/${var.cluster_name}_test"]
  description           = "${var.cluster_name} cluster encryption key"
  enable_default_policy = true
  key_owners            = [data.aws_caller_identity.current.arn]

  tags = local.tags
}
##################################
# KUBERNETES CLUSTER #
##################################
module "primary" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 19.13.1"

  cluster_name    = var.cluster_name
  cluster_version = var.cluster_version

  cluster_endpoint_private_access = var.cluster_endpoint_private_access
  cluster_endpoint_public_access  = var.cluster_endpoint_public_access

  create_kms_key = false
  cluster_encryption_config = {
    resources        = ["secrets"]
    provider_key_arn = module.kms.key_arn
  }

  create_cni_ipv6_iam_policy = var.create_cni_ipv6_iam_policy

  manage_aws_auth_configmap = true
  aws_auth_roles            = var.aws_auth_roles

  vpc_id     = var.vpc_id
  subnet_ids = var.subnet_ids

  eks_managed_node_group_defaults = {
    ami_type                   = var.ami_type
    disk_size                  = var.disk_size
    instance_types             = var.instance_types
    iam_role_attach_cni_policy = var.iam_role_attach_cni_policy
  }

  eks_managed_node_groups = {
    primary = {
      min_size             = 1
      max_size             = 5
      desired_size         = 1
      capacity_type        = "ON_DEMAND"
      bootstrap_extra_args = "--kubelet-extra-args '--node-labels=geeiq/node-type=worker'"
    }
    secondary = {
      min_size             = 1
      max_size             = 5
      desired_size         = 1
      capacity_type        = "SPOT"
      bootstrap_extra_args = "--kubelet-extra-args '--node-labels=geeiq/node-type=worker'"
    }
  }

  cluster_addons = {
    aws-ebs-csi-driver = {
      most_recent       = true
      resolve_conflicts = "OVERWRITE"
      timeouts = {
        create = "20m"
        delete = "20m"
        update = "20m"
      }
    }
    aws-guardduty-agent = {
      most_recent       = true
      resolve_conflicts = "OVERWRITE"
      timeouts = {
        create = "20m"
        delete = "20m"
        update = "20m"
      }
    }
    coredns = {
      most_recent       = true
      resolve_conflicts = "OVERWRITE"
      timeouts = {
        create = "20m"
        delete = "20m"
        update = "20m"
      }
    }
    kube-proxy = {
      most_recent       = true
      resolve_conflicts = "OVERWRITE"
      timeouts = {
        create = "20m"
        delete = "20m"
        update = "20m"
      }
    }
    vpc-cni = {
      most_recent       = true
      resolve_conflicts = "OVERWRITE"
      timeouts = {
        create = "20m"
        delete = "20m"
        update = "20m"
      }
    }
  }

  tags = {
    repo = "https://github.com/impinj-di/terraform-aws-eks-primary"
  }
}
We consume this as a module within our repository, about two directories deep:
terraform {
  required_version = ">= 1.1.5"

  required_providers {
    aws = ">= 4.12.0"
    helm = {
      source  = "hashicorp/helm"
      version = "2.9.0"
    }
  }
}

data "aws_eks_cluster_auth" "primary" {
  name = module.primary.cluster_name
}

data "aws_eks_cluster" "primary" {
  name = module.primary.cluster_name
}

terraform {
  backend "s3" {
    bucket  = "impinj-canary-terraform"
    key     = "terraform-aws-eks-primary.tfstate"
    region  = "us-west-2"
    encrypt = true
  }
}

data "aws_iam_account_alias" "current" {}

data "terraform_remote_state" "route53" {
  backend = "s3"
  config = {
    bucket = "impinj-canary-terraform"
    key    = "route53.tfstate"
    region = "us-west-2"
  }
}

data "terraform_remote_state" "s3" {
  backend = "s3"
  config = {
    bucket = "impinj-canary-terraform"
    key    = "s3.tfstate"
    region = "us-west-2"
  }
}

data "terraform_remote_state" "subnets" {
  backend = "s3"
  config = {
    bucket = "impinj-canary-terraform"
    key    = "vpc-shared.tfstate"
    region = "us-west-2"
  }
}

data "aws_vpc" "shared" {
  filter {
    name   = "tag:Name"
    values = ["shared"]
  }
}

provider "aws" {
  alias  = "sec"
  region = "us-west-2"
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.primary.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.primary.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.primary.token
}
##################################
# KUBERNETES CLUSTER #
##################################
module "primary" {
  source = "../../"

  cluster_version = "1.26"

  # these must be set to 'true' on initial deployment and then set
  # to false so that destroy works properly
  create_cni_ipv6_iam_policy = var.create_cni_ipv6_iam_policy
  iam_role_attach_cni_policy = var.iam_role_attach_cni_policy

  vpc_id         = data.aws_vpc.shared.id
  subnet_ids     = data.terraform_remote_state.subnets.outputs.private_subnets.*.id
  instance_types = ["t2.xlarge"]
  disk_size      = 20
  aws_auth_roles = local.aws_auth_roles

  cert_manager_hosted_zone     = data.terraform_remote_state.route53.outputs.account_zone_id
  db_import_bucket_arn         = data.terraform_remote_state.s3.outputs.impinjcanary_test_arn
  external_dns_hosted_zone     = data.terraform_remote_state.route53.outputs.account_zone_id
  rf_probe_reporter_bucket_arn = data.terraform_remote_state.s3.outputs.impinjcanary_test_arn
}
Expected behavior
cluster_addons come up.
Actual behavior
cluster_addons do not come up.
Terminal Output Screenshot(s)
│ Error: unexpected EKS Add-On (primary:vpc-cni) state returned during creation: timeout while waiting for state to become 'ACTIVE' (last state: 'CREATING', timeout: 20m0s)
│ [WARNING] Running terraform apply again will remove the kubernetes add-on and attempt to create it again effectively purging previous add-on configuration
│
│   with module.primary.module.primary.aws_eks_addon.this["vpc-cni"],
│   on .terraform/modules/primary.primary/main.tf line 382, in resource "aws_eks_addon" "this":
│  382: resource "aws_eks_addon" "this" {
│
╵
╷
│ Error: unexpected EKS Add-On (primary:aws-guardduty-agent) state returned during creation: timeout while waiting for state to become 'ACTIVE' (last state: 'CREATING', timeout: 20m0s)
│ [WARNING] Running terraform apply again will remove the kubernetes add-on and attempt to create it again effectively purging previous add-on configuration
│
│   with module.primary.module.primary.aws_eks_addon.this["aws-guardduty-agent"],
│   on .terraform/modules/primary.primary/main.tf line 382, in resource "aws_eks_addon" "this":
│  382: resource "aws_eks_addon" "this" {
│
╵
╷
│ Error: unexpected EKS Add-On (primary:aws-ebs-csi-driver) state returned during creation: timeout while waiting for state to become 'ACTIVE' (last state: 'CREATING', timeout: 20m0s)
│ [WARNING] Running terraform apply again will remove the kubernetes add-on and attempt to create it again effectively purging previous add-on configuration
│
│   with module.primary.module.primary.aws_eks_addon.this["aws-ebs-csi-driver"],
│   on .terraform/modules/primary.primary/main.tf line 382, in resource "aws_eks_addon" "this":
│  382: resource "aws_eks_addon" "this" {
│
╵
╷
│ Error: unexpected EKS Add-On (primary:coredns) state returned during creation: timeout while waiting for state to become 'ACTIVE' (last state: 'CREATING', timeout: 20m0s)
│ [WARNING] Running terraform apply again will remove the kubernetes add-on and attempt to create it again effectively purging previous add-on configuration
│
│   with module.primary.module.primary.aws_eks_addon.this["coredns"],
│   on .terraform/modules/primary.primary/main.tf line 382, in resource "aws_eks_addon" "this":
│  382: resource "aws_eks_addon" "this" {
Additional context
It looks like your cluster has no registered nodes in the "Ready" state. I had the same issue with the coredns addon. The issue is deeper: the addons depend on the node group configuration, but the addon resources finish creating faster than the nodes register with EKS, which causes the addon's health check to fail. In my case, a newly created EKS cluster with a self-managed node group takes about 5 minutes to register nodes, but by then coredns is already deployed and has a degraded status, and Terraform waits about 15 more minutes just to see the addon turn green. On the other hand, the coredns pods start within 5-10 seconds as soon as the first EKS node becomes Ready.
Long story short: it would be great to have a check in the part of the module that creates self-managed node groups, to make sure the created nodes have successfully registered before the addons are installed.
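The ordering suggested above can be approximated on the caller's side by managing the node-dependent addons with standalone `aws_eks_addon` resources and an explicit `depends_on` (a hypothetical workaround, not part of this module's API; `eks_managed_node_groups` is the module output of the same name):

```hcl
# Hypothetical workaround: create coredns outside the module, ordered after
# the managed node groups, so schedulable nodes exist before the addon's
# health check starts running.
resource "aws_eks_addon" "coredns" {
  cluster_name      = module.primary.cluster_name
  addon_name        = "coredns"
  resolve_conflicts = "OVERWRITE"

  # Wait until the node groups (and therefore registered nodes) exist.
  depends_on = [module.primary.eks_managed_node_groups]
}
```

This trades the module's built-in addon management for explicit ordering; the corresponding entry would be removed from `cluster_addons` to avoid managing the addon twice.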
Yeah, I had to take out the aws-guardduty-agent addon, and it worked, so I think this module only supports a limited set of addons rather than all EKS addons.
Addons are simply passed through to the underlying https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/eks_addon resource.
You will need to ensure that any prerequisites are in place for the respective addons (either EKS addons or partner addons from the AWS Marketplace): https://docs.aws.amazon.com/eks/latest/userguide/eks-add-ons.html
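As an illustration of such a prerequisite: the aws-ebs-csi-driver addon can fail to reach ACTIVE without an IAM role for its controller service account. A sketch of wiring that up with the terraform-aws-modules IRSA submodule (the `ebs_csi_irsa` name and version pin here are illustrative):

```hcl
# Sketch: IAM role for the EBS CSI controller's service account (IRSA).
module "ebs_csi_irsa" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
  version = "~> 5.0"

  role_name             = "${var.cluster_name}-ebs-csi"
  attach_ebs_csi_policy = true

  oidc_providers = {
    main = {
      provider_arn               = module.primary.oidc_provider_arn
      namespace_service_accounts = ["kube-system:ebs-csi-controller-sa"]
    }
  }
}
```

The resulting role ARN would then be passed to the addon via `service_account_role_arn` in the `cluster_addons` entry for aws-ebs-csi-driver.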
closing for now since this does not appear to be a module issue
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.