terraform-aws-eks
CoreDNS reporting InsufficientNumberOfReplicas - The add-on is unhealthy because it doesn't have the desired number of replicas
Description
I'm encountering an issue with CoreDNS: the add-on is flagged as unhealthy because it has fewer than the desired number of replicas. This arises while using the EKS module together with a custom AMI.
I am using the EKS module as follows:
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0"

  cluster_name    = local.name
  cluster_version = local.cluster_version

  cluster_endpoint_private_access = true
  cluster_endpoint_public_access  = true

  enable_irsa                              = true
  enable_cluster_creator_admin_permissions = true

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  eks_managed_node_groups = {
    cis_ami = {
      instance_types = ["m5.large"]
      ami_id         = data.aws_ami.image.id

      # This will ensure the bootstrap user data is used to join the node
      enable_bootstrap_user_data = true
      iam_role_attach_cni_policy = true

      min_size     = 1
      max_size     = 6
      desired_size = 4
    }
  }

  # EKS Addons
  cluster_addons = {
    coredns = {
      most_recent = true
    }
    kube-proxy = {
      most_recent = true
    }
    # aws-ebs-csi-driver = {
    #   service_account_role_arn = module.ebs_csi_driver_irsa.iam_role_arn
    # }
    vpc-cni = {
      before_compute = true
      most_recent    = true
      configuration_values = jsonencode({
        env = {
          ENABLE_PREFIX_DELEGATION = "true"
          WARM_PREFIX_TARGET       = "1"
        }
      })
    }
  }

  tags = local.tags
}
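As an aside, the desired count that the InsufficientNumberOfReplicas check compares against can be set on the add-on itself. A minimal sketch, assuming the CoreDNS add-on schema version in use exposes replicaCount (the EKS describe-addon-configuration API lists the accepted keys per add-on version):

  cluster_addons = {
    coredns = {
      most_recent = true
      # Assumption: the resolved add-on version's schema accepts replicaCount
      configuration_values = jsonencode({
        replicaCount = 2
      })
    }
  }

That only changes the target number, though; it wouldn't help replicas that never become ready, which is what the pod status below shows.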
Due to this, the pods are crashing - here is the output:
❯ k get po -n kube-system
NAME                                                          READY   STATUS             RESTARTS         AGE
aws-load-balancer-controller-54f58989fd-hj848                 0/1     CrashLoopBackOff   17 (3m49s ago)   70m
aws-load-balancer-controller-54f58989fd-k2qzn                 0/1     CrashLoopBackOff   17 (4m1s ago)    70m
aws-node-9bkt8                                                2/2     Running            0                71m
aws-node-psdvq                                                2/2     Running            0                71m
aws-node-qmhxg                                                2/2     Running            0                71m
aws-node-xl99d                                                2/2     Running            0                71m
cluster-autoscaler-aws-cluster-autoscaler-848fbf899c-8nxls    0/1     CrashLoopBackOff   16 (90s ago)     66m
coredns-557586b4b9-hnlg5                                      0/1     Running            0                64m
coredns-6f99ddbc54-pkltm                                      0/1     Running            0                56m
coredns-6f99ddbc54-xw65l                                      0/1     Running            0                56m
ebs-csi-controller-576c8d5c58-4q6vc                           6/6     Running            0                69m
ebs-csi-controller-576c8d5c58-qk6m9                           6/6     Running            0                70m
ebs-csi-node-5fztg                                            1/3     CrashLoopBackOff   40 (3m12s ago)   71m
ebs-csi-node-7tnrt                                            1/3     CrashLoopBackOff   41 (2m47s ago)   71m
ebs-csi-node-bmpqh                                            2/3     CrashLoopBackOff   41 (3m1s ago)    71m
ebs-csi-node-splqt                                            1/3     CrashLoopBackOff   39 (3m48s ago)   71m
kube-proxy-76gkn                                              1/1     Running            0                71m
kube-proxy-gkhcn                                              1/1     Running            0                71m
kube-proxy-hqxw7                                              1/1     Running            0                71m
kube-proxy-kxfds                                              1/1     Running            0                71m
Terminal Output Screenshot(s)
@bryantbiggs - FYI, when I try to view the CoreDNS logs I see this error:
❯ k logs -f coredns-xxxx-hnlg5 -n kube-system
Error from server: Get "https://1x.xx.xx.xx:10250/containerLogs/kube-system/coredns-xxx-hnlg5/coredns?follow=true": dial tcp xx.xx.xx.xx:10250: i/o timeout
@bryantbiggs - To resolve the issue, I need to add iptables rules that allow incoming connections from the Kubernetes control plane, since I'm using a custom AMI.
My goal is to determine how to modify the Terraform EKS module to override the bootstrap command.
For example (document ref: https://aws.amazon.com/blogs/containers/building-amazon-linux-2-cis-benchmark-amis-for-amazon-eks/):
overrideBootstrapCommand: |
  #!/bin/bash
  set -ex
  iptables -I INPUT -p tcp -m tcp --dport 10250 -j ACCEPT
  /etc/eks/bootstrap.sh $CLUSTER_NAME
I am using the Terraform EKS module and customizing the node group as below:
eks_managed_node_groups = {
  cis_ami = {
    instance_types = ["m5.large"]
    ami_id         = data.aws_ami.image.id

    # This will ensure the bootstrap user data is used to join the node
    enable_bootstrap_user_data = true
    iam_role_attach_cni_policy = true

    min_size     = 1
    max_size     = 6
    desired_size = 4
  }
}
The eks_managed_node_group example shows the hook for this: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/examples/eks_managed_node_group/main.tf#L204
enable_bootstrap_user_data = true
pre_bootstrap_user_data    = <<-EOT
  export FOO=bar
EOT
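Combining the two: with a custom AMI and enable_bootstrap_user_data = true, the module injects pre_bootstrap_user_data into the generated user data ahead of the /etc/eks/bootstrap.sh call, so the iptables rule from the blog post can go there. An untested sketch (the rule is copied from the eksctl example above; everything else mirrors my existing node group):

  eks_managed_node_groups = {
    cis_ami = {
      instance_types = ["m5.large"]
      ami_id         = data.aws_ami.image.id

      # This will ensure the bootstrap user data is used to join the node
      enable_bootstrap_user_data = true

      # Runs before /etc/eks/bootstrap.sh, so the kubelet port (10250) is
      # reachable by the control plane before the node registers
      pre_bootstrap_user_data = <<-EOT
        iptables -I INPUT -p tcp -m tcp --dport 10250 -j ACCEPT
      EOT

      iam_role_attach_cni_policy = true

      min_size     = 1
      max_size     = 6
      desired_size = 4
    }
  }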
This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove the stale label or add a comment, or this issue will be closed in 10 days.
This issue was automatically closed because it remained stale for 10 days.
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.