terraform-aws-eks
Using own networking; nodes are unable to join cluster
Description
I am setting up two clusters (eks, eks-home): one uses the VPC created by the included VPC module, the other uses a VPC and subnets imported from existing resources.
- [X] I have searched the open/closed issues and my issue is not listed.
Versions
- Module version [Required]: 20.2.1
- Terraform version: 1.7.3
- Provider version(s):
- provider registry.terraform.io/hashicorp/aws v5.35.0
- provider registry.terraform.io/hashicorp/cloudinit v2.3.3
- provider registry.terraform.io/hashicorp/random v3.5.1
- provider registry.terraform.io/hashicorp/time v0.10.0
- provider registry.terraform.io/hashicorp/tls v4.0.5
Reproduction Code [Required]
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "5.5.1"
name = "education-vpc"
cidr = "10.0.0.0/16"
azs = slice(data.aws_availability_zones.available.names, 0, 3)
private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
public_subnets = ["10.0.4.0/24", "10.0.5.0/24", "10.0.6.0/24"]
enable_nat_gateway = true
single_nat_gateway = true
enable_dns_hostnames = true
public_subnet_tags = {
"kubernetes.io/cluster/${local.cluster_name}" = "shared"
"kubernetes.io/role/elb" = 1
}
private_subnet_tags = {
"kubernetes.io/cluster/${local.cluster_name}" = "shared"
"kubernetes.io/role/internal-elb" = 1
}
}
module "eks" {
source = "terraform-aws-modules/eks/aws"
version = "20.2.1"
cluster_name = local.cluster_name
cluster_version = "1.29"
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnets
cluster_endpoint_public_access = true
eks_managed_node_group_defaults = {
ami_type = "AL2_x86_64"
}
eks_managed_node_groups = {
one = {
name = "node-group-1"
instance_types = ["t3.small"]
min_size = 1
max_size = 3
desired_size = 2
}
two = {
name = "node-group-2"
instance_types = ["t3.small"]
min_size = 1
max_size = 2
desired_size = 1
}
}
}
module "eks-home" {
source = "terraform-aws-modules/eks/aws"
version = "20.2.1"
cluster_name = "${local.cluster_name}-home"
cluster_version = "1.29"
vpc_id = data.aws_vpc.core.id
subnet_ids = ["subnet-04f23eb8f54a20e62", "subnet-0b13245e8c4df7d08", "subnet-00d031043f0b62c5c"]
cluster_endpoint_public_access = true
eks_managed_node_group_defaults = {
ami_type = "AL2_x86_64"
}
eks_managed_node_groups = {
one = {
name = "node-group-1-home"
instance_types = ["t3.small"]
min_size = 1
max_size = 3
desired_size = 2
}
two = {
name = "node-group-2-home"
instance_types = ["t3.small"]
min_size = 1
max_size = 2
desired_size = 1
}
}
}
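Note: the data source behind data.aws_vpc.core is not shown above. A minimal sketch of what it might look like (the Name tag value here is an assumption):

data "aws_vpc" "core" {
  # Assumed lookup of the pre-existing VPC by its Name tag; the real lookup may differ.
  tags = {
    Name = "core"
  }
}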
Steps to reproduce the behavior:
When the above HCL runs, it builds two clusters: one with a fresh VPC/subnets/SGs/NACLs/etc., the other on my existing VPC.
The eks-home cluster nodes are unable to join the cluster. I can't figure out what I am missing and am looking for help.
Expected behavior
I expect both clusters to work; one simply runs on an existing VPC.
Actual behavior
The eks cluster with the fresh VPC/subnets works; eks-home, using the existing VPC, does not.
Terminal Output Screenshot(s)
│ Error: waiting for EKS Node Group (education-eks-home:node-group-1-home-20240213193545975900000001) create: unexpected state 'CREATE_FAILED', wanted target 'ACTIVE'. last error: i-064d226aa7e41f356, i-0a23e706eb655fd18: NodeCreationFailure: Instances failed to join the kubernetes cluster
│
│   with module.eks-home.module.eks_managed_node_group["one"].aws_eks_node_group.this[0],
│   on .terraform/modules/eks-home/modules/eks-managed-node-group/main.tf line 308, in resource "aws_eks_node_group" "this":
│  308: resource "aws_eks_node_group" "this" {
│
Additional context
EKS VPC Details

Network ACL: acl-09499c0beba579e3a
  Ingress:
    100 ALL:ALL 0.0.0.0/0 Allow
    101 ALL:ALL ::/0      Allow
  Egress:
    100 ALL:ALL 0.0.0.0/0 Allow
    101 ALL:ALL ::/0      Allow

NAT Gateway: nat-0334dc8ee1978a397 (Public)

Route Table: rtb-0ec0db41ee880c262
  Routes:
    10.0.0.0/16 -> local
    0.0.0.0/0   -> nat-0334dc8ee1978a397
  Subnets:
    subnet-066405cc0c25e6179
    subnet-030f7f95759f01724
    subnet-052c037ab22f00738

Subnet associations:
  subnet-066405cc0c25e6179 -> rtb-0ec0db41ee880c262, acl-09499c0beba579e3a
  subnet-030f7f95759f01724 -> rtb-0ec0db41ee880c262, acl-09499c0beba579e3a
  subnet-066405cc0c25e6179 -> rtb-0ec0db41ee880c262, acl-09499c0beba579e3a

Security Groups:
  sg-0f914078f036f4f41
    Ingress:
      ALL:ALL    sg-0f914078f036f4f41
    Egress:
      ALL:ALL    0.0.0.0/0
  sg-0073293245dc624a0
    Ingress:
      TCP:443    sg-02db6c4108fad1284
  sg-02db6c4108fad1284
    Ingress:
      TCP:53     sg-02db6c4108fad1284
      UDP:53     sg-02db6c4108fad1284
      TCP:443    sg-0073293245dc624a0
      TCP:1025-* sg-02db6c4108fad1284
      TCP:4443   sg-0073293245dc624a0
      TCP:6443   sg-0073293245dc624a0
      TCP:8443   sg-0073293245dc624a0
      TCP:9443   sg-0073293245dc624a0
      TCP:10250  sg-0073293245dc624a0
    Egress:
      ALL:ALL    0.0.0.0/0
EKS Home VPC Details

Network ACL: acl-0d9f7b0672432b622
  Ingress:
    100 ALL:ALL 0.0.0.0/0 Allow
    101 ALL:ALL ::/0      Allow
  Egress:
    100 ALL:ALL 0.0.0.0/0 Allow
    101 ALL:ALL ::/0      Allow

NAT Gateway: nat-041d97a1b56381127 (Public)

Route Table: rtb-0c3e8df22f7257041
  Routes:
    10.0.0.0/16 -> local
    0.0.0.0/0   -> nat-041d97a1b56381127
  Subnets:
    subnet-04f23eb8f54a20e62
    subnet-00d031043f0b62c5c
    subnet-0b13245e8c4df7d08

Subnet associations:
  subnet-04f23eb8f54a20e62 -> rtb-0c3e8df22f7257041, acl-0d9f7b0672432b622
  subnet-0b13245e8c4df7d08 -> rtb-0c3e8df22f7257041, acl-0d9f7b0672432b622
  subnet-00d031043f0b62c5c -> rtb-0c3e8df22f7257041, acl-0d9f7b0672432b622

Security Groups:
  sg-057bf55b6cbf41ecd
    Ingress:
      ALL:ALL    sg-057bf55b6cbf41ecd
    Egress:
      ALL:ALL    0.0.0.0/0
  sg-098ae5c5728e4b68b
    Ingress:
      TCP:443    sg-02e487176f2b42374
  sg-02e487176f2b42374
    Ingress:
      TCP:53     sg-02e487176f2b42374
      UDP:53     sg-02e487176f2b42374
      TCP:443    sg-098ae5c5728e4b68b
      TCP:1025-* sg-02e487176f2b42374
      TCP:4443   sg-098ae5c5728e4b68b
      TCP:6443   sg-098ae5c5728e4b68b
      TCP:8443   sg-098ae5c5728e4b68b
      TCP:9443   sg-098ae5c5728e4b68b
      TCP:10250  sg-098ae5c5728e4b68b
    Egress:
      ALL:ALL    0.0.0.0/0
I'm having a similar issue:
│ Error: waiting for EKS Node Group (my-cluser:my-eks-managed-node-group-20240217064513134700000005) create: unexpected state 'CREATE_FAILED', wanted target 'ACTIVE'. last error: i-0738caa5a870a1536, i-0ac0190c2a8b18923: NodeCreationFailure: Instances failed to join the kubernetes cluster
│
│   with module.my-cluster.module.my-eks.module.eks_managed_node_group["eks_mng"].aws_eks_node_group.this[0],
│   on .terraform/modules/my-cluster.my-eks/modules/eks-managed-node-group/main.tf line 308, in resource "aws_eks_node_group" "this":
│  308: resource "aws_eks_node_group" "this" {
Did you find anything in CloudTrail? I'm starting to wonder if AWS changed something on their side of the API.
CloudTrail is telling me the instance profile name is invalid,
"errorMessage": "You must use a valid fully-formed launch template. Value (eks-d8c6d9f9-90bb-c537-cb25-40c5255cf213) for parameter iamInstanceProfile.name is invalid. Invalid IAM Instance Profile name",
even though it seems fine. I can dry-run an instance with the launch template...
(0)$ aws ec2 run-instances --launch-template LaunchTemplateName=eks-d8c6d9f9-90bb-c537-cb25-40c5255cf213,Version='1' --dry-run --subnet-id subnet-0e18a11f29685192e --profile pit-dev
An error occurred (DryRunOperation) when calling the RunInstances operation: Request would have succeeded, but DryRun flag is set.
I'm going to write my own issue with more details and I'll link it here.
I'm also getting the same error while using an existing private VPC + subnet. Any ideas how to fix it? It's the first time I'm using this module.
│ Error: waiting for EKS Node Group (Anomalo_EKS:general-20240223232200053200000001) create: unexpected state 'CREATE_FAILED', wanted target 'ACTIVE'. last error: i-0043619eeebf2f8dd: NodeCreationFailure: Instances failed to join the kubernetes cluster
│
│   with module.eks.module.eks_managed_node_group["general"].aws_eks_node_group.this[0],
│   on .terraform/modules/eks/modules/eks-managed-node-group/main.tf line 308, in resource "aws_eks_node_group" "this":
│  308: resource "aws_eks_node_group" "this" {
Info?
Just started looking at the v20.x terraform-aws-modules/eks/aws module. I too cannot add managed node groups to an existing VPC. I SSH'd into an EC2 instance of the node group and there are errors about containerd and the CNI, along with 403s when trying to pull the EKS containers from ECR. I have the correct policies attached to the role.
v19 of the module has no issues with managed node groups on existing VPCs.
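(For context, the AWS managed policies a node role needs are AmazonEKSWorkerNodePolicy, AmazonEKS_CNI_Policy, and AmazonEC2ContainerRegistryReadOnly; the module attaches these when it creates the role. A self-managed role would attach them roughly like this sketch, where the resource names are assumptions:)

# Sketch only: attach the standard EKS worker policies to an assumed custom node role.
resource "aws_iam_role_policy_attachment" "node" {
  for_each = toset([
    "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy",
    "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy",
    "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly",
  ])

  policy_arn = each.value
  role       = aws_iam_role.node.name # "node" is an assumed IAM role resource name
}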
Following up on my comment: I solved the issue where the nodes wouldn't join the cluster in an existing VPC. I found it when reviewing the launch template; the EC2 user data was blank. Then, comparing against the security groups of the old node groups, I saw the new nodes were missing the cluster primary security group.
In the managed node group config, I needed to explicitly set enable_bootstrap_user_data = true and attach_cluster_primary_security_group = true:
eks_managed_node_groups = {
  # Managed node group with minimum config
  group1 = {
    name            = "group1"
    use_name_prefix = true

    enable_efa_support = false

    ami_type     = "AL2_x86_64"
    ami_id       = data.aws_ami.eks_default.image_id
    cluster_name = local.name

    enable_bootstrap_user_data            = true
    attach_cluster_primary_security_group = true

    instance_types = ["m5.xlarge"]
    min_size       = 1
    max_size       = 4
    desired_size   = 1

    create_iam_role = true
    disk_size       = 50

    update_config = {
      max_unavailable_percentage = 30
    }

    subnet_ids = data.terraform_remote_state.vpc.outputs.private_subnet_ids
  }
}
Yes, on managed node groups, if you use a custom AMI you must provide the user data, and the module makes that easier for you via the enable_bootstrap_user_data flag.
https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/docs/user_data.md
Also, this would be a minimal config for what you provided; a lot of what you provided is already the default or not required:
eks_managed_node_groups = {
  group1 = {
    ami_type = "AL2_x86_64"
    ami_id   = data.aws_ami.eks_default.image_id

    enable_bootstrap_user_data = true

    instance_types = ["m5.xlarge"]
    min_size       = 1
    max_size       = 4
    desired_size   = 1

    subnet_ids = data.terraform_remote_state.vpc.outputs.private_subnet_ids
  }
}
This issue has been automatically marked as stale because it has been open for 30 days with no activity. Remove the stale label or comment, or this issue will be closed in 10 days.
This issue was automatically closed because it remained stale for 10 days.
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.