terraform-aws-eks
Multiple security groups are created with the kubernetes.io/cluster tag
Description
The issue is exactly the same as #1810, but for EKS managed node groups, with the same workaround (and presumably the same fix)
- [x] ✋ I have searched the open/closed issues and my issue is not listed.
Versions
- Module version: 18.10.1
- Terraform version:
Terraform v1.1.6
on linux_amd64
+ provider registry.terraform.io/gavinbunney/kubectl v1.14.0
+ provider registry.terraform.io/hashicorp/aws v3.72.0
+ provider registry.terraform.io/hashicorp/cloudinit v2.2.0
+ provider registry.terraform.io/hashicorp/helm v2.2.0
+ provider registry.terraform.io/hashicorp/kubernetes v2.3.2
+ provider registry.terraform.io/hashicorp/local v2.2.2
+ provider registry.terraform.io/hashicorp/tls v3.1.0
+ provider registry.terraform.io/oboukili/argocd v2.2.8
+ provider registry.terraform.io/terraform-aws-modules/http v2.4.1
Your version of Terraform is out of date! The latest version is 1.1.7. You can update by downloading from https://www.terraform.io/downloads.html
Reproduction Code [Required]
Steps to reproduce the behaviour:
Create a cluster with this module, using default values and containing at least one eks_managed_node_group
Possible workaround:
Disable creation of either the per-node-group security group or the shared node security group via module variables, and create a separate security group outside this module (see the sketch below).
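A minimal sketch of that workaround (not from the original report), assuming the v18.x inputs create_node_security_group and eks_managed_node_group_defaults; the security group name, its rules, and local.vpc_id are placeholders:

resource "aws_security_group" "node" {
  # placeholder externally managed security group; add the ingress/egress
  # rules your workloads actually need
  name_prefix = "my-cluster-node-"
  vpc_id      = local.vpc_id
}

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "18.10.1"

  # skip the module-created shared node security group entirely
  create_node_security_group = false

  eks_managed_node_group_defaults = {
    # attach the externally managed security group instead
    vpc_security_group_ids = [aws_security_group.node.id]
  }

  # ... remaining cluster configuration ...
}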
hi @blakepettersson - the EKS managed node group sub-module security group does not have these tags currently
@bryantbiggs so it seems :facepalm:. The issue in the case of EKS managed node groups seems to be that EKS auto-generates a security group that already has the kubernetes.io/cluster/${var.cluster_name} tag set, which conflicts with the tag set by aws_security_group.node. I'll set up a proper repro for this.
Here's a minimal repro:
locals {
  vpc_id          = "a vpc id"
  aws_region      = data.aws_region.current.name
  current_account = data.aws_caller_identity.current.account_id
}

provider "aws" {
  region = "eu-north-1"
}

data "aws_region" "current" {}
data "aws_caller_identity" "current" {}

data "aws_subnets" "all" {
  filter {
    name   = "vpc-id"
    values = [local.vpc_id]
  }
}
module "eks" {
source = "terraform-aws-modules/eks/aws"
version = "18.10.1"
cluster_name = "test"
cluster_version = "1.21"
cluster_endpoint_private_access = true
cluster_endpoint_public_access = true
vpc_id = local.vpc_id
subnet_ids = data.aws_subnets.all.ids
eks_managed_node_groups = {
default = {
medium = {
instance_types = ["t3.medium"]
desired_size = 3
max_size = 10
min_size = 3
disk_type = "gp3"
}
}
}
enable_irsa = true
}
Running terraform apply will create three security groups: two created by this module and one by EKS itself. Two of them, the module's shared node security group and the EKS-created cluster primary security group, carry the tag kubernetes.io/cluster/test: owned.
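To confirm which groups carry the tag, a quick check like the following can help (a sketch, not part of the original repro; it assumes an AWS provider version that includes the aws_security_groups data source):

# Sketch: list every security group in the VPC tagged for the "test" cluster.
# The data source label "tagged" is illustrative only.
data "aws_security_groups" "tagged" {
  tags = {
    "kubernetes.io/cluster/test" = "owned"
  }

  filter {
    name   = "vpc-id"
    values = [local.vpc_id]
  }
}

output "tagged_security_group_ids" {
  # expect two IDs: the module's shared node SG and the EKS-created primary SG
  value = data.aws_security_groups.tagged.ids
}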
Modified from examples/complete, removing all external modules and keeping only one inline managed node group.
Node group configuration:
module "eks" {
source = "terraform-aws-modules/eks/aws"
version = "~> 18.20.5"
.
.
.
# EKS Managed Node Group(s)
eks_managed_node_group_defaults = {
ami_type = "AL2_x86_64"
disk_size = 10
instance_types = ["t2.small", "t3.small", "t3a.small"]
attach_cluster_primary_security_group = true
vpc_security_group_ids = [aws_security_group.additional.id]
}
eks_managed_node_groups = {
green = {
min_size = 1
max_size = 10
desired_size = 1
instance_types = ["t3.small"]
capacity_type = "SPOT"
labels = {
Environment = "test"
GithubRepo = "terraform-aws-eks"
GithubOrg = "terraform-aws-modules"
}
taints = {
# dedicated = {
# key = "dedicated"
# value = "gpuGroup"
# effect = "NO_SCHEDULE"
# }
}
update_config = {
max_unavailable_percentage = 50 # or set `max_unavailable`
}
tags = {
ExtraTag = "example"
}
}
}
Output of kubectl describe service:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal EnsuringLoadBalancer 97s (x6 over 4m16s) service-controller Ensuring load balancer
Warning SyncLoadBalancerFailed 96s (x6 over 4m14s) service-controller Error syncing load balancer: failed to ensure load balancer: Multiple tagged security groups found for instance i-07d843b5953ca7412; ensure only the k8s security group is tagged; the tagged groups were sg-0bc8599e71f8a54a9(my-testing-k8s-node-20220419213507462100000009) sg-0b189fb8969ff5bd4(eks-cluster-sg-my-testing-k8s-987542792)
This workaround solves the problem. Pass it to the "eks" module:
# Temp workaround for bug: double owned tag
# https://github.com/terraform-aws-modules/terraform-aws-eks/issues/1810
node_security_group_tags = {
  "kubernetes.io/cluster/${local.name}" = null
}
Note that the taint configuration is also disabled, since it caused the scheduler to reject the CoreDNS deployment.
taints = {
  # dedicated = {
  #   key    = "dedicated"
  #   value  = "gpuGroup"
  #   effect = "NO_SCHEDULE"
  # }
}
https://github.com/terraform-aws-modules/terraform-aws-eks/blob/dc8a6eecddc2c6957ba309a939ee46f39b946461/node_groups.tf#L160
I'm also experiencing an issue with this after installing the aws-load-balancer-controller. I get the following error from it:
{"level":"error","ts":1651137191.6512694,"logger":"controller-runtime.manager.controller.targetGroupBinding","msg":"Reconciler error","reconciler group":"elbv2.k8s.aws","reconciler kind":"TargetGroupBinding","name":"k8s-echoserv-echoserv-aaf854977c","namespace":"echoserver","error":"expect exactly one securityGroup tagged with kubernetes.io/cluster/geeiq-test-k8s for eni eni-0605625aa92fe670e, got: [sg-0cc7f9db258d4e3f8 sg-0e95a58276d2c4ae5]"}
After checking the console, I can see I have two security groups with the same tag "kubernetes.io/cluster/<cluster_name>" = "owned".
I'm also using EKS managed node groups.
@kaykhancheckpoint check the last couple of posts for the workaround from leiarenee.
This workaround solved it for me (https://github.com/terraform-aws-modules/terraform-aws-eks/issues/1810), repeated here for convenience. In the "eks" module:
node_security_group_tags = {
  "kubernetes.io/cluster/${local.name}" = null
}
@diginc @leiarenee yup, thank you
This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this issue will be closed in 10 days
This workaround did not work for us.
@rcmurphy / @mr-kelly can you try:
cluster_tags = {
  "kubernetes.io/cluster/${local.name}" = null
}
where ${local.name} is the name of your cluster.
~~I can confirm we've tried that. It did not work.~~
apologies - does that mean that it does work for your case or not?
It means my ops engineer told me he did it, and then we realized that you had actually posted a slightly different command, so we never actually tried it. We are using a different workaround though (in this case we need to set a kernel parameter, so we are using a privileged startup container to do it instead).
In my Karpenter example, the cluster_tags workaround doesn't work.
Disabling security group creation doesn't work either:
create_node_security_group    = false
create_cluster_security_group = false
I am still trying....
In this official doc, https://docs.aws.amazon.com/eks/latest/userguide/sec-group-reqs.html, AWS says:
[screenshot of the AWS security group requirements documentation]
And in my Karpenter example, kubernetes.io/cluster/cluster-name=owned appears twice.
After digging into this, here is what I found:
If you use attach_cluster_primary_security_group = true, then you will want to set
node_security_group_tags = {
  "kubernetes.io/cluster/${local.name}" = null
}
Why? Because the EKS service automatically tags the cluster's primary security group. I tried to use
cluster_tags = {
  "kubernetes.io/cluster/${local.name}" = null
}
to "hide" this tag and remove the conflict, but this route does not work. It's safe to say that if you want to attach the cluster's primary security group to your nodes, you need to disable the tag on the shared node security group.
It's not a matter of just having multiple security groups with the tag key:value combo; it's only a concern when multiple security groups with that tag are ATTACHED to your node(s).
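Put together, a sketch of that combination (module inputs as in v18.x; the version constraint and local.name are placeholders):

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 18.20" # illustrative

  cluster_name = local.name

  eks_managed_node_group_defaults = {
    # attach the EKS-created primary security group, which EKS tags itself ...
    attach_cluster_primary_security_group = true
  }

  # ... and drop the duplicate tag from the module's shared node security group
  node_security_group_tags = {
    "kubernetes.io/cluster/${local.name}" = null
  }

  # ... remaining cluster configuration ...
}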
You can refer to this example, which did not encounter any issues provisioning the ALB controller chart: https://github.com/clowdhaus/eks-reference-architecture/tree/main/karpenter/us-east-1
Let me know if there is anything else I missed or any other questions pertaining to this issue.
If I understand you correctly @bryantbiggs, EKS automatically tags the cluster's primary security group with "kubernetes.io/cluster/$clustername" = owned.
That would mean you basically have to use attach_cluster_primary_security_group = true, because the cluster's primary security group will always have the problematic tag, since you state that
cluster_tags = {
  "kubernetes.io/cluster/${local.name}" = null
}
does not work.
An alternative would be to set create_security_group = false in the node group, which seems like the equivalent of setting attach_cluster_primary_security_group = true.
Do I understand this correctly?
No, the security group and tag in question only come into play when they are attached to nodes. So you can have 40+ security groups in your account with this tag without issue, but as soon as you attach more than one of those security groups to node(s) in your cluster, then it's a conflict.
That is very interesting.
My clusters were built without attach_cluster_primary_security_group = true.
Yet the aws-lb-controller complains as soon as I add the tag to any second security group.
Hi @kaykhancheckpoint, how did you actually deploy and enable the load balancer controller?
Running into what appears to be a similar issue, I think I've been able to work around it by using create_cluster_primary_security_group_tags = false.
The problem appears to be that there is more than one security group tagged with "kubernetes.io/cluster/${var.cluster_name}" = "owned". Setting that variable to false seems to at least stop Karpenter nodes from picking up the additional security group, which stops things like the aws-lb-controller from complaining about ENIs with multiple security groups.
This may be a clue when running an apply:
module.eks.aws_ec2_tag.cluster_primary_security_group["karpenter.sh/discovery"] will be destroyed
Note that existing Karpenter instances need to be replaced for this tag change to take effect.
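A sketch of that workaround (assuming a module version that exposes create_cluster_primary_security_group_tags; the version constraint is a placeholder):

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 18.26" # illustrative

  # stop the module from copying its tags (e.g. karpenter.sh/discovery) onto
  # the EKS-created cluster primary security group
  create_cluster_primary_security_group_tags = false

  # ... remaining cluster configuration ...
}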
This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this issue will be closed in 10 days
This issue was automatically closed because of stale in 10 days
I think this issue has probably been closed prematurely; I just hit it as well with this template.
The issue still exists with module terraform-aws-modules/eks/aws:v18.26.6. The proposed workaround resolved my problem.
+1
Stumbled over this as well with 18.26.6, using attach_cluster_primary_security_group = true.
The result was that the ELB did not work, since there were two security groups with "kubernetes.io/cluster/<cluster_name>" = "owned".
I'm now using create_cluster_primary_security_group_tags = false, but honestly, at the moment I'm not able to say whether this is a proper setup or just a workaround.
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.