
Multiple security groups are created with the kubernetes.io/cluster tag

Open blakepettersson opened this issue 2 years ago • 24 comments

Description

The issue is exactly the same as #1810, but for EKS managed node groups, with the same workaround (and presumably the same fix).

  • [x] ✋ I have searched the open/closed issues and my issue is not listed.

Versions

  • 18.10.1

  • Terraform version:

Terraform v1.1.6
on linux_amd64
+ provider registry.terraform.io/gavinbunney/kubectl v1.14.0
+ provider registry.terraform.io/hashicorp/aws v3.72.0
+ provider registry.terraform.io/hashicorp/cloudinit v2.2.0
+ provider registry.terraform.io/hashicorp/helm v2.2.0
+ provider registry.terraform.io/hashicorp/kubernetes v2.3.2
+ provider registry.terraform.io/hashicorp/local v2.2.2
+ provider registry.terraform.io/hashicorp/tls v3.1.0
+ provider registry.terraform.io/oboukili/argocd v2.2.8
+ provider registry.terraform.io/terraform-aws-modules/http v2.4.1

Your version of Terraform is out of date! The latest version is 1.1.7. You can update by downloading from https://www.terraform.io/downloads.html

Reproduction Code [Required]

Steps to reproduce the behaviour:

Create a cluster with this module, using default values and containing at least one eks_managed_node_group

Possible workaround:

Disable creation of either the node group or shared node security group through module variables, and create a separate security group outside this module.

blakepettersson avatar Apr 04 '22 14:04 blakepettersson
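A sketch of what that workaround might look like with this module's v18 inputs (untested; the external security group and its name are illustrative, and rules would need to be added to match your workloads):

```hcl
# Let only the EKS-created primary security group carry the
# kubernetes.io/cluster/<name> tag; manage extra node networking with an
# untagged security group created outside the module.
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "18.10.1"

  cluster_name = "test"
  # ...

  # Skip the module's shared node security group entirely.
  create_node_security_group = false

  eks_managed_node_group_defaults = {
    # Attach the EKS-created primary SG plus the untagged external SG.
    attach_cluster_primary_security_group = true
    vpc_security_group_ids                = [aws_security_group.node_extra.id]
  }
}

# External security group WITHOUT the kubernetes.io/cluster tag.
resource "aws_security_group" "node_extra" {
  name_prefix = "node-extra-"
  vpc_id      = local.vpc_id
}
```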

hi @blakepettersson - the EKS managed node group sub-module security group does not have these tags currently

bryantbiggs avatar Apr 04 '22 14:04 bryantbiggs

@bryantbiggs so it seems :facepalm:. The issue in the case of EKS managed node groups seems to be that EKS autogenerates a security group that already has the kubernetes.io/cluster/${var.cluster_name} tag set, which conflicts with the tag set by aws_security_group.node. I'll set up a proper repro for this.

blakepettersson avatar Apr 04 '22 14:04 blakepettersson

Here's a minimal repro:

locals {
  vpc_id          = "a vpc id"
  aws_region      = data.aws_region.current.name
  current_account = data.aws_caller_identity.current.account_id
}

provider "aws" {
  region = "eu-north-1"
}

data "aws_region" "current" {}

data "aws_caller_identity" "current" {}

data "aws_subnets" "all" {
  filter {
    name   = "vpc-id"
    values = [local.vpc_id]
  }
}

module "eks" {
  source                          = "terraform-aws-modules/eks/aws"
  version                         = "18.10.1"
  cluster_name                    = "test"
  cluster_version                 = "1.21"
  cluster_endpoint_private_access = true
  cluster_endpoint_public_access  = true
  vpc_id                          = local.vpc_id
  subnet_ids                      = data.aws_subnets.all.ids
  eks_managed_node_groups = {
    default = {
      instance_types = ["t3.medium"]
      desired_size   = 3
      max_size       = 10
      min_size       = 3
      disk_type      = "gp3"
    }
  }
  enable_irsa = true
}

Running terraform apply will create 3 security groups, 2 of which are created by this module and 1 by EKS itself.

(Screenshots: the two security groups as shown in the AWS console, with their tags.)

Both of these security groups have the tag kubernetes.io/cluster/test: owned.

blakepettersson avatar Apr 05 '22 13:04 blakepettersson

Modified from examples/complete, removing all external modules and keeping only one inline managed node group.

Node group configuration:


module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 18.20.5"

  # ...

  # EKS Managed Node Group(s)
  eks_managed_node_group_defaults = {
    ami_type       = "AL2_x86_64"
    disk_size      = 10
    instance_types = ["t2.small", "t3.small", "t3a.small"]

    attach_cluster_primary_security_group = true
    vpc_security_group_ids                = [aws_security_group.additional.id]
  }

  eks_managed_node_groups = {
    green = {
      min_size     = 1
      max_size     = 10
      desired_size = 1

      instance_types = ["t3.small"]
      capacity_type  = "SPOT"

      labels = {
        Environment = "test"
        GithubRepo  = "terraform-aws-eks"
        GithubOrg   = "terraform-aws-modules"
      }

      taints = {
        # dedicated = {
        #   key    = "dedicated"
        #   value  = "gpuGroup"
        #   effect = "NO_SCHEDULE"
        # }
      }

      update_config = {
        max_unavailable_percentage = 50 # or set `max_unavailable`
      }

      tags = {
        ExtraTag = "example"
      }
    }
  }
}

kubectl describe service

Events:
  Type     Reason                  Age                  From                Message
  ----     ------                  ----                 ----                -------
  Normal   EnsuringLoadBalancer    97s (x6 over 4m16s)  service-controller  Ensuring load balancer
  Warning  SyncLoadBalancerFailed  96s (x6 over 4m14s)  service-controller  Error syncing load balancer: failed to ensure load balancer: Multiple tagged security groups found for instance i-07d843b5953ca7412; ensure only the k8s security group is tagged; the tagged groups were sg-0bc8599e71f8a54a9(my-testing-k8s-node-20220419213507462100000009) sg-0b189fb8969ff5bd4(eks-cluster-sg-my-testing-k8s-987542792)

This workaround solves the problem. Pass it to the "eks" module:

  # Temp workaround for bug: double "owned" tag
  # https://github.com/terraform-aws-modules/terraform-aws-eks/issues/1810
  node_security_group_tags = {
    "kubernetes.io/cluster/${local.name}" = null
  }

Note that the taint configuration is also disabled, since it caused the scheduler to reject the CoreDNS deployment.

      taints = {
        # dedicated = {
        #   key    = "dedicated"
        #   value  = "gpuGroup"
        #   effect = "NO_SCHEDULE"
        # }
      }

leiarenee avatar Apr 23 '22 00:04 leiarenee

@bryantbiggs so it seems 🤦. The issue in the case of EKS managed node groups seems to be that EKS autogenerates a security group that already has the kubernetes.io/cluster/${var.cluster_name} tag set, which conflicts with the tag set by aws_security_group.node. I'll set up a proper repro for this.

https://github.com/terraform-aws-modules/terraform-aws-eks/blob/dc8a6eecddc2c6957ba309a939ee46f39b946461/node_groups.tf#L160

gohmc avatar Apr 23 '22 10:04 gohmc

I'm also experiencing an issue with this after installing the aws-load-balancer-controller. I get the following error from there.

{"level":"error","ts":1651137191.6512694,"logger":"controller-runtime.manager.controller.targetGroupBinding","msg":"Reconciler error","reconciler group":"elbv2.k8s.aws","reconciler kind":"TargetGroupBinding","name":"k8s-echoserv-echoserv-aaf854977c","namespace":"echoserver","error":"expect exactly one securityGroup tagged with kubernetes.io/cluster/geeiq-test-k8s for eni eni-0605625aa92fe670e, got: [sg-0cc7f9db258d4e3f8 sg-0e95a58276d2c4ae5]"}

After checking the console, I can see I have two security groups with the same tag "kubernetes.io/cluster/<cluster_name>" = "owned".

I'm also using EKS managed node groups.

kaykhancheckpoint avatar Apr 28 '22 09:04 kaykhancheckpoint
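To confirm which security groups carry the conflicting tag, a quick check with the AWS provider's aws_security_groups data source could look like the sketch below (the cluster name "test" is a placeholder; untested):

```hcl
# List every security group in the region tagged as "owned" by the
# cluster; the conflict only bites when more than one of these ends up
# attached to the same node/ENI.
data "aws_security_groups" "cluster_owned" {
  tags = {
    "kubernetes.io/cluster/test" = "owned" # replace "test" with your cluster name
  }
}

output "cluster_owned_security_group_ids" {
  value = data.aws_security_groups.cluster_owned.ids
}
```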

@kaykhancheckpoint check the last couple posts for the workaround from leiarenee

diginc avatar Apr 28 '22 18:04 diginc

This workaround solved it for me (https://github.com/terraform-aws-modules/terraform-aws-eks/issues/1810). Repeating it here for visibility.

In module "eks":

  node_security_group_tags = {
    "kubernetes.io/cluster/${local.name}" = null
  }

leiarenee avatar Apr 28 '22 20:04 leiarenee

@diginc @leiarenee yup, thank you

kaykhancheckpoint avatar Apr 28 '22 20:04 kaykhancheckpoint

This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this issue will be closed in 10 days

github-actions[bot] avatar May 30 '22 00:05 github-actions[bot]

This workaround did not work for us.

rcmurphy avatar Jun 03 '22 18:06 rcmurphy

@rcmurphy / @mr-kelly can you try:

  cluster_tags = {
    "kubernetes.io/cluster/${local.name}" = null
  }

Where ${local.name} is the name of your cluster

bryantbiggs avatar Jun 03 '22 18:06 bryantbiggs

~~I can confirm we've tried that. It did not work.~~

rcmurphy avatar Jun 03 '22 18:06 rcmurphy

~~I can confirm we've tried that. It did not work.~~

apologies - does that mean that it does work for your case or not?

bryantbiggs avatar Jun 04 '22 00:06 bryantbiggs

It means my ops engineer told me he did it, and then we realized that you had actually posted a slightly different command, so we never actually tried it. We've moved to a different workaround though (in this case we need to set a kernel parameter, so we're using a privileged startup container to do it instead).


rcmurphy avatar Jun 04 '22 00:06 rcmurphy

@rcmurphy / @mr-kelly can you try:

  cluster_tags = {
    "kubernetes.io/cluster/${local.name}" = null
  }

Where ${local.name} is the name of your cluster

In my Karpenter example, this doesn't work.


And disabling security group creation doesn't work either:

  create_node_security_group = false
  create_cluster_security_group = false

I am still trying....

mr-kelly avatar Jun 04 '22 01:06 mr-kelly

In this official doc: https://docs.aws.amazon.com/eks/latest/userguide/sec-group-reqs.html

AWS says:

(Screenshot of the quoted requirement from the AWS documentation.)

And in my karpenter example,
kubernetes.io/cluster/cluster-name=owned appears twice.

mr-kelly avatar Jun 04 '22 01:06 mr-kelly

After digging into this, here is what I found:

If you use attach_cluster_primary_security_group = true, then you will want to set

node_security_group_tags = {
  "kubernetes.io/cluster/${local.name}" = null
}

Why? Because the EKS service automatically tags the cluster's primary security group. I tried to use

cluster_tags = {
  "kubernetes.io/cluster/${local.name}" = null
}

to "hide" this tag and remove the conflict, but this route does not work. It's safe to say that if you want to attach the cluster's primary security group to your nodes, you need to disable the tag on the shared node security group.

It's not a matter of just having multiple security groups with the tag key:value combo; it's only a concern when multiple security groups with the tag key:value combo are ATTACHED to your node(s).

You can refer to this example which did not encounter any issues provisioning the ALB controller chart https://github.com/clowdhaus/eks-reference-architecture/tree/main/karpenter/us-east-1

Let me know if there is anything else I missed or any other questions pertaining to this issue.

bryantbiggs avatar Jun 06 '22 17:06 bryantbiggs
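Putting those findings together, a minimal sketch of the working combination might look like this (assumed, not from the linked reference architecture verbatim; `local.name` and the version constraint are placeholders):

```hcl
# Attach the cluster primary SG to the nodes, and null out the tag on the
# module-created shared node SG so only ONE attached SG carries the
# kubernetes.io/cluster/<name> = owned tag.
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 18.20"

  cluster_name = local.name
  # ...

  node_security_group_tags = {
    "kubernetes.io/cluster/${local.name}" = null
  }

  eks_managed_node_group_defaults = {
    attach_cluster_primary_security_group = true
  }
}
```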

If I understand you correctly @bryantbiggs, EKS automatically tags the cluster's primary security group with "kubernetes.io/cluster/$clustername" = owned. And that would mean you basically have to use attach_cluster_primary_security_group = true, because the cluster's primary security group will always have the problematic tag, since you state that

cluster_tags = {
    "kubernetes.io/cluster/${local.name}" = null
  }

does not work.

An alternative would be to set create_security_group = false in the node group, which seems like the equivalent of setting attach_cluster_primary_security_group = true.

Do I understand this correctly?

axkng avatar Jun 21 '22 07:06 axkng
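For reference, the per-node-group alternative mentioned above might be expressed like this (a sketch; the group name and sizes are placeholders):

```hcl
  eks_managed_node_groups = {
    default = {
      min_size     = 1
      max_size     = 3
      desired_size = 1

      # Skip the node-group-level security group entirely and rely on the
      # EKS-created primary security group instead.
      create_security_group                 = false
      attach_cluster_primary_security_group = true
    }
  }
```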

If I understand you correctly @bryantbiggs, EKS automatically tags the cluster's primary security group with "kubernetes.io/cluster/$clustername" = owned. And that would mean you basically have to use attach_cluster_primary_security_group = true, because the cluster's primary security group will always have the problematic tag, since you state that

cluster_tags = {
    "kubernetes.io/cluster/${local.name}" = null
  }

does not work.

An alternative would be to set create_security_group = false in the node group, which seems like the equivalent of setting attach_cluster_primary_security_group = true.

Do I understand this correctly?

No - the security group and tag in question only come into play when they are attached to nodes. So you can have 40+ security groups in your account with this tag without issue, but as soon as you attach more than one of those security groups to node(s) in your cluster, then it's a conflict.

bryantbiggs avatar Jun 21 '22 11:06 bryantbiggs

That is very interesting. My clusters were built without attach_cluster_primary_security_group = true. Yet the aws-lb-controller complains as soon as I add the tag to any second security group.

axkng avatar Jun 21 '22 11:06 axkng

Hi @kaykhancheckpoint, how did you actually deploy and enable the load balancer controller?

ashishjullia avatar Jul 05 '22 03:07 ashishjullia

Running into what appears to be a similar issue; I think I've been able to work around it by setting create_cluster_primary_security_group_tags = false.

The problem appears to be that more than one security group is tagged with "kubernetes.io/cluster/${var.cluster_name}" = "owned". Setting that variable to false seems to at least stop Karpenter nodes from picking up the additional security group, which stops things like the aws-lb-controller from complaining about ENIs with multiple security groups.

This may be a clue when running an apply:

module.eks.aws_ec2_tag.cluster_primary_security_group["karpenter.sh/discovery"] will be destroyed

Note that existing Karpenter instances need to be replaced for this tag change to take effect.

tareks avatar Jul 06 '22 16:07 tareks
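A sketch of that variable in context (assumed placement; `local.name` and the version constraint are placeholders):

```hcl
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 18.26"

  cluster_name = local.name
  # ...

  # Don't propagate the module's tags (e.g. karpenter.sh/discovery) onto
  # the EKS-created primary security group, so nodes don't pick up a
  # second security group carrying cluster discovery tags.
  create_cluster_primary_security_group_tags = false
}
```

As noted above, existing Karpenter instances need to be replaced for the tag change to take effect.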

This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this issue will be closed in 10 days

github-actions[bot] avatar Aug 06 '22 00:08 github-actions[bot]

This issue was automatically closed because of stale in 10 days

github-actions[bot] avatar Aug 16 '22 00:08 github-actions[bot]

I think this issue has probably been closed prematurely; I just hit it as well with this template.

robertlagrant avatar Aug 26 '22 14:08 robertlagrant

The issue still exists with module terraform-aws-modules/eks/aws:v18.26.6. The proposed workaround resolved my problem.

Salah-open avatar Aug 28 '22 23:08 Salah-open

+1

Stumbled over this as well with 18.26.6, using attach_cluster_primary_security_group = true.

The result was that the ELB could not work, since there were two security groups with "kubernetes.io/cluster/<cluster_name>" = "owned".

I'm now using create_cluster_primary_security_group_tags = false, but honestly at the moment I'm not able to say if this is a proper setup or just a workaround.

uwburn avatar Sep 01 '22 14:09 uwburn

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions[bot] avatar Nov 08 '22 02:11 github-actions[bot]