terraform-aws-eks
After update to v20 with API_AND_CONFIG_MAP cluster cannot launch Fargate pods
Description
After updating the cluster from v19 to v20 and switching to the API_AND_CONFIG_MAP
auth mode, the cluster cannot launch new Fargate pods.
New clusters created with API_AND_CONFIG_MAP
mode cannot launch Fargate pods either.
Versions
- Module version [Required]: 20.2.0
- Terraform version:
Terraform v1.5.7
on darwin_arm64
- Provider version(s):
+ provider registry.terraform.io/hashicorp/aws v5.35.0
+ provider registry.terraform.io/hashicorp/cloudinit v2.3.3
+ provider registry.terraform.io/hashicorp/kubernetes v2.25.2
+ provider registry.terraform.io/hashicorp/time v0.10.0
+ provider registry.terraform.io/hashicorp/tls v4.0.5
Reproduction Code [Required]
Basic stripped-down version of what we're using:
module "eks" {
source = "terraform-aws-modules/eks/aws"
version = "20.2.0"
cluster_name = "test"
cluster_version = "1.28"
vpc_id = <vpc_id>
subnet_ids = <subnet_ids>
enable_irsa = true
cluster_endpoint_private_access = true
cluster_endpoint_public_access = false
iam_role_use_name_prefix = true
enable_cluster_creator_admin_permissions = true
fargate_profiles = {
kube-system = {
name = "kube-system"
selectors = [
{ namespace = "kube-system" }
]
}
}
}
Expected behavior
Fargate pods should be able to launch
Actual behavior
After updating to v20, we're seeing the following errors when trying to launch new pods:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 32m fargate-scheduler Misconfigured Fargate Profile: fargate profile kube-system blocked for new launches due to: Pod execution role is not found in auth config or does not have all required permissions for launching fargate pods.
Warning FailedScheduling 27m fargate-scheduler Misconfigured Fargate Profile: fargate profile kube-system blocked for new launches due to: Pod execution role is not found in auth config or does not have all required permissions for launching fargate pods.
Additional context
From what I see, there is an access entry created for Fargate, but no aws-auth ConfigMap entry. While that's probably expected, maybe it affects the ability to run Fargate?
what steps did you follow when upgrading from v19 to v20?
Hey, we saw this when upgrading our clusters to use the EKS API. We re-created all of our Karpenter Fargate profiles and this solved the issue for us.
Might be worth a try, as recreating the Fargate profile will not cause a loss of nodes (only a short window without autoscaling).
what steps did you follow when upgrading from v19 to v20?
@bryantbiggs doesn't really matter tbh, since the issue reproduces on a new cluster created from scratch with the v20 module
Might be worth a try, as recreating the Fargate profile will not cause a loss of nodes (only a short window without autoscaling).
Tried this - after the Fargate profile recreation, entries were added to the aws-auth
ConfigMap by EKS itself and Fargate started working. But the next apply shows a diff to remove these entries from the ConfigMap:
# module.cluster.module.auth.kubernetes_config_map_v1_data.aws_auth[0] will be updated in-place
~ resource "kubernetes_config_map_v1_data" "aws_auth" {
~ data = {
~ "mapRoles" = <<-EOT
- - groups:
- - system:bootstrappers
- - system:nodes
- rolearn: arn:aws:iam::1111111111111:role/Karpenter-test
- username: system:node:{{EC2PrivateDNSName}}
- - groups:
- - system:bootstrappers
- - system:nodes
- - system:node-proxier
- rolearn: arn:aws:iam::1111111111111:role/kube-system-20240208073339112200000002
- username: system:node:{{SessionName}}
+ - "groups":
+ - "system:bootstrappers"
+ - "system:nodes"
+ "rolearn": "arn:aws:iam::1111111111111:role/Karpenter-test"
+ "username": "system:node:{{EC2PrivateDNSName}}"
EOT
# (2 unchanged elements hidden)
}
The Fargate profile works so far (after removing the entries from the aws-auth
ConfigMap), but I'm not sure if that's a permanent solution.
Update: some time after deleting the entries from the ConfigMap, the Fargate profile stopped working again:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 7s fargate-scheduler Misconfigured Fargate Profile: fargate profile kube-system blocked for new launches due to: Pod execution role is not found in auth config or does not have all required permissions for launching fargate pods.
@bryantbiggs doesn't really matter tbh, since the issue reproduces on a new cluster created from scratch with the v20 module
@dmitriishaburov are you saying that when you create a brand new cluster using the latest v20 module and EKS Fargate profiles with authentication_mode = "API_AND_CONFIG_MAP"
, the pods running on Fargate nodes fail to launch, or launch but then later fail?
@dmitriishaburov are you saying that when you create a brand new cluster using the latest v20 module and EKS Fargate profiles with
authentication_mode = "API_AND_CONFIG_MAP"
, the pods running on Fargate nodes fail to launch, or launch but then later fail?
Yes, after creating a brand new cluster with the v20 module and Fargate, pods initially launch but later fail (existing pods keep running, but no new pods can be launched)
Ok thank you - let me dig into this
@dmitriishaburov do you have a way to reproduce? I launched the Fargate example that we have in this module and scaled the sample deployment, and I'm still not seeing any issues so far:
k get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
default inflate-75d744d4c6-67r5k 1/1 Running 0 11m
default inflate-75d744d4c6-8mn7d 1/1 Running 0 11m
default inflate-75d744d4c6-mlgq7 1/1 Running 0 11m
default inflate-75d744d4c6-nf6lm 1/1 Running 0 11m
default inflate-75d744d4c6-pd8rc 1/1 Running 0 11m
karpenter karpenter-7b9d64546f-96jdn 1/1 Running 0 17m
karpenter karpenter-7b9d64546f-dn6kg 1/1 Running 0 17m
kube-system aws-node-qlv67 2/2 Running 0 11m
kube-system coredns-644f96d56d-5lwzv 1/1 Running 0 22m
kube-system coredns-644f96d56d-tw87p 1/1 Running 0 22m
kube-system kube-proxy-tdmhd 1/1 Running 0 11m
@bryantbiggs have you checked that the aws-auth ConfigMap doesn't have entries for Fargate?
If there are no ConfigMap entries, I'd try restarting any deployment in ~1 hour or so, e.g. kubectl rollout restart deploy coredns -n kube-system
yes, there are configmap entries - these are created by EKS
k get configmap -n kube-system aws-auth -o yaml
apiVersion: v1
data:
  mapRoles: |
    - groups:
      - system:bootstrappers
      - system:nodes
      - system:node-proxier
      rolearn: arn:aws:iam::111111111111:role/kube-system-20240208133840563900000002
      username: system:node:{{SessionName}}
    - groups:
      - system:bootstrappers
      - system:nodes
      - system:node-proxier
      rolearn: arn:aws:iam::111111111111:role/karpenter-20240208133840563500000001
      username: system:node:{{SessionName}}
kind: ConfigMap
metadata:
  creationTimestamp: "2024-02-08T13:49:11Z"
  name: aws-auth
  namespace: kube-system
  resourceVersion: "1442"
  uid: 990a01cc-c9cb-4e5a-a0b5-e278ebfdefce
I've manually deleted the aws-auth
ConfigMap and restarted both the CoreDNS and Karpenter deployments, and still no signs of auth issues
still no signs of auth issues after an hour. For now I am going to park this - I don't think there is anything module-related since I am unable to reproduce
Yeah, seems like it's quite hard to replicate.
I've created one more cluster to replicate it, keeping the configuration as small as possible, and was trying to restart coredns.
It took around 1.5 hours for the Fargate profile to start failing:
First try: Fri Feb 9 10:39:07 EET 2024
Failed to start: Fri Feb 9 11:59:38 EET 2024
Here's the entire Terraform code for the cluster:
rovider "aws" {
profile = "profile"
region = "eu-central-1"
allowed_account_ids = ["111111111"]
}
data "aws_eks_cluster_auth" "this" {
name = module.eks.cluster_name
}
provider "kubernetes" {
host = module.eks.cluster_endpoint
cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
token = data.aws_eks_cluster_auth.this.token
}
module "eks" {
source = "terraform-aws-modules/eks/aws"
version = "20.2.0"
cluster_name = "fargate-test"
cluster_version = "1.28"
vpc_id = "vpc-111111111"
subnet_ids = [
"subnet-111111111",
"subnet-111111112",
"subnet-111111113",
]
cluster_encryption_config = {}
cluster_endpoint_private_access = true
cluster_endpoint_public_access = false
enable_cluster_creator_admin_permissions = true
cluster_security_group_additional_rules = {
vpn_access = {
description = "VPN"
protocol = "tcp"
from_port = 443
to_port = 443
cidr_blocks = [
"192.168.0.0/19",
]
type = "ingress"
}
}
fargate_profiles = {
kube-system = {
name = "kube-system"
selectors = [
{ namespace = "kube-system" }
]
}
}
}
module "auth" {
source = "terraform-aws-modules/eks/aws//modules/aws-auth"
version = "20.2.0"
manage_aws_auth_configmap = true
aws_auth_roles = [
{
username = "SomeIAMRole"
rolearn = "arn:aws:iam::111111111:role/SomeIAMRole"
groups = ["system:masters"]
}
]
}
@dmitriishaburov The Fargate pod execution roles used to be added via the aws-auth template file in v19. In v20 they're not there anymore, and you need to pass them forward yourself.
So moving from v19 -> v20 you need to add those to the roles mapping:
module "aws-auth" {
source = "terraform-aws-modules/eks/aws//modules/aws-auth"
version = "~> 20.0"
# aws-auth configmap
create_aws_auth_configmap = false
manage_aws_auth_configmap = true
aws_auth_roles = concat(local.roles, local.nodegroup_roles)
aws_auth_users = concat(local.cluster_users, local.users, local.tf_user)
}
locals {
roles = try([
{
rolearn = module.eks_blueprints_addons[0].karpenter.node_iam_role_arn
username = "system:node:{{EC2PrivateDNSName}}"
groups = [
"system:bootstrappers",
"system:nodes"
]
},
{
rolearn = module.eks[0].module.fargate_profile["karpenter"].fargate_profile_pod_execution_role_arn
username = "system:node:{{SessionName}}"
groups = [
"system:bootstrappers",
"system:nodes",
"system:node-proxier"
]
}
], []
cluster_users = try([
for arn in var.cluster_users :
{
userarn = arn
username = regex("[a-zA-Z0-9-_]+$", arn)
groups = [
"system:masters"
]
}
], [])
}
@bryantbiggs I see you posted the aws-auth ConfigMap that was recreated after being deleted - could you paste the TF code you are using to create that with the aws-auth module? I'm assuming you are passing it the fargate_profiles similar to what I am, but probably doing it with a for_each (which would be better). We probably need to document this change a bit more. Besides fargate_profiles, the old aws-auth template also did the same for node groups, so that would be more to add here, and it should be documented somewhere what needs to be added when moving from v19 -> v20 to mimic the exact same ConfigMap the old template was producing automatically.
The Fargate pod execution roles used to be added via the aws-auth template file in v19. In v20 they're not there anymore, and you need to pass them forward yourself.
This is not true - EKS will create both the aws-auth ConfigMap entry and the cluster access entry when using authentication_mode = "API_AND_CONFIG_MAP". However, if you are making any changes to the aws-auth ConfigMap, it's up to you to ensure any entries you require stay in the ConfigMap via the configuration you use. With "API_AND_CONFIG_MAP", you do not need to have the Fargate profile's IAM roles added in the ConfigMap because EKS will ensure you have an access entry, and this is controlled outside of Terraform.
We probably need to document this change a bit more. Besides fargate_profiles, the old aws-auth template also did the same for node groups, so that would be more to add here, and it should be documented somewhere what needs to be added when moving from v19 -> v20 to mimic the exact same ConfigMap the old template was producing automatically.
It is documented - I created an entire replica of the module to make this transition easier: https://github.com/clowdhaus/terraform-aws-eks-migrate-v19-to-v20
Unless users are using authentication_mode = "CONFIG_MAP", there are no actions users need to take with EKS Fargate profiles and managed node groups (once they have migrated to v20, or if they are provisioning new clusters with v20)
you do not need to have the Fargate profile's IAM roles added in the ConfigMap because EKS
Docs are not entirely clear, but it seems like during the migration to access entries you shouldn't actually remove the Fargate (or managed node group) entries from the ConfigMap:
https://docs.aws.amazon.com/eks/latest/userguide/migrating-access-entries.html
In v19 the ConfigMap entries were created automatically by Terraform; in v20 any change to the ConfigMap via Terraform removes the AWS-created entries from it. It would probably make sense to keep the behavior the same in the aws-auth module.
If you remove entries that Amazon EKS created in the ConfigMap, your cluster won't function properly.
we cannot maintain the same functionality because that means we are keeping the Kubernetes provider in the module, which we are absolutely not doing
In v19 the ConfigMap entries were created automatically by Terraform; in v20 any change to the ConfigMap via Terraform removes the AWS-created entries from it. It would probably make sense to keep the behavior the same in the aws-auth module.
This is not true. You need to understand how EKS handles access, as I've stated above.
- Prior to access entry, when you create a managed nodegroup or Fargate profile, EKS will automatically upsert an entry into the aws-auth ConfigMap with the IAM role that is used by either compute construct.
- With access entry, when you create a managed nodegroup or Fargate profile, if you set authentication_mode = "API_AND_CONFIG_MAP", EKS will again automatically upsert an entry into the aws-auth ConfigMap with the IAM role that is used by either compute construct AND create an access entry for the IAM role.
- If using access entry and authentication_mode = "API", EKS will automatically create an access entry for the IAM role for managed nodegroup(s) and Fargate profile(s).
That's just the EKS portion - that's the behavior of the EKS API, both past and present.
In terms of this module, the aws-auth ConfigMap was a bit contentious because Terraform does not like sharing ownership of resources (actually, it doesn't share at all). Ignoring this module, if you defined a kubernetes_config_map resource, it was very easy for users to overwrite the contents that already existed in the ConfigMap (i.e. the entries that EKS added for managed nodegroups and Fargate profiles), and only after dealing with the "resource already exists" conflict error. This was so problematic that Hashicorp created a resource that is somewhat abnormal and outside the normal Terraform philosophy, kubernetes_config_map_v1_data, to allow users to forcefully overwrite a ConfigMap. That lets users avoid the "resource already exists" errors, but again you still had the issue of wiping the entries that EKS added.
Coming back to this module, we automatically mapped the roles from both managed nodegroups and Fargate profiles created by this module into aws-auth ConfigMap entries to ensure users didn't shoot themselves in the foot and remove the entries that EKS added. To users this was transparent - it seemed like the module was the creator of these entries, but as you can see, it's a bit more nuanced.
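For illustration only, that standalone resource looks roughly like this - a minimal sketch with a placeholder role, not this module's actual implementation:

resource "kubernetes_config_map_v1_data" "aws_auth" {
  metadata {
    name      = "aws-auth"
    namespace = "kube-system"
  }

  data = {
    # yamlencode() renders the same structure EKS expects under mapRoles
    mapRoles = yamlencode([
      {
        rolearn  = "arn:aws:iam::111111111111:role/SomeIAMRole" # placeholder
        username = "SomeIAMRole"
        groups   = ["system:masters"]
      }
    ])
  }

  # take ownership of these keys even if something else (e.g. EKS) already wrote them -
  # which is exactly how the EKS-created entries end up being wiped
  force = true
}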
Finally, we come to the migration from the aws-auth ConfigMap to cluster access entry. We will use the following scenario to better highlight the various components, steps, and interactions:
- Starting with an EKS module using v19 which has manage_aws_auth_configmap = true
- There is a self-managed nodegroup, EKS managed nodegroup, and Fargate profile in this cluster definition
- There is an extra entry in the aws-auth ConfigMap for an IAM role or user

1. First, we will use the steps in the upgrade guide and start off by changing the source of the module from source = "terraform-aws-modules/eks/aws" to source = "git@github.com:clowdhaus/terraform-aws-eks-v20-migrate.git?ref=c356ac8ec211604defaaaad49d27863d1e8a1391" (remove the version for now since we are using a specific git SHA for this temporary step). This temporary module, used to aid in upgrading, will allow us to enable cluster access entry without modifying the aws-auth ConfigMap (a sketch of this source change follows after these steps).
ConfigMap` - Once we've changed the source we'll do the usual Terraform commands:
-
terraform init -upgrade
-
terraform plan
- check that everything looks kosher, we should see theauthentication_mode = "API_AND_CONFIG_MAP"
- consult the upgrade guide for any other changes that show up in the diff and make changes accordingly (should be quite minimal, only defaults for v20 that are changing or new additions) -
terraform apply
- accept the apply
What is happening in step 2 is that we are enabling cluster access entry but not modifying the aws-auth
ConfigMapas stated in the EKS docs. If you do not specify any additional
access_entries`, this will only cover the self-managed nodegroup, the EKS managed nodegroup and Fargate profile IAM roles and the cluster creator (admin) role. In the background, EKS is creating the access entries for the managed nodegroup and Fargate profile roles, as well as the cluster creator (admin) role. The EKS module is creating the access entry for the self-managed nodegroup.
3. The last component we need to cover is the additional aws-auth ConfigMap entry for the IAM role or user. If you require custom RBAC permissions, you will need to continue using the ConfigMap route via the new aws-auth sub-module. This sub-module is a direct copy of the v19 implementation, but it no longer has any default entries for nodegroups or Fargate profiles - only what users specify. If you can use one of the existing access policies, you can instead create an access entry for this IAM role or user and completely remove the use of the aws-auth ConfigMap (see the sketch at the end of this section). For this scenario, we will only use access entries.
4. Change the module source back to source = "terraform-aws-modules/eks/aws", set the appropriate v20 version, and re-run the same set of commands listed in step 2. When this change is applied, the aws-auth ConfigMap would be deleted from the cluster by Terraform. This is fine and expected - this is why we need to ensure access entries exist prior to this happening (or even prior to entries being removed from the ConfigMap).
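To make step 1 concrete, the temporary source change is roughly this (a sketch - all of your existing cluster arguments stay exactly as they are):

module "eks" {
  # temporarily point at the migration fork, pinned to the git SHA from the upgrade guide;
  # the version argument is removed while a git source is in use
  source = "git@github.com:clowdhaus/terraform-aws-eks-v20-migrate.git?ref=c356ac8ec211604defaaaad49d27863d1e8a1391"

  cluster_name    = "test"
  cluster_version = "1.28"
  # ... remaining arguments unchanged
}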
Just for the sake of completeness - if the authentication_mode stays at "API_AND_CONFIG_MAP" (which is fine), then for any changes to the IAM roles for the managed nodegroup(s) or Fargate profile(s) in the cluster (updates, additions, etc.), EKS will continue to automatically upsert entries into the aws-auth ConfigMap. In the scenario above, you saw that the ConfigMap was entirely removed from the cluster - but any of the described changes will cause the aws-auth ConfigMap to be re-created by EKS. If you want to avoid this entirely, you can change the authentication_mode to "API"; then only access entries will be used, and EKS will no longer make any modifications to the aws-auth ConfigMap.
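For the scenario's extra IAM role, a rough equivalent that drops the ConfigMap entirely could look like this (a minimal sketch using the module's access_entries input; AmazonEKSClusterAdminPolicy stands in for the old system:masters mapping, and the role ARN is the placeholder from this thread):

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0"

  # "API" = access entries only, EKS stops touching aws-auth; "API_AND_CONFIG_MAP" keeps both
  authentication_mode = "API"

  access_entries = {
    some_iam_role = {
      principal_arn = "arn:aws:iam::111111111:role/SomeIAMRole"

      policy_associations = {
        admin = {
          policy_arn = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"
          access_scope = {
            type = "cluster"
          }
        }
      }
    }
  }

  # ... remaining cluster configuration unchanged
}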
I'm running into the same issues, and reading all of this doesn't clear up what is happening.
I've run the migration from v19 to v20 using the migration fork and the Fargate pods are starting correctly. However, now that I'm back on the standard eks module source with the version set to ~> 20.0, a terraform plan is saying it would like to destroy the module.eks.kubernetes_config_map_v1_data.aws_auth[0] resource. I did see in the 20.x upgrade documentation that instead of letting Terraform destroy that resource, it should be removed from the state so no disruptions occur. In my case it wants to remove the karpenter and the initial-eks-node-group roles. The 20.x upgrade documentation says it automatically adds access for managed node groups and Fargate, and Karpenter is running on Fargate. If that statement is true, why do we need to leave the resources that were created by module.eks.kubernetes_config_map_v1_data.aws_auth[0] around?
Any help clarifying this would be greatly appreciated.
If that statement is true why do we need to leave the resources that were created by module.eks.kubernetes_config_map_v1_data.aws_auth[0] around?
You only need to move/remove the aws-auth resources when you are going to have entries that are used in the aws-auth ConfigMap. If everything is covered by cluster access entries, you do not need to do anything with these resources and can simply let Terraform destroy them.
I faced the same challenge as well and identified the issue. As per the AWS docs, we need to have the following trust policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Condition": {
        "ArnLike": {
          "aws:SourceArn": "arn:aws:eks:region-code:111122223333:fargateprofile/my-cluster/*"
        }
      },
      "Principal": {
        "Service": "eks-fargate-pods.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
whereas the module is creating only:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "eks-fargate-pods.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
Once I added the additional trust condition, things started working. I am not sure why it works on v19 though.
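For anyone managing the pod execution role themselves, that trust policy could be expressed in Terraform roughly like this (a sketch; the region, account ID, and cluster name are the placeholders from the AWS docs snippet above, and whether the condition is actually required is exactly what's in question here):

data "aws_iam_policy_document" "fargate_pod_execution_assume" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["eks-fargate-pods.amazonaws.com"]
    }

    # scope the trust to Fargate profiles of this specific cluster, per the AWS docs excerpt above
    condition {
      test     = "ArnLike"
      variable = "aws:SourceArn"
      values   = ["arn:aws:eks:region-code:111122223333:fargateprofile/my-cluster/*"]
    }
  }
}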
Hey @bryantbiggs, I'm observing a similar thing and would like some clarification before I make the upgrade, to avoid any disruption in access.
When I go from v19 to v20, setting the version to ~> 20.0 and creating the auth roles using the sub-module as below:
module "eks" {
source = "terraform-aws-modules/eks/aws"
version = "20.2.0"
....
}
module "auth" {
source = "terraform-aws-modules/eks/aws//modules/aws-auth"
version = "20.2.0"
manage_aws_auth_configmap = true
aws_auth_roles = [
{
username = "SomeIAMRole"
rolearn = "arn:aws:iam::111111111:role/SomeIAMRole"
groups = ["system:masters"]
}
]
}
the terraform plan shows destruction & creation of the ConfigMap:
# module.eks.kubernetes_config_map_v1_data.aws_auth[0] will be destroyed
# (because kubernetes_config_map_v1_data.aws_auth is not in configuration)
# module.eks-auth-modules.kubernetes_config_map_v1_data.aws_auth[0] will be created
My concern with this delete & create: will I lose access for some time, or will the EKS access entry (which I'm assuming will be created automatically) take care of the disruption? Can this create a shoot-yourself-in-the-foot scenario?
And is it for this reason that we need to go with this approach https://github.com/clowdhaus/terraform-aws-eks-migrate-v19-to-v20 rather than a direct upgrade from v19 to v20?
This is tough to reproduce, but I ran into it as well, in API_AND_CONFIG_MAP
mode. While there is an access entry being created for the Fargate profiles, it appears to be missing something - not sure what. I had to re-add the entries to the aws-auth ConfigMap to keep my Fargate profiles working.
I also ran into the same issue and the same error message.
Steps I followed:
- eks module version 19.17.1
- Upgraded to 20.4.0
- I simply upgraded the version. In my terraform plan I also got the same module.eks.kubernetes_config_map_v1_data.aws_auth[0] will be destroyed, and immediately after the upgrade a rollout command to deploy a pod on Fargate was working, but after an hour it didn't.
- Note: I didn't specify authentication_mode = "API_AND_CONFIG_MAP". I simply chose to keep the default.
that is far from what is outlined in the upgrade guide, and I would expect issues when following that route
I just brought up a brand new EKS cluster on 20.5.0 and it's having the same issue:
Warning FailedScheduling 21s fargate-scheduler Misconfigured Fargate Profile: fargate profile coredns blocked for new launches due to: Pod execution role is not found in auth config or does not have all required permissions for launching fargate pods.
so this has nothing to do with the upgrade
module "eks" {
...
enable_cluster_creator_admin_permissions = true
fargate_profile_defaults = {
iam_role_additional_policies = {
additional = aws_iam_policy.node_additional.arn,
}
tags = {
cluster = local.name
}
timeouts = {
create = "20m"
delete = "20m"
}
}
fargate_profiles = {
karpenter = {
selectors = [
{ namespace = "platform-karpenter" }
]
}
coredns = {
selectors = [
{ namespace = "kube-system", labels = { k8s-app = "kube-dns" } }
]
}
}
}
# bit more in-depth policy than the one in the fargate example:
resource "aws_iam_policy" "node_additional" {
  name        = "${local.name}-additional"
  description = "Example usage of node additional policy"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = [
          "ec2:Describe*",
        ]
        Effect   = "Allow"
        Resource = "*"
      },
      {
        Action = [
          "kms:Decrypt",
        ]
        Effect = "Allow"
        Resource = [
          var.session_manager_key
        ]
      },
      {
        Action = [
          "kms:*"
        ]
        Effect   = "Allow"
        Resource = ["*"]
        Condition = {
          StringLike = {
            "ec2:ResourceTag/Terraform" = "true"
          }
        }
      }
    ]
  })

  tags = local.tags
}
I looked at the pod execution role for coredns and it has "AmazonEKS_CNI_Policy", "AmazonEKSFargatePodExecutionRolePolicy", and "My additional role policy" attached, with a trust policy of:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "eks-fargate-pods.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
So it looks correct, I think. I know @kuntalkumarbasu said above that you need to add the condition to make it work, but that doesn't seem correct, does it?
module "aws-auth" {
source = "terraform-aws-modules/eks/aws//modules/aws-auth"
version = "~> 20.0"
# aws-auth configmap
create_aws_auth_configmap = false
manage_aws_auth_configmap = true
# local.cluster_users is list of arns, local.users is AWS account list of arns, local.tf_user is the role arn creating the terraform apply to add as system:masters - these are to be replaced with access_entries (tf user should already be done by the eks module now).
aws_auth_users = concat(local.cluster_users, local.users, local.tf_user)
}
I used to add the Fargate pod execution role ARNs here as described in a previous reply, but since the module creates access entries I removed those from aws_auth_roles.
As @bryantbiggs mentioned, I followed these steps and I am not seeing this error anymore:
- Followed https://github.com/clowdhaus/terraform-aws-eks-migrate-v19-to-v20
- Updated the module to 19.21
- Changed the source
- Refactored the code
- terraform init -upgrade and apply
- Changed the source again
- terraform init -upgrade and apply
- Changed the authentication_mode = API
- terraform apply
Confirmed using kubectl rollout after 2 hours and 24 hours - EKS is able to deploy pods on Fargate nodes.
Team, I'm facing the same issue after migrating from v19 to v20. I get the following error (same as OP) on my pending Karpenter pods:
Misconfigured Fargate Profile: fargate profile karpenter blocked for new launches due to: Pod execution role is not found in auth config or does not have all required permissions for launching fargate pods
What are some things I could try?
eks-cluster
module "eks_cluster" {
source = "terraform-aws-modules/eks/aws"
version = "~> 20.8.2"
create_kms_key = true
kms_key_owners = [
"arn:aws:iam::${local.account_id}:root",
]
kms_key_administrators = [
"arn:aws:iam::${local.account_id}:role/aws-reserved/sso.amazonaws.com/<role>
]
cluster_enabled_log_types = [
"api",
"authenticator"
]
enable_irsa = true
authentication_mode = "API_AND_CONFIG_MAP"
cluster_name = local.name
cluster_version = "1.29"
cluster_endpoint_public_access = true
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnets
# control_plane_subnet_ids = module.vpc.intra_subnets
# Fargate profiles use the cluster primary security group so these are not utilized
create_cluster_security_group = false
create_node_security_group = false
fargate_profiles = {
karpenter = {
selectors = [
{ namespace = "karpenter" }
]
}
}
tags = merge(local.tags, {
"karpenter.sh/discovery" = local.name
})
node_security_group_additional_rules = {
ingress_self_all = { ... }
egress_all = { ... }
ingress_cluster_to_node_all_traffic = { ... }
}
}
aws-auth
module "eks_cluster_aws_auth" {
source = "terraform-aws-modules/eks/aws//modules/aws-auth"
version = "~> 20.8.2"
manage_aws_auth_configmap = true
aws_auth_roles = flatten([
# We need to add in the Karpenter node IAM role for nodes launched by Karpenter
{
rolearn = module.eks_blueprints_addons.karpenter.node_iam_role_arn
username = "system:node:{{SessionName}}"
groups = [
"system:bootstrappers",
"system:nodes",
"system:node-proxier"
]
},
module.platform.aws_auth_configmap_role,
module.peo.aws_auth_configmap_role,
module.ats.aws_auth_configmap_role,
module.hris_relay.aws_auth_configmap_role,
module.pipelines.aws_auth_configmap_role,
])
}
karpenter
module "eks_cluster_karpenter" {
source = "terraform-aws-modules/eks/aws//modules/karpenter"
version = "~> 20.8.2"
cluster_name = module.eks_cluster.cluster_name
create_access_entry = false
enable_irsa = true
create_instance_profile = true
iam_role_name = "KarpenterIRSA-${module.eks_cluster.cluster_name}"
iam_role_description = "Karpenter IAM role for service account"
iam_policy_name = "KarpenterIRSA-${module.eks_cluster.cluster_name}"
iam_policy_description = "Karpenter IAM role for service account"
irsa_oidc_provider_arn = module.eks_cluster.oidc_provider_arn
tags = merge(local.tags, {})
}
I had the same issue:
Misconfigured Fargate Profile: fargate profile karpenter blocked for new launches due to: Pod execution role is not found in auth config or does not have all required permissions for launching fargate pods
The only thing that solved it for me was manually adding the Fargate pod execution role to aws-auth (using the new sub-module) like this:
module "eks" {
source = "terraform-aws-modules/eks/aws"
version = "~> 20.2.1"
cluster_name = "my-cluster"
cluster_version = "1.28"
# Fargate profiles use the cluster primary security group so these are not utilized
create_cluster_security_group = false
create_node_security_group = false
fargate_profiles = {
karpenter = {
selectors = [
{ namespace = "karpenter" }
]
}
kube-system = {
selectors = [
{ namespace = "kube-system" }
]
}
}
}
module "eks_auth" {
source = "terraform-aws-modules/eks/aws//modules/aws-auth"
version = "~> 20.2.1"
manage_aws_auth_configmap = true
aws_auth_roles = [
{
rolearn = module.karpenter.node_iam_role_arn
username = "system:node:{{EC2PrivateDNSName}}"
groups = [
"system:bootstrappers",
"system:nodes",
]
},
{
rolearn = module.eks.fargate_profiles.kube-system.fargate_profile_pod_execution_role_arn
username = "system:node:{{SessionName}}"
groups = [
"system:bootstrappers",
"system:nodes",
"system:node-proxier",
]
},
{
rolearn = module.eks.fargate_profiles.karpenter.fargate_profile_pod_execution_role_arn
username = "system:node:{{EC2PrivateDNSName}}"
groups = [
"system:bootstrappers",
"system:nodes",
"system:node-proxier",
]
},
]
}
For some reason, it only worked when using EC2PrivateDNSName
as the username and making sure to also add the system:node-proxier
group, though I don't fully understand why.
For those on this issue/thread, can you open an AWS support case with your cluster ARN and the time period when you encountered this behavior, please?
We have encountered this issue on all of our ~12 clusters. It is definitely an EKS issue and not a terraform issue since deleting and recreating the fargate profile (either via terraform or the console) fixes it... temporarily. We've opened an AWS ticket for the matter.