
Creating an access entry fails if it already exists

Open deshruch opened this issue 11 months ago • 21 comments

Description

I am trying to create a new access entry. I am migrating from 19.20 -> 20.5.0, so I am getting rid of the aws-auth ConfigMap entries and migrating to access entries. Creation of an access entry fails if it already exists; I have to manually delete the existing entry so it can be created again. See 'Actual behavior' below for the full error messages. Also, for user-defined roles such as the 'cluster_management_role' shown in the Terraform code below, the module sometimes fails to attach the policy. This results in failed deployments for us, since we use this role for EKS token authentication (EKSGetTokenAuth).

  • [ yes] βœ‹ I have searched the open/closed issues and my issue is not listed.


Versions

  • Module version [Required]: 20.5.0

  • Terraform version: 1.5.7

  • Provider version(s): 5.38.0

Reproduction Code [Required]

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "20.5.0"

  cluster_name                         = var.eks_cluster_name
  cluster_version                      = var.eks_version
  cluster_endpoint_public_access       = true
  cluster_endpoint_private_access      = true
  cluster_endpoint_public_access_cidrs = var.public_access_cidrs
  enable_irsa                          = true
  iam_role_arn                         = aws_iam_role.eks_cluster_role.arn
  authentication_mode                  = "API_AND_CONFIG_MAP"
  vpc_id                               = local.vpc_id
  control_plane_subnet_ids             = local.eks_cluster_private_subnets
  subnet_ids                           = local.eks_worker_private_subnets

  cluster_security_group_tags = {
    "kubernetes.io/cluster/${var.eks_cluster_name}" = null
  }

  cluster_addons = {
    vpc-cni = {
      resolve_conflicts_on_update = "OVERWRITE"
      resolve_conflicts_on_create = "OVERWRITE"
      before_compute              = true
      service_account_role_arn    = module.vpc_cni_irsa.iam_role_arn
      addon_version               = local.eks_managed_add_on_versions.vpc_cni
      configuration_values = jsonencode({
        env = {
          # AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG = "true"
          # ENI_CONFIG_LABEL_DEF               = "topology.kubernetes.io/zone"

          # Reference docs https://docs.aws.amazon.com/eks/latest/userguide/cni-increase-ip-addresses.html
          ENABLE_PREFIX_DELEGATION = "true"
          WARM_PREFIX_TARGET       = "1"
        }
      })
    }
    coredns = {
      resolve_conflicts_on_update = "OVERWRITE"
      resolve_conflicts_on_create = "OVERWRITE"
      preserve                    = true #this is the default value
      addon_version               = local.eks_managed_add_on_versions.coredns

      timeouts = {
        create = "25m"
        delete = "10m"
      }
    }
    kube-proxy = {
      addon_version               = local.eks_managed_add_on_versions.kube_proxy
      resolve_conflicts_on_update = "OVERWRITE"
      resolve_conflicts_on_create = "OVERWRITE"
    }
    aws-ebs-csi-driver = {
      addon_version               = local.eks_managed_add_on_versions.aws_ebs_csi_driver
      resolve_conflicts_on_update = "OVERWRITE"
      resolve_conflicts_on_create = "OVERWRITE"
      service_account_role_arn    = aws_iam_role.ebs_csi_role.arn

    }
  }

  enable_cluster_creator_admin_permissions = true
  access_entries = {
    cluster_manager = {
      kubernetes_groups = [] #did not allow to add to system:masters, associating admin access policy
      principal_arn     = aws_iam_role.cluster_management_role.arn
      policy_associations = {
        cluster_manager = {
          policy_arn = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"
          access_scope = {
            namespaces = []
            type = "cluster"
          }

        }

      }
    }

    mwaa = {
      kubernetes_groups = []
      principal_arn     = aws_iam_role.mwaa_execution_role.arn
      username          = "mwaa-service"
    }


  }


  node_security_group_additional_rules = {
    nodes_istiod_port = {
      description                   = "Cluster API to Node group for istiod webhook"
      protocol                      = "tcp"
      from_port                     = 15017
      to_port                       = 15017
      type                          = "ingress"
      source_cluster_security_group = true
    }
    node_to_node_communication = {
      description = "Allow full access for cross-node communication"
      protocol    = "tcp"
      from_port   = 0
      to_port     = 65535
      type        = "ingress"
      self        = true
    }
  }

  node_security_group_tags = {
    # NOTE - if creating multiple security groups with this module, only tag the
    # security group that Karpenter should utilize with the following tag
    # (i.e. - at most, only one security group should have this tag in your account)
    "karpenter.sh/discovery" = var.eks_cluster_name
  }

  eks_managed_node_group_defaults = {
    # We are using the IRSA created below for permissions
    # However, we have to provision a new cluster with the policy attached FIRST
    # before we can disable. Without this initial policy,
    # the VPC CNI fails to assign IPs and nodes cannot join the new cluster
    iam_role_attach_cni_policy = true
  }

  eks_managed_node_groups = {

    default = {
      name = "${var.eks_cluster_name}-default"

      subnet_ids = local.eks_worker_private_subnets

      min_size     = 2
      max_size     = 3
      desired_size = 2

      force_update_version = true
      instance_types       = ["m5a.xlarge"]

      # Not required nor used - avoid tagging two security groups with same tag as well
      create_security_group = false

      update_config = {
        max_unavailable_percentage = 50 # or set `max_unavailable`
      }

      description = "${var.eks_cluster_name} - EKS managed node group launch template"

      ebs_optimized           = true
      disable_api_termination = false
      enable_monitoring       = true

      block_device_mappings = {
        xvda = {
          device_name = "/dev/xvda"
          ebs = {
            volume_size           = 75
            volume_type           = "gp3"
            iops                  = 3000
            throughput            = 150
            encrypted             = true
            delete_on_termination = true
          }
        }
      }

      metadata_options = {
        http_endpoint               = "enabled"
        http_tokens                 = "required"
        http_put_response_hop_limit = 2
        instance_metadata_tags      = "disabled"
      }

      create_iam_role = false
      iam_role_arn    = aws_iam_role.eks_node_group_role.arn
      # iam_role_name            = "${var.eks_cluster_name}-default-managed-node-group"
      # iam_role_use_name_prefix = false
      # iam_role_description     = "EKS managed node group role"
      # iam_role_additional_policies = {
      #   AmazonEC2ContainerRegistryReadOnly = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
      #   additional                         = aws_iam_policy.node_additional.arn
      # }

      tags = {
        EksClusterName = var.eks_cluster_name
      }
    }
  }

  tags = {
    # Explicit `nonsensitive()` call needed here because these tags are used in a for_each
    # during deployment, and for_each does not allow sensitive values
    (nonsensitive(data.aws_ssm_parameter.appregistry_application_tag_key.value)) = nonsensitive(data.aws_ssm_parameter.appregistry_application_tag_value.value)
    VPC_Name                                                                     = var.vpc_name
    Terraform                                                                    = "true"
  }
}

Steps to reproduce the behavior: run terraform init followed by terraform apply.

Expected behavior

The access entry should be handled gracefully even if it already exists, and the policy should be attached correctly.

Actual behavior

The behaviour is very intermittent and unpredictable. Sometimes it creates the access entry, and sometimes we see error messages such as:

╷
│ Error: creating EKS Access Entry (second:arn:aws:iam::473699735501:role/aws-reserved/sso.amazonaws.com/AWSReservedSSO_AdministratorAccess_2dfe39b46fb1ea3a): operation error EKS: CreateAccessEntry, https response error StatusCode: 409, RequestID: 06e2b43a-e5a6-46f6-a05f-ed8b0887aa75, ResourceInUseException: The specified access entry resource is already in use on this cluster.
│
│   with module.eks.aws_eks_access_entry.this["cluster_creator"],
│   on .terraform/modules/eks/main.tf line 185, in resource "aws_eks_access_entry" "this":
│  185: resource "aws_eks_access_entry" "this" {
│
╵
╷
│ Error: creating EKS Access Entry (second:arn:aws:iam::473699735501:role/second-us-east-1-eks-node-group-role): operation error EKS: CreateAccessEntry, https response error StatusCode: 409, RequestID: 7f43c24f-361e-46cc-84e9-fe642dc622e0, ResourceInUseException: The specified access entry resource is already in use on this cluster.
│
│   with module.karpenter.aws_eks_access_entry.node[0],
│   on .terraform/modules/karpenter/modules/karpenter/main.tf line 589, in resource "aws_eks_access_entry" "node":
│  589: resource "aws_eks_access_entry" "node" {
│
╵
make: *** [Makefile:142: deploy-eks-cluster] Error 1

Actual behaviour when the cluster_management_role custom role's access entry fails to attach the policy (Plan: 18 to add, 2 to change, 13 to destroy):

╷
│ Error: query: failed to query with labels: secrets is forbidden: User "arn:aws:sts::473699735501:assumed-role/eks-second-us-east-1-cluster-management-role/EKSGetTokenAuth" cannot list resource "secrets" in API group "" in the namespace "karpenter"
│
│   with helm_release.karpenter,
│   on eks-add-ons.tf line 101, in resource "helm_release" "karpenter":
│  101: resource "helm_release" "karpenter" {
│
╵

Terminal Output Screenshot(s)

Additional context

deshruch avatar Mar 11 '24 17:03 deshruch

@cweiblen Are you able to reproduce this issue as well?

deshruch avatar Mar 11 '24 20:03 deshruch

if you are migrating a cluster into cluster access entry, you can't use enable_cluster_creator_admin_permissions = true because EKS automatically maps that entity into an access entry. you can either remove this, or you can enable it but you'll need to import the entry that EKS created into the resource used by the module (to control this via Terraform)

bryantbiggs avatar Mar 11 '24 21:03 bryantbiggs
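As a rough sketch of that import route, with a placeholder cluster name and cluster-creator role ARN (the resource addresses follow the module keys shown in the error output and later in this thread):

terraform import 'module.eks.aws_eks_access_entry.this["cluster_creator"]' \
  'my-cluster:arn:aws:iam::111122223333:role/MyClusterCreatorRole'

# If the module also manages the admin policy association for the cluster creator,
# it can be imported as well (ID format: cluster_name#principal_arn#policy_arn)
terraform import 'module.eks.aws_eks_access_policy_association.this["cluster_creator_admin"]' \
  'my-cluster#arn:aws:iam::111122223333:role/MyClusterCreatorRole#arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy'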

@bryantbiggs For the other issue, where the policy is never attached:

cluster_manager = {
      kubernetes_groups = [] #did not allow to add to system:masters, associating admin access policy
      principal_arn     = aws_iam_role.cluster_management_role.arn
      policy_associations = {
        cluster_manager = {
          policy_arn = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"
          access_scope = {
            namespaces = []
            type = "cluster"
          }

        }

      }
    }

I see this in the plan:


# module.eks.aws_eks_access_entry.this["cluster_manager"] will be created
  + resource "aws_eks_access_entry" "this" {
      + access_entry_arn  = (known after apply)
      + cluster_name      = "osdu5"
      + created_at        = (known after apply)
      + id                = (known after apply)
      + kubernetes_groups = (known after apply)
      + modified_at       = (known after apply)
      + principal_arn     = "arn:aws:iam::808560345837:role/eks-osdu5-us-east-1-cluster-management-role"
      + tags              = {
          + "Terraform"            = "true"
          + "VPC_Name"             = "osdu5"
        }
      + tags_all          = {
          + "Terraform"            = "true"
          + "VPC_Name"             = "osdu5"
        }
      + type              = "STANDARD"
      + user_name         = (known after apply)
    }


# module.eks.aws_eks_access_policy_association.this["cluster_manager_cluster_manager"] will be created
  + resource "aws_eks_access_policy_association" "this" {
      + associated_at = (known after apply)
      + cluster_name  = "osdu5"
      + id            = (known after apply)
      + modified_at   = (known after apply)
      + policy_arn    = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"
      + principal_arn = "arn:aws:iam::808560345837:role/eks-osdu5-us-east-1-cluster-management-role"

      + access_scope {
          + type = "cluster"
        }
    }

Is this related to https://github.com/terraform-aws-modules/terraform-aws-eks/issues/2958? If yes, what change do I need to make in my Terraform code?

deshruch avatar Mar 11 '24 21:03 deshruch

I don't follow, what is the issue?

bryantbiggs avatar Mar 11 '24 21:03 bryantbiggs

In the reproduction code, see my access entry for principal_arn = aws_iam_role.cluster_management_role.arn. After Terraform is applied, the access entry is created, but it does not have the AmazonEKSClusterAdminPolicy attached to it.

deshruch avatar Mar 11 '24 21:03 deshruch

See the second entry here:

[screenshot: access entries listed in the EKS console]

deshruch avatar Mar 11 '24 21:03 deshruch

  1. What does the API say? aws eks list-associated-access-policies --cluster-name <value> --principal-arn <value>
  2. Is your Terraform plan "clean" (i.e., if you run terraform plan, is it free of any diff/pending changes)?

bryantbiggs avatar Mar 11 '24 21:03 bryantbiggs

Migrating an existing cluster from 19.20 -> 20.2, I was not able to get it working using the access_entries input; I would get the errors described. As a workaround, I used the aws_eks_access_entry resource from the AWS provider.

cweiblen avatar Mar 12 '24 10:03 cweiblen
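For reference, a minimal sketch of that standalone-resource workaround, assuming the hypothetical aws_iam_role.cluster_management_role from the reproduction code and the module's cluster_name output - adjust names and policy to your setup:

resource "aws_eks_access_entry" "cluster_manager" {
  cluster_name  = module.eks.cluster_name
  principal_arn = aws_iam_role.cluster_management_role.arn
  type          = "STANDARD"
}

resource "aws_eks_access_policy_association" "cluster_manager_admin" {
  cluster_name  = module.eks.cluster_name
  principal_arn = aws_iam_role.cluster_management_role.arn
  policy_arn    = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"

  access_scope {
    type = "cluster"
  }

  # Ensure the entry exists before the policy is associated with it
  depends_on = [aws_eks_access_entry.cluster_manager]
}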

I would be curious to see what you are doing differently. If an access entry already exists, it already exists - there isn't anything unique about the implementation that would allow you to get around that

bryantbiggs avatar Mar 12 '24 10:03 bryantbiggs

@bryantbiggs There are 2 issues that we see when using access_entries: 1/ If an access entry exists - it complains with the error 'Resource is already in use' and fails. It's a fatal error and not just a warning. 2/ If it does create the entry, it does not attach the policy.

I plan to attempt the same thing @cweiblen mentioned: move it out of the eks module and add a separate access entry resource.

deshruch avatar Mar 12 '24 14:03 deshruch

1/ If an access entry exists - it complains with the error 'Resource is already in use' and fails. It's a fatal error and not just a warning

We do not control this - this is the EKS API. It's stating that you can't have more than one entry for the same principal. This would be similar to trying to create two clusters with the same name in the same region - the API does not allow that; it has nothing to do with this module.

2/ If it does create the entry, it does not attach the policy.

Do you have a reproduction? I'd love to see what's different about a standalone resource versus what's defined here. Here is what we have in our example, which works as intended: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/907f70cffdd03e14d1da97d916451cfb0688a760/examples/eks_managed_node_group/main.tf#L304-L342

bryantbiggs avatar Mar 12 '24 14:03 bryantbiggs

@bryantbiggs In my code I have an access entry of type 'cluster' as shown below:

In your example, ex-two is of type cluster but has no 'policy_associations' section, only a policy_arn. Could that be the problem with my code?

access_entries = {
    cluster_manager = {
      kubernetes_groups = [] #did not allow to add to system:masters, associating admin access policy
      principal_arn     = aws_iam_role.cluster_management_role.arn
      policy_associations = {
        cluster_manager = {
          policy_arn = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"
          access_scope = {
            namespaces = []
            type = "cluster"
          }

        }

      }
    }

    mwaa = {
      kubernetes_groups = []
      principal_arn     = aws_iam_role.mwaa_execution_role.arn
      username          = "mwaa-service"
    }

Can you post an example of ex-single of type cluster with a policy association/policy_arn? Perhaps my syntax is wrong?

deshruch avatar Mar 12 '24 14:03 deshruch

Regarding:

 1/ If an access entry exists - it complains with the error 'Resource is already in use' and fails. It's a fatal error and not just a warning

We do not control this - this is the EKS API. It's stating that you can't have more than one entry for the same principal. This would be similar to trying to create two clusters with the same name in the same region - the API does not allow that; it has nothing to do with this module.

This is a problem when we are doing an upgrade. The first time we run it, it works fine; the second time we run it (perhaps for an upgrade in another part of the code), it attempts to create the entry again. It should simply ignore the entry if it already exists. But as you are saying, it's the EKS API, and we need to log an issue there.

deshruch avatar Mar 12 '24 14:03 deshruch

The first time we run it, it works fine; the second time we run it (perhaps for an upgrade in another part of the code), it attempts to create the entry again

From the details you have provided, it's very hard to understand what you are doing and why you are encountering issues. I would suggest re-reading the upgrade guide. In short, there are two areas where access entries will already exist and YOU do not need to re-add them in code. Both of these scenarios occur when you have a cluster that was created with the aws-auth ConfigMap and you are migrating to access entries:

  1. The identity that was used to create the cluster will automatically be mapped into an access entry when access entries are enabled on a cluster. Under the aws-auth ConfigMap-only method, you would not see this identity in the ConfigMap. If you are using the same role that was used to create the cluster via aws-auth and you are migrating to access entries, you should not set enable_cluster_creator_admin_permissions = true, because Terraform will try to create an access entry that EKS has already created and it will fail. If you wish to control this in code, you will either need to manually delete the entry via the EKS API and then create it with Terraform, or do a Terraform import to control it through code. We cannot do anything about this in the module since the module did not create it in the first place
  2. EKS will automatically create access entries for roles used by EKS managed node group(s) and EKS Fargate profiles - users should NOT do anything with these cluster access entries when migrating to cluster access entries; leave these for EKS to manage. Again, if you try to re-add these entries through code/Terraform, it will fail and state that an entry already exists. (A command to list the entries EKS has already created is sketched below.)

bryantbiggs avatar Mar 12 '24 15:03 bryantbiggs
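To check which access entries EKS has already created on a cluster (and therefore should not be re-declared in code), the list command below can help; the cluster name and region are placeholders:

aws eks list-access-entries \
  --cluster-name my-cluster \
  --region us-east-1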

And for the sake of completeness, here is the requested example of a single entry with cluster scope, as the module is currently written - it works without issue:

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.8"

  cluster_name                   = local.name
  cluster_version                = local.cluster_version
  cluster_endpoint_public_access = true

  enable_cluster_creator_admin_permissions = true

  vpc_id                   = module.vpc.vpc_id
  subnet_ids               = module.vpc.private_subnets
  control_plane_subnet_ids = module.vpc.intra_subnets

  eks_managed_node_group_defaults = {
    ami_type       = "AL2_x86_64"
    instance_types = ["m6i.large", "m5.large", "m5n.large", "m5zn.large"]
  }

  eks_managed_node_groups = {
    # Default node group - as provided by AWS EKS
    default_node_group = {
      # By default, the module creates a launch template to ensure tags are propagated to instances, etc.,
      # so we need to disable it to use the default template provided by the AWS EKS managed node group service
      use_custom_launch_template = false
    }
  }

  access_entries = {
    # One access entry with a policy associated
    ex-single = {
      principal_arn     = aws_iam_role.this["single"].arn
      policy_associations = {
        ex = {
          policy_arn = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSViewPolicy"
          access_scope = {
            type = "cluster"
          }
        }
      }
    }
  }

  tags = local.tags
}

Describe the access entry:

aws eks describe-access-entry \
  --cluster-name ex-eks-managed-node-group \
  --principal-arn "arn:aws:iam::000000000000:role/ex-single" \
  --region eu-west-1
{
    "accessEntry": {
        "clusterName": "ex-eks-managed-node-group",
        "principalArn": "arn:aws:iam::000000000000:role/ex-single",
        "kubernetesGroups": [],
        "accessEntryArn": "arn:aws:eks:eu-west-1:000000000000:access-entry/ex-eks-managed-node-group/role/000000000000/ex-single/40c71997-3891-aa1c-0997-e0352c7ca25a",
        "createdAt": "2024-03-12T11:01:05.685000-04:00",
        "modifiedAt": "2024-03-12T11:01:05.685000-04:00",
        "tags": {
            "GithubRepo": "terraform-aws-eks",
            "GithubOrg": "terraform-aws-modules",
            "Example": "ex-eks-managed-node-group"
        },
        "username": "arn:aws:sts::000000000000:assumed-role/ex-single/{{SessionName}}",
        "type": "STANDARD"
    }
}

List the policies associated with this principal:

aws eks list-associated-access-policies \
  --cluster-name ex-eks-managed-node-group \
  --principal-arn "arn:aws:iam::000000000000:role/ex-single" \
  --region eu-west-1
{
    "associatedAccessPolicies": [
        {
            "policyArn": "arn:aws:eks::aws:cluster-access-policy/AmazonEKSViewPolicy",
            "accessScope": {
                "type": "cluster",
                "namespaces": []
            },
            "associatedAt": "2024-03-12T11:01:07.063000-04:00",
            "modifiedAt": "2024-03-12T11:01:07.063000-04:00"
        }
    ],
    "clusterName": "ex-eks-managed-node-group",
    "principalArn": "arn:aws:iam::000000000000:role/ex-single"
}

bryantbiggs avatar Mar 12 '24 15:03 bryantbiggs

Somehow the policy does not get attached in my case, and in @cweiblen's case as well. I am not sure whether it is the policy that we are using. I have shared my code, plan, and a screenshot above.

deshruch avatar Mar 12 '24 15:03 deshruch

I have shared my code, plan and a screenshot above

You have shared some code, yes, but it's all variables and values that are unknown to anyone but yourself. For now I am putting a pin in this thread because I am not seeing any issues with the module as it stands. If there is additional information that will highlight this issue, we can definitely take another look.

bryantbiggs avatar Mar 12 '24 15:03 bryantbiggs

We faced the same issues that @deshruch mentioned, e.g.:

1/ If an access entry exists - it complains with the error 'Resource is already in use' and fails. It's a fatal error and not just a warning
2/ If it does create the entry, it does not attach the policy.

and we have enable_cluster_creator_admin_permissions set to false.

The exact error was: creating EKS Access Entry (): operation error EKS: CreateAccessEntry, https response error StatusCode: 409, RequestID: xxx, ResourceInUseException: The specified access entry resource is already in use on this cluster

We had to manually intervene and delete that entry or attach the policy.

bilalahmad99 avatar Mar 15 '24 18:03 bilalahmad99
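For reference, the manual intervention described above can be done via the EKS CLI; a minimal sketch with placeholder cluster name and role ARN, using the admin policy from the reproduction code:

# Option 1: delete the conflicting access entry so Terraform can recreate it
aws eks delete-access-entry \
  --cluster-name my-cluster \
  --principal-arn arn:aws:iam::111122223333:role/my-cluster-management-role

# Option 2: associate the missing policy with the entry that already exists
aws eks associate-access-policy \
  --cluster-name my-cluster \
  --principal-arn arn:aws:iam::111122223333:role/my-cluster-management-role \
  --policy-arn arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy \
  --access-scope type=cluster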

We had to do the same thing that @cweiblen did to get around this: we had to create the access entries using standalone resources. Note that this was the case for a custom IAM role that we were migrating from the ConfigMap to an EKS access entry.

However, if this is for the node group role, the EKS module automatically moves it. We were also using the 'karpenter' module, in which you need to explicitly set create_access_entry = false (the default is true) so that the karpenter module does not try to recreate the entry and throw the 'The specified access entry resource is already in use on this cluster' error (see the sketch after this comment).

For a user-defined/custom IAM role, we had to add the access entry and policy association using standalone resources.

deshruch avatar Mar 15 '24 18:03 deshruch
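A minimal sketch of the Karpenter sub-module setting mentioned above, with everything else omitted - the key line is create_access_entry = false so the sub-module does not try to recreate an entry that already exists on the cluster:

module "karpenter" {
  source  = "terraform-aws-modules/eks/aws//modules/karpenter"
  version = "~> 20.8"

  cluster_name = module.eks.cluster_name

  # The node role's access entry already exists on the cluster,
  # so skip creating another one here
  create_access_entry = false

  # ... other Karpenter settings (IAM role, SQS queue, etc.) unchanged ...
}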

In case anyone needs to import the existing access entry:

$ terraform import 'module.cluster_name.module.eks.aws_eks_access_entry.this["cluster_creator"]' cluster_name:principal_arn
$ terraform import 'module.cluster_name.module.eks.aws_eks_access_policy_association.this["cluster_creator_admin"]' cluster_name#principal_arn#policy_arn

mconigliaro avatar Apr 02 '24 18:04 mconigliaro

This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this issue will be closed in 10 days

github-actions[bot] avatar May 03 '24 00:05 github-actions[bot]

This issue was automatically closed because of stale in 10 days

github-actions[bot] avatar May 13 '24 00:05 github-actions[bot]

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions[bot] avatar Jun 12 '24 02:06 github-actions[bot]