terraform-aws-eks
Creating an access entry fails if it already exists
Description
I am trying to create a new access entry. I am migrating from 19.20 -> 20.5.0, getting rid of the aws-auth ConfigMap entries and migrating to access entries. Creation of an access entry fails if it already exists; I have to manually delete the existing entry so that Terraform can create it again (see Actual behavior for the full error message). Also, for user-defined roles such as the 'cluster_management_role' shown in the Terraform code below, the module sometimes fails to attach the policy. This results in a failed deployment for us, since we are using this role for EKSTokenAuth.
- [x] I have searched the open/closed issues and my issue is not listed.
Versions
- Module version [Required]: 20.5.0
- Terraform version: 1.5.7
- Provider version(s): 5.38.0
Reproduction Code [Required]
module "eks" {
source = "terraform-aws-modules/eks/aws"
version = "20.5.0"
cluster_name = var.eks_cluster_name
cluster_version = var.eks_version
cluster_endpoint_public_access = true
cluster_endpoint_private_access = true
cluster_endpoint_public_access_cidrs = var.public_access_cidrs
enable_irsa = true
iam_role_arn = aws_iam_role.eks_cluster_role.arn
authentication_mode = "API_AND_CONFIG_MAP"
vpc_id = local.vpc_id
control_plane_subnet_ids = local.eks_cluster_private_subnets
subnet_ids = local.eks_worker_private_subnets
cluster_security_group_tags = {
"kubernetes.io/cluster/${var.eks_cluster_name}" = null
}
cluster_addons = {
vpc-cni = {
resolve_conflicts_on_update = "OVERWRITE"
resolve_conflicts_on_create = "OVERWRITE"
before_compute = true
service_account_role_arn = module.vpc_cni_irsa.iam_role_arn
addon_version = local.eks_managed_add_on_versions.vpc_cni
configuration_values = jsonencode({
env = {
# AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG = "true"
# ENI_CONFIG_LABEL_DEF = "topology.kubernetes.io/zone"
# Reference docs https://docs.aws.amazon.com/eks/latest/userguide/cni-increase-ip-addresses.html
ENABLE_PREFIX_DELEGATION = "true"
WARM_PREFIX_TARGET = "1"
}
})
}
coredns = {
resolve_conflicts_on_update = "OVERWRITE"
resolve_conflicts_on_create = "OVERWRITE"
preserve = true #this is the default value
addon_version = local.eks_managed_add_on_versions.coredns
timeouts = {
create = "25m"
delete = "10m"
}
}
kube-proxy = {
addon_version = local.eks_managed_add_on_versions.kube_proxy
resolve_conflicts_on_update = "OVERWRITE"
resolve_conflicts_on_create = "OVERWRITE"
}
aws-ebs-csi-driver = {
addon_version = local.eks_managed_add_on_versions.aws_ebs_csi_driver
resolve_conflicts_on_update = "OVERWRITE"
resolve_conflicts_on_create = "OVERWRITE"
service_account_role_arn = aws_iam_role.ebs_csi_role.arn
}
}
enable_cluster_creator_admin_permissions = true
access_entries = {
cluster_manager = {
kubernetes_groups = [] #did not allow to add to system:masters, associating admin access policy
principal_arn = aws_iam_role.cluster_management_role.arn
policy_associations = {
cluster_manager = {
policy_arn = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"
access_scope = {
namespaces = []
type = "cluster"
}
}
}
}
mwaa = {
kubernetes_groups = []
principal_arn = aws_iam_role.mwaa_execution_role.arn
username = "mwaa-service"
}
}
node_security_group_additional_rules = {
nodes_istiod_port = {
description = "Cluster API to Node group for istiod webhook"
protocol = "tcp"
from_port = 15017
to_port = 15017
type = "ingress"
source_cluster_security_group = true
}
node_to_node_communication = {
description = "Allow full access for cross-node communication"
protocol = "tcp"
from_port = 0
to_port = 65535
type = "ingress"
self = true
}
}
node_security_group_tags = {
# NOTE - if creating multiple security groups with this module, only tag the
# security group that Karpenter should utilize with the following tag
# (i.e. - at most, only one security group should have this tag in your account)
"karpenter.sh/discovery" = var.eks_cluster_name
}
eks_managed_node_group_defaults = {
# We are using the IRSA created below for permissions
# However, we have to provision a new cluster with the policy attached FIRST
# before we can disable. Without this initial policy,
# the VPC CNI fails to assign IPs and nodes cannot join the new cluster
iam_role_attach_cni_policy = true
}
eks_managed_node_groups = {
default = {
name = "${var.eks_cluster_name}-default"
subnet_ids = local.eks_worker_private_subnets
min_size = 2
max_size = 3
desired_size = 2
force_update_version = true
instance_types = ["m5a.xlarge"]
# Not required nor used - avoid tagging two security groups with same tag as well
create_security_group = false
update_config = {
max_unavailable_percentage = 50 # or set `max_unavailable`
}
description = "${var.eks_cluster_name} - EKS managed node group launch template"
ebs_optimized = true
disable_api_termination = false
enable_monitoring = true
block_device_mappings = {
xvda = {
device_name = "/dev/xvda"
ebs = {
volume_size = 75
volume_type = "gp3"
iops = 3000
throughput = 150
encrypted = true
delete_on_termination = true
}
}
}
metadata_options = {
http_endpoint = "enabled"
http_tokens = "required"
http_put_response_hop_limit = 2
instance_metadata_tags = "disabled"
}
create_iam_role = false
iam_role_arn = aws_iam_role.eks_node_group_role.arn
# iam_role_name = "${var.eks_cluster_name}-default-managed-node-group"
# iam_role_use_name_prefix = false
# iam_role_description = "EKS managed node group role"
# iam_role_additional_policies = {
# AmazonEC2ContainerRegistryReadOnly = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
# additional = aws_iam_policy.node_additional.arn
# }
tags = {
EksClusterName = var.eks_cluster_name
}
}
}
tags = {
# Explicit `nonsensitive()` call needed here as these tags are used in a foreach loop during deployment and foreach don't allow sensitive value
nonsensitive(data.aws_ssm_parameter.appregistry_application_tag_key.value) = nonsensitive(data.aws_ssm_parameter.appregistry_application_tag_value.value)
VPC_Name = var.vpc_name
Terraform = "true"
}
}
Steps to reproduce the behavior: run terraform init followed by terraform apply.
Expected behavior
The access entry should be created properly even if it already exists. The policy should be attached correctly.
Actual behavior
The behaviour is very intermittent and unpredictable; sometimes the entry is created, and other times we see error messages such as:
╷
│ Error: creating EKS Access Entry (second:arn:aws:iam::473699735501:role/aws-reserved/sso.amazonaws.com/AWSReservedSSO_AdministratorAccess_2dfe39b46fb1ea3a): operation error EKS: CreateAccessEntry, https response error StatusCode: 409, RequestID: 06e2b43a-e5a6-46f6-a05f-ed8b0887aa75, ResourceInUseException: The specified access entry resource is already in use on this cluster.
│
│   with module.eks.aws_eks_access_entry.this["cluster_creator"],
│   on .terraform/modules/eks/main.tf line 185, in resource "aws_eks_access_entry" "this":
│  185: resource "aws_eks_access_entry" "this" {
│
╵
╷
│ Error: creating EKS Access Entry (second:arn:aws:iam::473699735501:role/second-us-east-1-eks-node-group-role): operation error EKS: CreateAccessEntry, https response error StatusCode: 409, RequestID: 7f43c24f-361e-46cc-84e9-fe642dc622e0, ResourceInUseException: The specified access entry resource is already in use on this cluster.
│
│   with module.karpenter.aws_eks_access_entry.node[0],
│   on .terraform/modules/karpenter/modules/karpenter/main.tf line 589, in resource "aws_eks_access_entry" "node":
│  589: resource "aws_eks_access_entry" "node" {
│
╵
make: *** [Makefile:142: deploy-eks-cluster] Error 1
Actual behaviour when the cluster_management_role custom role access entry fails to attach the policy:
Plan: 18 to add, 2 to change, 13 to destroy.
╷
│ Error: query: failed to query with labels: secrets is forbidden: User "arn:aws:sts::473699735501:assumed-role/eks-second-us-east-1-cluster-management-role/EKSGetTokenAuth" cannot list resource "secrets" in API group "" in the namespace "karpenter"
│
│   with helm_release.karpenter,
│   on eks-add-ons.tf line 101, in resource "helm_release" "karpenter":
│  101: resource "helm_release" "karpenter" {
│
╵
Terminal Output Screenshot(s)
Additional context
@cweiblen Are you able to reproduce this issue as well?
If you are migrating a cluster to cluster access entries, you can't use enable_cluster_creator_admin_permissions = true because EKS automatically maps that entity into an access entry. You can either remove this setting, or keep it enabled but import the entry that EKS created into the resource used by the module (to control this via Terraform).
@bryantbiggs For the other issue, where the policy is never attached:
cluster_manager = {
  kubernetes_groups = [] # did not allow to add to system:masters, associating admin access policy
  principal_arn     = aws_iam_role.cluster_management_role.arn
  policy_associations = {
    cluster_manager = {
      policy_arn = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"
      access_scope = {
        namespaces = []
        type       = "cluster"
      }
    }
  }
}
I see this in the plan:
  # module.eks.aws_eks_access_entry.this["cluster_manager"] will be created
  + resource "aws_eks_access_entry" "this" {
      + access_entry_arn  = (known after apply)
      + cluster_name      = "osdu5"
      + created_at        = (known after apply)
      + id                = (known after apply)
      + kubernetes_groups = (known after apply)
      + modified_at       = (known after apply)
      + principal_arn     = "arn:aws:iam::808560345837:role/eks-osdu5-us-east-1-cluster-management-role"
      + tags              = {
          + "Terraform" = "true"
          + "VPC_Name"  = "osdu5"
        }
      + tags_all          = {
          + "Terraform" = "true"
          + "VPC_Name"  = "osdu5"
        }
      + type              = "STANDARD"
      + user_name         = (known after apply)
    }

  # module.eks.aws_eks_access_policy_association.this["cluster_manager_cluster_manager"] will be created
  + resource "aws_eks_access_policy_association" "this" {
      + associated_at = (known after apply)
      + cluster_name  = "osdu5"
      + id            = (known after apply)
      + modified_at   = (known after apply)
      + policy_arn    = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"
      + principal_arn = "arn:aws:iam::808560345837:role/eks-osdu5-us-east-1-cluster-management-role"

      + access_scope {
          + type = "cluster"
        }
    }
Is this related to https://github.com/terraform-aws-modules/terraform-aws-eks/issues/2958? If yes, what change do I need to make in my Terraform code?
I don't follow, what is the issue?
In the reproduction code, see my access entry for principal_arn = aws_iam_role.cluster_management_role.arn. After Terraform is applied, the access entry is created, but it does not have the AmazonEKSClusterAdminPolicy attached to it.
See the second entry here:
- What does the API say?
  aws eks list-associated-access-policies --cluster-name <value> --principal-arn <value>
- Is your Terraform plan "clean" (i.e., if you run terraform plan, is it free of any diff/pending changes)?
Migrating an existing cluster from 19.20 -> 20.2, I was not able to get it working using the access_entries input; I would get the errors described. As a workaround I used the aws_eks_access_entry resource from the AWS provider.
I would be curious to see what you are doing differently. If an access entry already exists, it already exists - there isn't anything unique about the implementation that would allow you to get around that
@bryantbiggs There are 2 issues that we see when using access_entries: 1/ If an access entry exists, it complains with the error 'Resource is already in use' and fails. It's a fatal error and not just a warning. 2/ If it does create the entry, it does not attach the policy.
I plan to attempt the same thing @cweiblen mentioned: move it out of the eks module and add a separate access entry 'resource'.
1/ If an access entry exists - it complains with the error 'Resource is already in use' and fails. It's a fatal error and not just a warning
We do not control this - this is the EKS API. It's stating that you can't have more than one entry for the same principal. This would be similar to trying to create two clusters with the same name in the same region - the API does not allow that; it has nothing to do with this module.
2/ If it does create the entry, it does not attach the policy.
Do you have a reproduction? I'd love to see what's different about a standalone resource versus what's defined here. Here is what we have in our example, which works as intended: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/907f70cffdd03e14d1da97d916451cfb0688a760/examples/eks_managed_node_group/main.tf#L304-L342
@bryantbiggs In my code I have an access entry of type 'cluster' as shown below:
In your example, ex-two is of type cluster but has no 'policy_associations' section, only a policy_arn. Could that be the problem with my code?
access_entries = {
  cluster_manager = {
    kubernetes_groups = [] # did not allow to add to system:masters, associating admin access policy
    principal_arn     = aws_iam_role.cluster_management_role.arn
    policy_associations = {
      cluster_manager = {
        policy_arn = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"
        access_scope = {
          namespaces = []
          type       = "cluster"
        }
      }
    }
  }
  mwaa = {
    kubernetes_groups = []
    principal_arn     = aws_iam_role.mwaa_execution_role.arn
    username          = "mwaa-service"
  }
}
Can you post an example of ex-single of type cluster with a policy association/policy_arn? Perhaps my syntax is wrong?
Regarding:
1/ If an access entry exists - it complains with the error 'Resource is already in use' and fails. It's a fatal error and not just a warning
We do not control this - this is the EKS API. It's stating that you can't have more than one entry for the same principal. This would be similar to trying to create two clusters with the same name in the same region - the API does not allow that; it has nothing to do with this module.
This is a problem when we are doing an upgrade. The first time we run it, it works fine; the second time we run it, perhaps for an upgrade in another part of the code, it attempts to create the entry again. It should simply be ignored if it already exists. But as you are saying, it's the EKS API, and we need to log an issue there.
The first time we run it, it works fine; the second time we run it, perhaps for an upgrade in another part of the code, it attempts to create the entry again.
From the details you have provided, it's very hard to understand what you are doing and why you are encountering issues. I would suggest re-reading the upgrade guide. In short, there are two areas where access entries will already exist and you do not need to re-add them in code. Both of these scenarios apply when you have a cluster that was created with the aws-auth ConfigMap and you are migrating to access entries:
- The identity that was used to create the cluster will automatically be mapped into an access entry when access entries are enabled on a cluster. Under the aws-auth ConfigMap-only method, you would not see this identity in the ConfigMap. If you are using the same role that was used to create the cluster with aws-auth and you are migrating to access entries, you should not set enable_cluster_creator_admin_permissions = true, because Terraform will try to create an access entry that EKS has already created and it will fail. If you wish to control this in code, you will either need to manually delete the entry via the EKS API and then create it with Terraform, or do a Terraform import to control it through code. We cannot do anything about this in the module, since the module did not create it in the first place.
- EKS will automatically create access entries for the roles used by EKS managed node group(s) and EKS Fargate profiles. Users should NOT do anything with these cluster access entries when migrating to cluster access entries; leave these for EKS to manage. Again, if you try to re-add these entries through code/Terraform, it will fail and state that an entry already exists.
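To illustrate the first option above, deleting the EKS-created entry via the API so that Terraform can then create it would look roughly like this (a minimal sketch; the cluster name, account ID, and role name are placeholders):

aws eks delete-access-entry \
  --cluster-name my-cluster \
  --principal-arn arn:aws:iam::111122223333:role/my-cluster-creator-role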
And for the sake of completeness, here is the requested example of a single entry with cluster scope, as the module is currently written - it works without issue:
module "eks" {
source = "terraform-aws-modules/eks/aws"
version = "~> 20.8"
cluster_name = local.name
cluster_version = local.cluster_version
cluster_endpoint_public_access = true
enable_cluster_creator_admin_permissions = true
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnets
control_plane_subnet_ids = module.vpc.intra_subnets
eks_managed_node_group_defaults = {
ami_type = "AL2_x86_64"
instance_types = ["m6i.large", "m5.large", "m5n.large", "m5zn.large"]
}
eks_managed_node_groups = {
# Default node group - as provided by AWS EKS
default_node_group = {
# By default, the module creates a launch template to ensure tags are propagated to instances, etc.,
# so we need to disable it to use the default template provided by the AWS EKS managed node group service
use_custom_launch_template = false
}
}
access_entries = {
# One access entry with a policy associated
ex-single = {
principal_arn = aws_iam_role.this["single"].arn
policy_associations = {
ex = {
policy_arn = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSViewPolicy"
access_scope = {
type = "cluster"
}
}
}
}
}
tags = local.tags
}
Describe the access entry:
aws eks describe-access-entry \
--cluster-name ex-eks-managed-node-group \
--principal-arn "arn:aws:iam::000000000000:role/ex-single" \
--region eu-west-1
{
  "accessEntry": {
    "clusterName": "ex-eks-managed-node-group",
    "principalArn": "arn:aws:iam::000000000000:role/ex-single",
    "kubernetesGroups": [],
    "accessEntryArn": "arn:aws:eks:eu-west-1:000000000000:access-entry/ex-eks-managed-node-group/role/000000000000/ex-single/40c71997-3891-aa1c-0997-e0352c7ca25a",
    "createdAt": "2024-03-12T11:01:05.685000-04:00",
    "modifiedAt": "2024-03-12T11:01:05.685000-04:00",
    "tags": {
      "GithubRepo": "terraform-aws-eks",
      "GithubOrg": "terraform-aws-modules",
      "Example": "ex-eks-managed-node-group"
    },
    "username": "arn:aws:sts::000000000000:assumed-role/ex-single/{{SessionName}}",
    "type": "STANDARD"
  }
}
List the policies associated with this principal:
aws eks list-associated-access-policies \
--cluster-name ex-eks-managed-node-group \
--principal-arn "arn:aws:iam::000000000000:role/ex-single" \
--region eu-west-1
{
  "associatedAccessPolicies": [
    {
      "policyArn": "arn:aws:eks::aws:cluster-access-policy/AmazonEKSViewPolicy",
      "accessScope": {
        "type": "cluster",
        "namespaces": []
      },
      "associatedAt": "2024-03-12T11:01:07.063000-04:00",
      "modifiedAt": "2024-03-12T11:01:07.063000-04:00"
    }
  ],
  "clusterName": "ex-eks-managed-node-group",
  "principalArn": "arn:aws:iam::000000000000:role/ex-single"
}
Somehow the policy does not get attached in my case, and in @cweiblen's case as well. Not sure whether it is the policy that we are using. I have shared my code, plan, and a screenshot above.
I have shared my code, plan and a screenshot above
You have shared some code, yes, but it's all variables and values that are unknown to anyone but yourself. For now I am putting a pin in this thread because I am not seeing any issues with the module as it stands. If there is additional information that will highlight this issue, we can definitely take another look.
We faced the same issues that @deshruch mentioned, e.g.:
1/ If an access entry exists - it complains with the error 'Resource is already in use' and fails. It's a fatal error and not just a warning.
2/ If it does create the entry, it does not attach the policy.
And we have enable_cluster_creator_admin_permissions set to false.
Exact error was:
creating EKS Access Entry (): operation error EKS: CreateAccessEntry, https response error StatusCode: 409, RequestID: xxx, ResourceInUseException: The specified access entry resource is already in use on this cluster
We had to manually intervene and either delete that entry or attach the policy.
We had to do the same thing that @cweiblen did to get around this: we had to create access entries using standalone 'resource' blocks. Note that this was the case for a custom IAM role that we were migrating from the ConfigMap to an EKS access entry.
However, if this is for the node group role, the EKS module moves it automatically. We were also using the 'karpenter' module, in which you need to explicitly set create_access_entry = false (default is true) so that the Karpenter module does not try to recreate it again and throw the 'the specified access entry resource is already in use on this cluster' error (see the sketch below).
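For context, that Karpenter sub-module setting would look roughly like this (a minimal sketch; create_access_entry and cluster_name are inputs of the terraform-aws-eks Karpenter sub-module, while the version constraint here is only an assumption based on the example above):

module "karpenter" {
  source  = "terraform-aws-modules/eks/aws//modules/karpenter"
  version = "~> 20.8"

  cluster_name = module.eks.cluster_name

  # The node group role already has an access entry (created during the migration),
  # so do not create another one here - this avoids the 409 ResourceInUseException
  create_access_entry = false
}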
For the user-defined/custom IAM role, we had to add the access entry and policy association using standalone resources, along the lines of the sketch below.
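A minimal sketch of that workaround, using the AWS provider's aws_eks_access_entry and aws_eks_access_policy_association resources and mirroring the cluster_management_role from the reproduction code above (the resource names are only illustrative):

resource "aws_eks_access_entry" "cluster_manager" {
  cluster_name  = module.eks.cluster_name
  principal_arn = aws_iam_role.cluster_management_role.arn
  type          = "STANDARD"
}

resource "aws_eks_access_policy_association" "cluster_manager" {
  cluster_name  = module.eks.cluster_name
  principal_arn = aws_iam_role.cluster_management_role.arn
  policy_arn    = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"

  access_scope {
    type = "cluster"
  }

  # Ensure the entry exists before associating the policy
  depends_on = [aws_eks_access_entry.cluster_manager]
}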
In case anyone needs to import the existing access entry:
$ terraform import 'module.cluster_name.module.eks.aws_eks_access_entry.this["cluster_creator"]' cluster_name:principal_arn
$ terraform import 'module.cluster_name.module.eks.aws_eks_access_policy_association.this["cluster_creator_admin"]' cluster_name#principal_arn#policy_arn
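If you prefer to keep this in configuration, Terraform 1.5+ import blocks can express the same imports declaratively (a minimal sketch; the module path, cluster name, principal ARN, and policy ARN are placeholders to adjust, and the import IDs follow the cluster_name:principal_arn and cluster_name#principal_arn#policy_arn formats shown above):

import {
  to = module.eks.aws_eks_access_entry.this["cluster_creator"]
  id = "my-cluster:arn:aws:iam::111122223333:role/my-cluster-creator-role"
}

import {
  to = module.eks.aws_eks_access_policy_association.this["cluster_creator_admin"]
  id = "my-cluster#arn:aws:iam::111122223333:role/my-cluster-creator-role#arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"
}

Running terraform plan then previews the imports, and terraform apply records them in state.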
This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this issue will be closed in 10 days
This issue was automatically closed because of stale in 10 days
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.