terraform-aws-eks-cluster
Unable to add additional IAM roles to cluster
Describe the Bug
When using the module to create an EKS cluster, I'm trying to add additional roles to the `aws-auth` ConfigMap. This only happens with the roles; adding additional users works perfectly.

The behavior changes depending on the `kubernetes_config_map_ignore_role_changes` setting:

- If I leave it at the default (`false`), the worker roles are added, but not my additional roles.
- If I change it to `true`, the worker roles are removed and my additional roles are added.
Expected Behavior
When adding `map_additional_iam_roles`, those roles should appear in the `aws-auth` ConfigMap together with the worker roles when `kubernetes_config_map_ignore_role_changes` is set to `false`.
Steps to Reproduce
Steps to reproduce the behavior:
- Create a cluster and workers with this config (some values are left as variables so it's easy to reuse):
module "eks_cluster" {
region = var.region
source = "cloudposse/eks-cluster/aws"
version = "2.2.0"
name = var.name
vpc_id = var.vpc_id
subnet_ids = var.private_subnet_ids
endpoint_public_access = false
endpoint_private_access = true
kubernetes_version = var.kubernetes_version
kube_exec_auth_enabled = true
kubernetes_config_map_ignore_role_changes = false
map_additional_iam_roles = [
{
rolearn = "arn:aws:iam::<account-id>:role/myRole"
username = "added-role"
groups = ["system:masters"]
}
]
map_additional_iam_users = [
{
userarn = "arn:aws:iam::<account-id>:user/myUser"
username = "added-user"
groups = ["system:masters"]
}
]
}
module "eks_node_group" {
source = "cloudposse/eks-node-group/aws"
version = "2.4.0"
ami_release_version = [var.ami_release_version]
instance_types = [var.instance_type]
subnet_ids = var.private_subnet_ids
min_size = var.min_size
max_size = var.max_size
desired_size = var.desired_size
cluster_name = module.eks_cluster.eks_cluster_id
name = var.node_group_name
create_before_destroy = true
kubernetes_version = [var.kubernetes_version]
cluster_autoscaler_enabled = var.autoscaling_policies_enabled
module_depends_on = [module.eks_cluster.kubernetes_config_map_id]
}
Screenshots
This example shows what happens when I change the variable `kubernetes_config_map_ignore_role_changes` from `false` to `true`.

Also, I don't see a difference in the map roles in the `data` block for both options, except for the quoting.
Environment (please complete the following information):
- EKS version 1.22
- Module version 2.2.0
The `kubernetes_config_map_ignore_role_changes` setting is a bit of a hack. Terraform does not allow parameterization of the `lifecycle` block, so to give you the option of ignoring changes, we implement the resource twice, once with `ignore_changes` and once without.
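A minimal sketch of that pattern (illustrative only; the resource names, variable names, and `data` contents here are not necessarily the module's actual code):

```hcl
# Two copies of the same ConfigMap, gated by count so exactly one is created.
# Terraform cannot parameterize the lifecycle block, hence the duplication.

resource "kubernetes_config_map" "aws_auth_ignore_changes" {
  count = var.kubernetes_config_map_ignore_role_changes ? 1 : 0

  metadata {
    name      = "aws-auth"
    namespace = "kube-system"
  }

  data = {
    mapRoles = yamlencode(local.map_worker_roles)
  }

  lifecycle {
    # Ignore changes to the data (which includes mapRoles) after creation.
    ignore_changes = [data]
  }
}

resource "kubernetes_config_map" "aws_auth" {
  count = var.kubernetes_config_map_ignore_role_changes ? 0 : 1

  metadata {
    name      = "aws-auth"
    namespace = "kube-system"
  }

  data = {
    mapRoles = yamlencode(concat(local.map_worker_roles, var.map_additional_iam_roles))
  }
}
```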
If you set `kubernetes_config_map_ignore_role_changes` to `true`, then the expected behavior is that you will not be able to add or change roles later, because you have told Terraform to ignore changes to the roles. However, if you change the setting from `false` to `true`, the old ConfigMap is deleted and a new one is created, so that first time your roles will be added.
> I’m attaching an example when I change the variable (from false to true).
The screenshot you provided looks like what would happen if you change `kubernetes_config_map_ignore_role_changes` from `true` to `false`, and it looks like it includes the roles you want. So I think you may just be understandably confused by Terraform's and this module's confusing behavior around how role changes get ignored. I suggest you test it a little more with `kubernetes_config_map_ignore_role_changes` left at `false` (which is the only setting that will let you update the roles after first creating the cluster) and report back if you still have questions.
Thanks for your answer.
My concern here is that, if I leave it set to `false`, the worker roles will disappear, even though in the `auth.tf` file it's concatenating them with:
https://github.com/cloudposse/terraform-aws-eks-cluster/blob/dc373e09c124570582be670e5a1b618e166f9b1a/auth.tf#L148

This is why I'm even more confused. I would love to manage all the authentication with Terraform, but without breaking the regular configuration.
I kind of agree this is a bug, because:

- If you set ignore role changes, then any extra roles you add will be ignored, so you need to update both the Terraform vars and manually apply the new ConfigMap. That's bad because you lose the declarative nature of Terraform.
- If you don't ignore the role changes, then this module will strictly apply what you specified in the variables and delete the existing aws-auth map. That's also bad because you lose the roles added by EKS when using managed node groups.

So in both cases you lose something. I got around this by first reading the ConfigMap (if it exists) and merging it with the current settings. That way Terraform will manage only the roles set up in vars and will not touch any other manually added roles.
Example:
data "kubernetes_config_map" "existing_aws_auth" {
metadata {
name = "aws-auth"
namespace = "kube-system"
}
depends_on = [null_resource.wait_for_cluster[0]]
}
locals {
map_node_roles = try(yamldecode(data.kubernetes_config_map.existing_aws_auth.data["mapRoles"]), {})
}
resource "kubernetes_config_map" "aws_auth" {
......
mapRoles = replace(yamlencode(distinct(concat(local.map_worker_roles, local.map_node_roles, var.map_additional_iam_roles))), "\"", local.yaml_quote)
....
Also, I don't recommend using the new `kubernetes_config_map_v1_data` resource. It fails because the EKS managed node group changes the ownership of `mapRoles` in the ConfigMap if deployed separately. After testing, `vpcLambda` is the sole owner of it, so applying Terraform again resets it.
@sebastianmacarescu I think that's an amazing option!
The only thing I can see happening is with deleting roles: since they'd already be in the ConfigMap, if you remove them from `map_additional_iam_roles` they won't be deleted, because they'd still be present in `map_node_roles`.

I'd still rather have to remove them manually while being able to manage them with Terraform than have to add them by hand.
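One possible refinement (a sketch only, not tested against the module): filter the live `mapRoles` down to just the node entries before merging, so anything removed from `map_additional_iam_roles` actually gets dropped on the next apply. The `system:nodes` group check is an assumption about how the EKS-managed entries look.

```hcl
locals {
  # Keep only the worker/node entries from the live ConfigMap; everything
  # else is left to be managed purely through var.map_additional_iam_roles.
  existing_node_roles = [
    for role in local.map_node_roles : role
    if contains(role.groups, "system:nodes")
  ]
}
```

Then merge `local.existing_node_roles` instead of `local.map_node_roles` in the `concat()` above.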
@jchanam If you look at the Cloud Posse EKS Terraform component (think of it, at this point in time, as a work in progress), you'll see that the way we handle this is to output the managed role ARNs and then read them back in, to include them in future auth maps. The component uses the Cloud Posse YAML stack config module to read the state, but you can use the `terraform_remote_state` data source directly.
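A rough sketch of that remote-state approach (the backend settings and output names here are assumptions, not the component's actual configuration):

```hcl
# In the stack that owns the node groups, export the managed role ARNs.
output "managed_worker_role_arns" {
  # Hypothetical wiring; check your node-group module's outputs.
  value = [module.eks_node_group.eks_node_group_role_arn]
}

# In the stack that manages aws-auth, read them back and merge them in.
data "terraform_remote_state" "eks" {
  backend = "s3"
  config = {
    bucket = "my-tf-state-bucket" # assumption
    key    = "eks/terraform.tfstate"
    region = "us-east-1"
  }
}

locals {
  managed_worker_roles = [
    for arn in data.terraform_remote_state.eks.outputs.managed_worker_role_arns : {
      rolearn  = arn
      username = "system:node:{{EC2PrivateDNSName}}"
      groups   = ["system:bootstrappers", "system:nodes"]
    }
  ]
}
```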
@Nuru @jchanam I have a working implementation that fully fixes this issue, and I've tested it in production.

@Nuru can you take a look at that PR and let me know if the implementation looks good? If so, I will also update the README file with proper documentation for the feature.

Can we get some review on the work done? It would be really great to have this merged.
> Also I don't recommend using the new `kubernetes_config_map_v1_data`. This fails because the EKS managed node group changes the ownership of `mapRoles` in the ConfigMap if deployed separately.
What is the issue with moving to `kubernetes_config_map_v1_data`?