
Unable to add additional IAM roles to cluster

Open jchanam opened this issue 2 years ago • 8 comments

Found a bug? Maybe our Slack Community can help.

Describe the Bug

When using the module to create an EKS cluster, I'm trying to add additional roles to the aws-auth ConfigMap. The problem only happens with roles; adding additional users works perfectly.

The behavior changes depending on the kubernetes_config_map_ignore_role_changes config.

  • If I leave it as default (false), the worker roles are added, but not my additional roles.
  • If I change it to true, the worker roles are removed and my additional roles are added.

Expected Behavior

When adding map_additional_iam_roles, those roles should appear in the aws-auth ConfigMap together with the worker roles when kubernetes_config_map_ignore_role_changes is set to false.

Steps to Reproduce

Steps to reproduce the behavior:

  1. Create a cluster and workers with this config (some values are left as variables, so it's easy to reuse):
module "eks_cluster" {
  region  = var.region
  source  = "cloudposse/eks-cluster/aws"
  version = "2.2.0"

  name = var.name

  vpc_id     = var.vpc_id
  subnet_ids = var.private_subnet_ids

  endpoint_public_access  = false
  endpoint_private_access = true

  kubernetes_version = var.kubernetes_version

  kube_exec_auth_enabled = true

  kubernetes_config_map_ignore_role_changes = false
  map_additional_iam_roles = [
    {
      rolearn  = "arn:aws:iam::<account-id>:role/myRole"
      username = "added-role"
      groups   = ["system:masters"]
    }
  ]

  map_additional_iam_users = [
    {
      userarn  = "arn:aws:iam::<account-id>:user/myUser"
      username = "added-user"
      groups   = ["system:masters"]
    }
  ]
}

module "eks_node_group" {
  source  = "cloudposse/eks-node-group/aws"
  version = "2.4.0"

  ami_release_version   = [var.ami_release_version]
  instance_types        = [var.instance_type]
  subnet_ids            = var.private_subnet_ids
  min_size              = var.min_size
  max_size              = var.max_size
  desired_size          = var.desired_size
  cluster_name          = module.eks_cluster.eks_cluster_id
  name                  = var.node_group_name
  create_before_destroy = true
  kubernetes_version    = [var.kubernetes_version]

  cluster_autoscaler_enabled = var.autoscaling_policies_enabled

  module_depends_on = [module.eks_cluster.kubernetes_config_map_id]
}

Screenshots

This example shows what happens when I change the variable kubernetes_config_map_ignore_role_changes from false to true.

[Screenshot: 2022-06-22 at 13:30]

Also, I don’t see a difference in the map roles in the data block between the two options, except for the quoting.

[Screenshot: 2022-06-22 at 13:38]

Environment (please complete the following information):

Anything that will help us triage the bug will help. Here are some ideas:

  • EKS version 1.22
  • Module version 2.2.0

jchanam avatar Jun 22 '22 15:06 jchanam

The kubernetes_config_map_ignore_role_changes setting is a bit of a hack. Terraform does not allow parameterization of the lifecycle block, so to give you the option of ignoring changes, we implement the resource twice, once with ignore_changes and once without.

If you set kubernetes_config_map_ignore_role_changes to true, then the expected behavior is that you will not be able to add or change Roles later, because you have told Terraform to ignore changes to the Roles. However, if you change the setting from false to true, the old ConfigMap is deleted and a new one is created, so that first time you will get your roles added.
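A simplified sketch of that pattern (the resource names, count gating, and local.map_roles_yaml below are illustrative, not the module's exact internals):

variable "kubernetes_config_map_ignore_role_changes" {
  type    = bool
  default = false
}

# Exactly one of these two resources is created, selected by the variable,
# because Terraform cannot parameterize the lifecycle block itself.
resource "kubernetes_config_map" "aws_auth_ignore_changes" {
  count = var.kubernetes_config_map_ignore_role_changes ? 1 : 0

  metadata {
    name      = "aws-auth"
    namespace = "kube-system"
  }

  data = {
    mapRoles = local.map_roles_yaml # assumed local holding the rendered YAML
  }

  lifecycle {
    # Ignore any later changes to the ConfigMap data (including mapRoles)
    ignore_changes = [data]
  }
}

resource "kubernetes_config_map" "aws_auth" {
  count = var.kubernetes_config_map_ignore_role_changes ? 0 : 1

  metadata {
    name      = "aws-auth"
    namespace = "kube-system"
  }

  data = {
    mapRoles = local.map_roles_yaml
  }
}

# Flipping the variable flips the count on both resources, which is why the
# existing ConfigMap resource is destroyed and the other variant is created.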

You wrote: "I’m attaching an example when I change the variable (from false to true)."

The screenshot you provided looks like what would happen if you changed kubernetes_config_map_ignore_role_changes from true to false, and it looks like it includes the Roles you want. So I think you may just be understandably confused by the way Terraform and this module handle ignoring Role changes. I suggest you test it a little more with kubernetes_config_map_ignore_role_changes left at false (which is the only setting that will let you update the Roles after first creating the cluster) and report back if you still have questions.

Nuru avatar Jun 22 '22 18:06 Nuru

Thanks for your answer.

My concern here is that, if I leave it at false, the worker roles will disappear, even though in auth.tf they are concatenated with the additional roles:

https://github.com/cloudposse/terraform-aws-eks-cluster/blob/dc373e09c124570582be670e5a1b618e166f9b1a/auth.tf#L148
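For reference, the linked line builds the mapRoles entry roughly like this (a paraphrase based on the snippet later in this thread, not an exact copy of the module source):

  mapRoles = replace(yamlencode(distinct(concat(local.map_worker_roles, var.map_additional_iam_roles))), "\"", local.yaml_quote)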

This is why I’m even more confused. I would love to manage all of the authentication with Terraform, but without breaking the regular configuration.

jchanam avatar Jun 23 '22 08:06 jchanam

I kind of agree this is a bug because of:

  • if you set ignore role changes, then any extra roles you add will be ignored, so you need to update both the Terraform vars and manually apply the new ConfigMap - bad because you lose the declarative nature of Terraform
  • if you don't ignore role changes, then this module will strictly apply what you specified in the variables and delete the existing aws-auth map - also bad because you lose the roles added by EKS when using managed node groups

So in both cases you lose something. I have gotten around this by first reading the ConfigMap (if it exists) and merging it with the current settings. That way Terraform will manage only the roles set up in the vars and will not touch any other roles added outside of it.

Example:

data "kubernetes_config_map" "existing_aws_auth" {
  metadata {
    name      = "aws-auth"
    namespace = "kube-system"
  }
  depends_on = [null_resource.wait_for_cluster[0]]
}
locals {
  # Fall back to an empty list so the concat() below still works if mapRoles is missing
  map_node_roles = try(yamldecode(data.kubernetes_config_map.existing_aws_auth.data["mapRoles"]), [])
}

resource "kubernetes_config_map" "aws_auth" {
......
  mapRoles    = replace(yamlencode(distinct(concat(local.map_worker_roles, local.map_node_roles, var.map_additional_iam_roles))), "\"", local.yaml_quote)
....

Also, I don't recommend using the new kubernetes_config_map_v1_data resource. It fails because the EKS managed node group changes the ownership of mapRoles in the ConfigMap if deployed separately. After testing, vpcLambda is the sole owner of it, so applying Terraform again resets it.
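For context, a minimal sketch of what a kubernetes_config_map_v1_data-based approach would look like (illustrative only; local.map_roles_yaml is an assumed local):

resource "kubernetes_config_map_v1_data" "aws_auth" {
  metadata {
    name      = "aws-auth"
    namespace = "kube-system"
  }

  data = {
    mapRoles = local.map_roles_yaml # assumed local holding the rendered YAML
  }

  # Take ownership of the listed fields via server-side apply. If another
  # field manager later claims mapRoles (as described above for managed
  # node groups), subsequent applies keep fighting over the field.
  force = true
}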

sebastianmacarescu avatar Jun 23 '22 08:06 sebastianmacarescu

@sebastianmacarescu I think that's an amazing option!

The only issue I can see is when deleting roles. As they'd already be in the ConfigMap, if you remove them from map_additional_iam_roles they won't be deleted, since they'd still be included via map_node_roles.

I'd definitely rather have to remove them manually while being able to manage them with Terraform than have to add them by hand.

jchanam avatar Jun 23 '22 10:06 jchanam

@jchanam

If you look at the Cloud Posse EKS Terraform component (think of it, at this point in time, as a work in progress), you will see that the way we handle this is to output the managed role ARNs and then read them back so they can be included in future auth maps. The component uses the Cloud Posse YAML stack config module to read the state, but you can use the terraform_remote_state data source directly.
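A rough sketch of the terraform_remote_state approach (the backend settings and output name are hypothetical):

# Read the cluster component's state to recover the role ARNs it recorded.
data "terraform_remote_state" "eks" {
  backend = "s3"
  config = {
    bucket = "my-terraform-state"            # hypothetical
    key    = "eks-cluster/terraform.tfstate" # hypothetical
    region = "us-east-1"
  }
}

locals {
  # Merge the previously recorded role mappings (hypothetical output name)
  # with the roles declared in this configuration's variables.
  all_map_roles = distinct(concat(
    try(data.terraform_remote_state.eks.outputs.eks_managed_role_mappings, []),
    var.map_additional_iam_roles
  ))
}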

Nuru avatar Jun 23 '22 17:06 Nuru

@Nuru @jchanam I have done a working implementation that fully fixes this issue and I've tested it in production.

@Nuru can you take a look at that PR and let me know if the implementation looks good? If so, I will also update the README file with proper documentation for the feature.

sebastianmacarescu avatar Jul 20 '22 12:07 sebastianmacarescu

Can we get some review on the work done? It would be really great to have this merged.

sebastianmacarescu avatar Jan 11 '23 09:01 sebastianmacarescu

Quoting @sebastianmacarescu's earlier comment:

"Also, I don't recommend using the new kubernetes_config_map_v1_data resource. It fails because the EKS managed node group changes the ownership of mapRoles in the ConfigMap if deployed separately. After testing, vpcLambda is the sole owner of it, so applying Terraform again resets it."

What is the issue with moving to kubernetes_config_map_v1_data?

honarkhah avatar Jul 26 '23 15:07 honarkhah