terraform-aws-eks icon indicating copy to clipboard operation
terraform-aws-eks copied to clipboard

Access Entries are deleted before Helm charts so helm charts stays orphan in the terraform state

Open skursadk opened this issue 1 year ago β€’ 3 comments

Description

We deploy a couple of helm charts inside the same terraform state where the eks module is called. It works like a charm with access entries while creating the resources but it cannot delete the helm resources properly in terraform destroy. The reason is that terraform destroy deletes the access entries before the helm charts.

It was working great in version 19.21.0 with aws-auth configmap

  • [x ] βœ‹ I have searched the open/closed issues and my issue is not listed.

⚠️ Note

Versions

  • Module version [Required]: 20.2.1

  • Terraform version:1.6.6

  • Provider version(s): 1.6.6

Reproduction Code [Required]

Steps to reproduce the behavior:

  • Create the eks cluster without giving any access entries.
  • Deploy a dummy helm chart
  • Run terrafrom destroy

Expected behavior

Terraform destroy should run successfully

Actual behavior

It deletes the access entries and cannot delete the helm charts so helm charts stays idle at the terraform sate

skursadk avatar Feb 12 '24 11:02 skursadk

It was working great in version 19.21.0 with aws-auth configmap

Thats because the cluster creator was granted admin permissions automatically that were outside of Terraform's control. I'm open to ideas but I currently don't see a way to achieve the same.

Also, the guidance will always be - destroy the resources on the cluster first before destroying the cluster. There are a number of reasons for this

bryantbiggs avatar Feb 12 '24 12:02 bryantbiggs

Should I add explicit depends_on = [ module.eks] on the resources? Is that only way to let terraform remove the helm charts and then the eks cluster?

skursadk avatar Feb 12 '24 13:02 skursadk

you can try, but if its a module then it will face challenges

you are better off separating your infrastructure from your applications. this would be two different statefiles, and you would need to explicitly handle the removal of the applications running on the cluster first, before destroying the cluster

bryantbiggs avatar Feb 12 '24 13:02 bryantbiggs

I have the same challenge.

I use terraform-aws-eks-blueprints-addons module to install addons like karpenter. All managed within the same state file.

When I perform a terraform destroy, it may fail depending if the access entry is deleted before or after the helm deployments performed by the terraform-aws-eks-blueprints-addons module.

nascit avatar Mar 05 '24 11:03 nascit

@nascit you'll need to either separate the cluster infra from the app deployment (recommended route), or follow the steps we provide to carefully tear down resources in order https://aws-ia.github.io/terraform-aws-eks-blueprints/patterns/karpenter/#destroy - and with Karpenter thats even more relevant because its creating additional AWS resources outside of Terraform's purview

bryantbiggs avatar Mar 05 '24 11:03 bryantbiggs

Since the permissions are now controlled in code, you will need to handle the destroy situation a little differently as stated above. You can use the following when destroying the cluster to ensure permissions aren't removed too early:

# Necessary to avoid removing Terraform's permissions too soon before its finished
# cleaning up the resources it deployed inside the cluster
terraform state rm 'module.eks.aws_eks_access_entry.this["cluster_creator"]' || true
terraform state rm 'module.eks.aws_eks_access_policy_association.this["cluster_creator_admin"]' || true

bryantbiggs avatar Mar 12 '24 11:03 bryantbiggs

hi @bryantbiggs ,

This issue can be avoided with authentication_mode set to CONFIG_MAP, but because bootstrap_cluster_creator_admin_permissions is hardcoded to false; the provider raise error during apply: bootstrapClusterCreatorAdminPermission must be true when authentication_mode is set to CONFIG_MAP.

Can this module allow us to have the flexibility to use CONFIG_MAP? Time is needed to adopt API_AND_CONFIG_MAP...

gohmc avatar Mar 13 '24 11:03 gohmc

Can this module allow us to have the flexibility to use CONFIG_MAP? Time is needed to adopt API_AND_CONFIG_MAP...

What in this module is stopping you from doing this?

bryantbiggs avatar Mar 13 '24 11:03 bryantbiggs

Using the built-in example here:

# Comment this line to not using EKS access management control yet. default to false.
# enable_cluster_creator_admin_permissions = true

# Add this line for the cluster to use CONFIG_MAP authentication mode only.
authentication_mode = CONFIG_MAP

Now perform terraform apply and it will fail with bootstrapClusterCreatorAdminPermission must be true when authentication_mode is set to CONFIG_MAP.

gohmc avatar Mar 13 '24 11:03 gohmc

@bryantbiggs One thing that occurs to me that might help with this, would be to use depends_on in the output for the cluster_endpoint, forcing it to wait until the access entries and associations are complete. Since the helm provider requires the cluster endpoint as an input, that should get the order of operations correct on both create and destroy.

output "cluster_endpoint" {
  description = "Endpoint for your Kubernetes API server"
  value       = try(aws_eks_cluster.this[0].endpoint, null)

  depends_on = [
    aws_eks_access_entry.this,
    aws_eks_access_policy_association.this,
  ]
}

Edit: Perhaps do the same for the cluster_certificate_authority_data:

output "cluster_certificate_authority_data" {
  description = "Base64 encoded certificate data required to communicate with the cluster"
  value       = try(aws_eks_cluster.this[0].certificate_authority[0].data, null)

  depends_on = [
    aws_eks_access_entry.this,
    aws_eks_access_policy_association.this,
  ]
}

lorengordon avatar Apr 08 '24 15:04 lorengordon

I don't think those are valid - have you tested those?

bryantbiggs avatar Apr 08 '24 18:04 bryantbiggs

@bryantbiggs Yes absolutely, using depends_on in an output is 100% valid, and used for exactly this kind of timing issue where an output for a single resource isn't fully available until some other resource completes. S3 bucket and bucket policy is another common one. Or IAM role and role attachments.

lorengordon avatar Apr 08 '24 18:04 lorengordon

They have a doc on it now also: https://developer.hashicorp.com/terraform/language/values/outputs#depends_on-explicit-output-dependencies

lorengordon avatar Apr 08 '24 18:04 lorengordon

Fyi, I am testing a PR to see if I can cleanly apply and destroy the "complete" test on the blueprints-addons project without manually using state rm and target... So far, so good! 🀞

lorengordon avatar Apr 08 '24 20:04 lorengordon

just validated this on the karpenter example which I think is more representative, and it was able to apply then destroy without targets or intervention! Great find - please feel free to open a PR to add these, I don't want to steal your thunder on the great find πŸ˜‰

bryantbiggs avatar Apr 08 '24 21:04 bryantbiggs

@bryantbiggs Done, see https://github.com/terraform-aws-modules/terraform-aws-eks/pull/3000! (And I got the 3000th pr, woot!)

lorengordon avatar Apr 08 '24 21:04 lorengordon

dang, thats a lot of PRs πŸ˜…

bryantbiggs avatar Apr 08 '24 21:04 bryantbiggs

This issue has been resolved in version 20.8.5 :tada:

antonbabenko avatar Apr 08 '24 22:04 antonbabenko

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions[bot] avatar May 09 '24 02:05 github-actions[bot]