
Terraform destroy of helm_release resources.

Ragib95 opened this issue 4 years ago · 12 comments

Terraform Version and Provider Version

  • Terraform v0.12.26
  • provider.aws v2.65.0
  • provider.helm v1.3.0
  • provider.kubernetes v1.11.3


Affected Resource(s)

  • helm_release

Terraform Configuration Files


provider "helm" {
  kubernetes {
    load_config_file = false
    host             = "${aws_eks_cluster.aws_eks.endpoint}"

    cluster_ca_certificate = "${base64decode(aws_eks_cluster.aws_eks.certificate_authority.0.data)}"

    token = data.aws_eks_cluster_auth.main.token

  }
}

resource "helm_release" "nginx-ingress" {
  name             = "nginx-ingress"
  chart            = "/nginx-ingress/"
  namespace        = "opsera"
  create_namespace = true
  timeout          = 600

  values = [
    file("value.yaml")
  ]

  depends_on = [
    aws_eks_node_group.node,
    helm_release.cluster-autoscaler,
    aws_acm_certificate.public_cert
  ]
}

Debug Output

helm_release.nginx-ingress: Destroying... [id=nginx-ingress]
helm_release.nginx-ingress: Destruction complete after 8s
aws_eks_node_group.node: Destroying... [id=*****node]


Expected Behavior

helm_release destruction should wait for all of the release's resources (pods, services, and ingress) to be destroyed before reporting "Destruction complete".

Actual Behavior

It reports "Destruction complete" within 7-8 seconds, before the pods and services are fully destroyed. As a result, destruction of the EKS nodes starts too early and leaves the ELB attached to the service.

Reason: before Helm has finished removing the pods and services, Terraform starts deleting the node group and cluster, leaving the pods stuck in the Terminating state.


Steps to Reproduce

  1. Create an EKS cluster with an NGINX ingress.
  2. Destroy the resources using terraform destroy.
  3. The destroy times out because the ELB attached to the NGINX service is never removed.


References

  • GH-1234

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Ragib95 · Sep 29 '20 11:09

@Ragib95 This is expected behaviour due to a limitation in Terraform that causes it to not recognise the implicit dependency between the Helm resource and the EKS cluster resource. Terraform tries to parallelise the destroy operations when no dependency is known between the resources. This can lead to the EKS cluster being destroyed before the Helm release itself.

I'd suggest setting an explicit dependency on the EKS cluster resource in the helm_release resource, like this:

depends_on = [
  aws_eks_cluster.aws_eks,
]
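Applied to the configuration from the report, the whole resource would look something like this (a sketch; it simply adds the cluster to the existing depends_on list):

resource "helm_release" "nginx-ingress" {
  name             = "nginx-ingress"
  chart            = "/nginx-ingress/"
  namespace        = "opsera"
  create_namespace = true
  timeout          = 600

  values = [
    file("value.yaml")
  ]

  depends_on = [
    aws_eks_cluster.aws_eks,
    aws_eks_node_group.node,
    helm_release.cluster-autoscaler,
    aws_acm_certificate.public_cert
  ]
}

Because Terraform destroys resources in reverse dependency order, listing the cluster here forces the release to be destroyed before the cluster is.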

alexsomesan · Oct 07 '20 15:10

We currently don't have a way to know what resources are created. We will have to wait for https://github.com/helm/helm/issues/2378 to be implemented.

aareet · Jan 06 '21 17:01

I am unable to terraform destroy -target=... a helm_release resource:

Error: uninstall: Release not loaded: metrics-server: release: not found

Is this another manifestation of this issue, or should I open a separate one?

devurandom · Jan 07 '21 14:01

We currently don't have a way to know what resources are created. We will have to wait for helm/helm#2378 to be implemented.

Issue closed, but not fixed.

jocutajar · Sep 15 '21 09:09

I got the same error when I tried to destroy resources with Terraform. The Helm release was deleted, but the pods were left in "Terminating" status, and I found that all of the Helm chart's resources had this issue.

My Terraform structure:

  • dev: calls modules
  • prod: calls modules
  • modules: all resources (including Helm charts) are defined in the module directory

Any solutions or ideas?

visla-xugeng · Sep 23 '21 21:09

We currently don't have a way to know what resources are created. We will have to wait for helm/helm#2378 to be implemented.

Issue closed, but not fixed.

It seems the referenced Helm issue has been fixed by https://github.com/helm/helm/pull/9702. Would that make it easier to solve this issue?

FearlessHyena · Oct 27 '21 15:10

@alexsomesan as mentioned earlier in this thread https://github.com/helm/helm/pull/9702 seems to solve this issue from within Helm.

Then I think it can be solved in the Terraform Helm provider by adding a new wait_for_destroy argument that is passed to the Helm uninstall command.
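Hypothetical usage might look like this (wait_for_destroy does not exist in the provider today; it is only the argument proposed above):

resource "helm_release" "example" {
  name  = "example"
  chart = "example-chart"

  # Hypothetical argument: would be passed through to Helm's
  # uninstall operation so destroy blocks until resources are gone.
  wait_for_destroy = true
}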

Don't exactly know how to do it, but if you could point me in the right direction I could give it a try.

avinashpancham · Nov 22 '21 08:11

Any update on the Terraform side for helm/helm#9702?

ClenchPaign · Dec 27 '21 11:12

I believe this was resolved by #786. After upgrading the Helm provider to 2.4, the 'wait' attribute of the helm_release is respected during terraform destroy.
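For example (a minimal sketch; wait is an existing helm_release argument that defaults to true, so spelling it out is optional):

resource "helm_release" "nginx_ingress" {
  name  = "nginx-ingress"
  chart = "ingress-nginx"

  # Per the comment above, honoured on destroy as of provider 2.4
  wait = true
}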

jferris · Feb 26 '22 18:02

I believe this was resolved by #786. After upgrading the Helm provider to 2.4, the 'wait' attribute of the helm_release is respected during terraform destroy.

I don't think so.

I tried 2.4.1, 2.5.0, and 2.5.1.

wait didn't fix the issue for me. (Its default value is true, by the way.)

RicoToothless · May 27 '22 06:05

Hi, #786 is an impressive MR (to say the least)! I'm not brave enough to go dig into it. Do we need a test scenario for the wait on destroy?

jocutajar · Jul 14 '22 10:07

Our current workaround, which ain't great, but... yeah...

resource "helm_release" "nginx_ingress_controller" {
  name       = local.service_name_ingress-nginx
  namespace  = var.namespace
  repository = "https://kubernetes.github.io/ingress-nginx"
  chart      = "ingress-nginx"
  version    = "4.2.1"

  values = [
    yamlencode(local.helm_chart_ingress-nginx_values)
  ]
  
  max_history = 3
  depends_on = [
    helm_release.aws_load_balancer_controller,
    time_sleep.wait_nginx_termination
  ]
}

# Helm chart destruction will return immediately, we need to wait until the pods are fully evicted
# https://github.com/hashicorp/terraform-provider-helm/issues/593
resource "time_sleep" "wait_nginx_termination" {
  destroy_duration = "${local.ingress_nginx_terminationGracePeriodSeconds}s"
}

A fixed sleep timer waits longer than strictly necessary, but it does the job for now :/
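For completeness, the snippet above assumes the hashicorp/time provider and a local holding the pod grace period; the value below is illustrative:

terraform {
  required_providers {
    time = {
      source = "hashicorp/time"
    }
  }
}

locals {
  # Hypothetical value: should match the chart's terminationGracePeriodSeconds
  ingress_nginx_terminationGracePeriodSeconds = 300
}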

WillerWasTaken · Sep 02 '22 15:09

Marking this issue as stale due to inactivity. If this issue receives no comments in the next 30 days it will automatically be closed. If this issue was automatically closed and you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. This helps our maintainers find and focus on the active issues. Maintainers may also remove the stale label at their discretion. Thank you!

github-actions[bot] · Sep 03 '23 00:09