terraform-provider-tfe

feature_request: Deleting workspace should give option to trigger destroy plan

Open 3h4x opened this issue 4 years ago • 6 comments

Hey, we use workspaces dynamically, so removing them should be normal behaviour: all resources created by such a workspace should be scheduled for deletion (with confirmation) or auto-destroyed. Currently, before removing a workspace from our configuration, we have to go into it manually, trigger a destroy, and only then delete the workspace from the Terraform configuration. Ideally there would be a flag on tfe_workspace which would allow destruction (like force_destroy on aws_s3_bucket).
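Something along these lines, where the attribute name is purely hypothetical and doesn't exist in the provider today:

resource "tfe_workspace" "ephemeral" {
  name         = "ephemeral-env"   # hypothetical dynamically-created workspace
  organization = "example-org"

  # Hypothetical flag: queue (and auto-apply) a destroy run for the workspace's
  # resources before the workspace itself is deleted, similar in spirit to
  # force_destroy on aws_s3_bucket.
  destroy_resources_on_delete = true
}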

3h4x · Jul 24 '20

Along these same lines, if the workspace is being deleted it would be preferable to have the provider verify that the state is empty first. This is already implemented in the CLI when deleting a workspace: the workspace state must be empty, or you have to explicitly allow a force delete of the workspace.

https://www.terraform.io/docs/commands/workspace/delete.html

It would be nice if this provider followed those same patterns. I would suggest a property of force_delete or allow_force_delete with a default value of false. If the provider attempts to delete a workspace that doesn't have an empty state and this is not set on the tfe_workspace resource, the delete operation will fail with a message indicating the workspace cannot be deleted until the state is empty.

ritzz32 · Jan 07 '21

Sadly, this functionality of "first destroy the infrastructure and then the workspace itself" (which seems like a neat feature to me) should probably be a platform feature; it isn't possible solely from this provider, at least not without jumping through some questionable hoops. Any attempt to automatically queue a destroy plan before destroying the workspace would require action on that separate plan, which is likely entirely out of band (the workspace's infrastructure may or may not be part of the same configuration).

It may be possible to enable auto-apply for that workspace (if it isn't already) to accomplish this before destroying it, but again, you're triggering what is essentially an out-of-band process, hoping it finishes, and then taking another action.

Leaving this open for now though, as I can definitely see the appeal of being able to remove a workspace without dangling resources in some way, if it's feasible.


Aside: Note that CLI workspaces are regrettably not synonymous with Terraform Cloud workspaces, just to be clear (but the same idea of a TFC workspace being required to be empty first is still a good one to explore!)

chrisarcand · Jan 08 '21

This is coming back up for me as well. It's one of those "you can shoot yourself in the foot" capabilities. Have there been any thoughts on an "auto_destroy" resource flag with appropriate warnings throughout the plan? Maybe even requiring a list of "non_destroyable" workspaces, so that the managing workspace hosting the tfe provider code can never be auto-destroyed via the provider? I'm looking to better understand the feasibility and roadblocks of such a solution.
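For comparison, Terraform's built-in prevent_destroy lifecycle flag already gives a per-resource version of that guard (though not a centrally managed list); a rough sketch with a hypothetical workspace name:

resource "tfe_workspace" "manager" {
  name         = "workspace-manager"   # hypothetical: the managing workspace
  organization = "example-org"

  # Any plan that would destroy this resource fails outright, which is the
  # closest existing analogue to a "non_destroyable" marker.
  lifecycle {
    prevent_destroy = true
  }
}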

devhulk · Nov 29 '21

Definitely appreciate the aforementioned design concerns.

But for those of us who would still prefer to find solace in having complete control over the lifecycle, and are brave (naive) enough to believe we'll always manage to keep our workspaces entirely self-contained...

I've managed to cobble together a working proof-of-concept. At a high level, it essentially involves a null_resource with a destroy-time provisioner that invokes the Terraform Cloud API to orchestrate the sync. As it stands, the solution has a couple of baked-in assumptions, such as:

  • the availability of a TFE_TOKEN environment variable (which is technically only one of two authentication approaches in the hashicorp/tfe provider docs); and,
  • the availability of concurrent runs (which, as I understand, is currently only available with the Business tier of TFC), which is needed to avoid a deadlock between the managing workspace and the managed workspace; without concurrency, the solution will detect a max concurrency of 1, and then immediately halt with a warning message asking the user to manually run a destroy plan before re-attempting to destroy the workspace.

But even with these assumptions, perhaps someone might find it useful? If so, I'll post it up when I find a bit of spare time tomorrow.

theipster · Jan 29 '22

@theipster I have a use case for this if you still have that example.

adrianord · May 25 '22

In pseudo-code:

resource "null_resource" "teardown" {
  provisioner "local-exec" {
    when    = destroy
    command = <<-EOT
      set -e

      jq() { ... } # or something similar for extracting JSON fields

      tfc() {
        method=$1
        uri=$2
        shift 2

        curl --header "Authorization: Bearer $TFE_TOKEN" \
          --header "Content-Type: application/vnd.api+json" \
          --request $method \
          --silent \
          https://app.terraform.io$uri "$@"
      }

      # Get exclusive lock on workspace (see https://www.terraform.io/cloud-docs/api-docs/workspaces#lock-a-workspace)
      lock=`tfc POST /api/v2/workspaces/${self.triggers.workspace_id}/actions/lock \
        ...
        --write-out "%%{http_code}"`
      if [ "$lock" = "404" ]; then
        echo "Workspace is already deleted, skipping."
        exit 0
      fi

      # Get workspace
      tfc GET /api/v2/workspaces/${self.triggers.workspace_id} \
        --output workspace.json

      # No resources to destroy?
      resource_count=`jq workspace.json .data.attributes.resource-count`
      if [ "$resource_count" = "0" ]; then
        echo "Workspace is safe to destroy."
        exit 0
      fi

      # Check for concurrency (only available on paid tiers?)
      organization_name=`jq workspace.json .data.relationships.organization.data.id`
      tfc GET /api/v2/organizations/$organization_name/subscription \
        --output subscription.json
      max_run_concurrency=`jq subscription.json .data.attributes.runs-ceiling`
      if [ "$max_run_concurrency" = "1" ]; then

        # Unlock first, so that manual deletion is possible
        tfc POST /api/v2/workspaces/${self.triggers.workspace_id}/actions/unlock

        # Fail and tell user to manually delete resources
        echo "No run concurrency available. Please destroy the workspace's resources manually first."
        exit 1
      fi

      # Queue a destroy plan (see https://www.terraform.io/cloud-docs/api-docs/run#create-a-run)
      cat > destroy.json <<-PAYLOAD
      {
        "data": {
          "attributes": {
            "auto-apply": true,
            "is-destroy": true,
            ...
          },
          ...
        }
      }
      PAYLOAD
      tfc POST /api/v2/runs \
        --data @destroy.json \
        --output run.json
      run_id=`jq run.json .data.id`

      # Optimistically poll for results
      apply_status="unknown"
      while [ "$apply_status" != "finished" ]; do
        sleep 5s

        tfc GET /api/v2/runs/$run_id/apply \
          --output apply.json
        apply_status=`jq apply.json .data.attributes.status`
      done
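      # NB: this loop only exits once the apply reaches "finished"; an errored or
      # canceled run would never satisfy that check, so a fuller implementation
      # should also bail out on failure statuses.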

      # Done
      echo "Workspace is safe to destroy."
      exit 0

      EOT
  }

  triggers = {
    workspace_id = var.workspace_id
  }
}
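If it helps anyone adapting this: the jq() stub could be backed by the real jq CLI, with the caveat that hyphenated attribute names need bracket syntax in real jq filters, so the filters above would need small tweaks. A rough sketch, assuming jq is installed wherever the provisioner runs:

# Possible stand-in for the jq() stub above; called as: jq <file> <filter>
jq() {
  command jq -r "$2" "$1"
}

# Example with a hyphenated key, which real jq requires bracket syntax for:
resource_count=`jq workspace.json '.data.attributes["resource-count"]'`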

Happy to hear your feedback if this works for you, @adrianord.

I'm also happy to get more involved (pull request?) if someone can provide some guidance and direction on where/how.

theipster · Jun 04 '22

Good news! tfe_workspace has supported safe delete (and force delete) since version v0.39.0
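If I'm reading the provider docs right, that's the force_delete argument on the resource; something like:

resource "tfe_workspace" "example" {
  name         = "example"
  organization = "example-org"

  # With safe delete as the default, deletion fails while the workspace still
  # manages resources; force_delete opts back into deleting it anyway.
  force_delete = true
}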

brandonc · Dec 21 '22

@brandonc Did you mean to close this? I don't think safe delete / force delete covers what this issue is asking for. (Please forgive me if I've misunderstood the Terraform Cloud API docs.)

Given a TFC workspace that contains 1+ resources under management:

  • safe delete will halt (HTTP 409) because there are still resources;
  • force delete will delete the workspace but will not touch the resources at all (therefore leaving them orphaned - not ideal, of course);
  • what this issue is asking for: destroy the resources and then subsequently delete the workspace (i.e. no resultant orphaned resources).

Our organisation's use case is for managing CI infrastructure, where we want to intentionally tear down the full stack (resources and workspace) via one single action (e.g. one single CI pipeline step).

We don't want the two-step process of jumping into the workspace itself to run terraform destroy on the infrastructure resources (first step) before switching back to the workspace-management workspace to run terraform apply on the tfe_workspace resources (second step).

theipster · Dec 22 '22

@theipster You're correct, I didn't adequately address that aspect of this request. Because of the many out-of-band concerns with queueing a destroy run (namely, whether the workspace is actually in a position to run and apply a destroy plan), I don't know if this kind of logical destruction has a place as a step within the tfe_workspace resource's destruction. I do think it's intriguing to consider a platform-level delete action that performs this step, in which case we could make it available through the provider.

However, I think you can use the mitchellh/multispace provider to run a destroy plan alongside your workspace destroy. Here's how I did that:

terraform {
  required_providers {
    multispace = {
      source = "mitchellh/multispace"
    }
  }
}

provider "tfe" {
  hostname = "XXX"
  token = "bcroft"
}

provider "multispace" {
  hostname = "XXX"
  token = "bcroft"
}

resource "tfe_organization" "foo" {
  email = "[email protected]"
  name = "example-workspace-destroy-plan"
}

resource "tfe_workspace" "foo" {
  organization = tfe_organization.foo.id
  name = "example-workspace-destroy-plan"
}

resource "multispace_run" "core" {
  organization = tfe_organization.foo.id
  workspace    = tfe_workspace.foo.name
}

The only oddness is that the multispace_run resource requires that the workspace have a configuration version before it can be created, which means that you probably can't create both resources in the same apply.

The only other side effect should be that a run is created when the resource is introduced.
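One way to sequence that, sketched on the assumption that a configuration version gets uploaded to the new workspace between the two steps:

# First apply: create just the organization and workspace so the workspace can
# receive a configuration version.
terraform apply -target=tfe_workspace.foo

# ...upload or queue a configuration version for the new workspace...

# Second apply: now multispace_run.core can be created.
terraform apply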

brandonc avatar Dec 22 '22 21:12 brandonc