Replaced resource does not create when old dependency fails to destroy
Terraform Version
$ terraform version
Terraform v1.9.2
on darwin_arm64
Terraform Configuration Files
resource "terraform_data" "new_root" { count = 1 }
resource "terraform_data" "old_root" {
count = 1
provisioner "local-exec" {
when = destroy
command = "false"
}
}
locals { target = try(terraform_data.old_root[0].id, null) }
resource "terraform_data" "child" {
input = local.target
triggers_replace = [local.target]
}
Debug Output
Terraform Apply 1 - Create necessary resource configuration
Changes to make:
- Change terraform_data.old_root count to 0 to simulate a resource destroy action
- Change the local.target value to terraform_data.new_root to simulate re-rooting the dependency
diff --git a/main.tf b/main.tf
index b36a2bf..59c41ad 100644
--- a/main.tf
+++ b/main.tf
@@ -1,14 +1,14 @@
resource "terraform_data" "new_root" { count = 1 }
resource "terraform_data" "old_root" {
- count = 1
+ count = 0
provisioner "local-exec" {
when = destroy
command = "false"
}
}
-locals { target = try(terraform_data.old_root[0].id, null) }
+locals { target = try(terraform_data.new_root[0].id, null) }
resource "terraform_data" "child" {
input = local.target
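For reference, the full configuration for the second apply, reassembled from the diff above (the diff is truncated after the input line; the trailing triggers_replace line is carried over unchanged from the original configuration):

resource "terraform_data" "new_root" { count = 1 }
resource "terraform_data" "old_root" {
  count = 0
  provisioner "local-exec" {
    when    = destroy
    command = "false"
  }
}
locals { target = try(terraform_data.new_root[0].id, null) }
resource "terraform_data" "child" {
  input            = local.target
  triggers_replace = [local.target]
}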
Terraform Apply 2 - Demonstrate no create action when old dependency fails to destroy
Terraform Plan(ish) output
$ terraform apply
terraform_data.old_root[0]: Refreshing state... [id=be057839-2c13-f80d-ebe9-2202ccea391a]
terraform_data.new_root[0]: Refreshing state... [id=3b777302-0856-c378-1d12-d852ca6734e4]
terraform_data.child: Refreshing state... [id=390942a4-b664-9947-142f-eb3260a413c3]
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
- destroy
-/+ destroy and then create replacement
Terraform will perform the following actions:
  # terraform_data.child must be replaced
-/+ resource "terraform_data" "child" {
      ~ id               = "390942a4-b664-9947-142f-eb3260a413c3" -> (known after apply)
      ~ input            = "be057839-2c13-f80d-ebe9-2202ccea391a" -> "3b777302-0856-c378-1d12-d852ca6734e4"
      ~ output           = "be057839-2c13-f80d-ebe9-2202ccea391a" -> (known after apply)
      ~ triggers_replace = [
          ~ "be057839-2c13-f80d-ebe9-2202ccea391a" -> "3b777302-0856-c378-1d12-d852ca6734e4",
        ]
    }

  # terraform_data.old_root[0] will be destroyed
  # (because index [0] is out of range for count)
  - resource "terraform_data" "old_root" {
      - id = "be057839-2c13-f80d-ebe9-2202ccea391a" -> null
    }
Plan: 1 to add, 0 to change, 2 to destroy.
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
Terraform Apply output
terraform_data.child: Destroying... [id=390942a4-b664-9947-142f-eb3260a413c3]
terraform_data.child: Destruction complete after 0s
terraform_data.old_root[0]: Destroying... [id=be057839-2c13-f80d-ebe9-2202ccea391a]
terraform_data.old_root[0]: Provisioning with 'local-exec'...
terraform_data.old_root[0] (local-exec): Executing: ["/bin/sh" "-c" "false"]
╷
│ Error: local-exec provisioner error
│
│ with terraform_data.old_root[0],
│ on main.tf line 4, in resource "terraform_data" "old_root":
│ 4: provisioner "local-exec" {
│
│ Error running command 'false': exit status 1. Output:
╵
Expected Behavior
terraform_data.child depended on terraform_data.old_root, but is switching to depend on terraform_data.new_root.
To me, and from a DAG perspective, I would expect terraform_data.child to be destroyed and then created, regardless of what happens to terraform_data.old_root.
Actual Behavior
terraform_data.child is destroyed, which is fine, but it is never recreated because the destroy of its old dependency fails.
Steps to Reproduce
See the gists and diff deltas as outlined in the Debug Output section.
Additional Context
This is a dumbed-down example of what actually happened. In reality, we experienced this with AWS Route53 zones and records, and it caused a temporary DNS outage. The root cause of the deletion error was IAM permission limitations.
While the IAM limitations are the root issue and need to be solved, I think this also demonstrates a legitimate issue in Terraform's DAG dependency management. Or is there a specific reason for choosing not to (re)create terraform_data.child when its old dependency fails?
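For illustration only, a rough, hypothetical analog of the Route53 shape described above (all names and values here are made up, not our actual configuration):

resource "aws_route53_zone" "new_root" {
  name = "example.com"
}

resource "aws_route53_zone" "old_root" {
  # count was 1; set to 0 to remove the zone. The destroy then fails
  # due to missing IAM permissions (the real-world root cause).
  count = 0
  name  = "example.com"
}

locals {
  # Re-rooted from aws_route53_zone.old_root[0].zone_id to the new zone.
  zone_id = aws_route53_zone.new_root.zone_id
}

resource "aws_route53_record" "child" {
  zone_id = local.zone_id # a zone_id change forces a replace
  name    = "www.example.com"
  type    = "A"
  ttl     = 300
  records = ["192.0.2.1"]
}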
References
I looked for a while, but I don't know how to correctly express this situation, so I wasn't able to find a related issue, though one may be out there. If so, apologies for the duplicate.
Hi @KetchupBomb, thanks for filing this! And, thanks for the easy-to-reproduce example configuration.
One thing to note is the create_before_destroy lifecycle option:
resource "terraform_data" "child" {
// attributes...
lifecycle {
create_before_destroy = true
}
}
This will make Terraform create the new resource before destroying the old one. I just wanted to share the create_before_destroy attribute, as I think it will help you avoid outages in the future. I will investigate the reasoning behind the behaviour you've highlighted, but I suspect there is some technical reason why the ordering happens the way it does. I think it's likely that the create_before_destroy attribute was introduced because of the destroy ordering behaviour you've highlighted here.
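Applied to the reproduction config, that suggestion would look something like the sketch below (attribute values carried over from the original example; with create_before_destroy, the replacement child is created before the old child, and the failing old_root destroy, are attempted):

resource "terraform_data" "child" {
  input            = local.target
  triggers_replace = [local.target]

  lifecycle {
    # Create the replacement instance first, so a failure while
    # destroying old_root no longer leaves us without a child.
    create_before_destroy = true
  }
}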
Thanks again!
Thanks for the ACK, @liamcervante. I'm aware of create_before_destroy, and use it in places I know it's needed -- like certain operations on the AWS API which require something to always exist, etc.
I was more concerned with understanding whether the general-case behavior is purposeful. I generally want to follow default workflows (destroy-then-create, in Terraform's case), and using create_before_destroy deviates from that default. There are likely countless situations where Terraform's default seems like it should work, but it won't, given the behavior above.
But as long as you see the root argument I've outlined, I leave it to you guys to determine if it's purposeful or if it's a bug. I'd like to subscribe to the answer, though, so if you're willing to share when you find out, I would appreciate it. 🙏
Hi @KetchupBomb,
The order of operations you see here is working as designed, though it is a bit of an awkward case to handle. Destroy actions are still strictly ordered according to the dependencies recorded during the last apply operation, so as far as Terraform is concerned, it must try to delete old_root before replacing child. This is also meant to remain consistent with the ordering if child were updated rather than replaced: the action taken on the final resource is meant to happen after the destroy has completed.
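For contrast, a sketch of the update case mentioned above (a hypothetical variant of the reproduction config with triggers_replace removed, so re-rooting local.target becomes an in-place update rather than a replace; per the ordering described here, that update still runs after the old_root destroy):

resource "terraform_data" "child" {
  # Without triggers_replace, changing local.target only updates
  # input in place; the update is still ordered after the destroy
  # of the previously recorded dependency, old_root.
  input = local.target
}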
I do think a case could be made that the change to the config should break the old dependency (which it already will, but only after the first failed apply, once the new dependencies are stored). It is highly unlikely that two resource types are so tightly coupled that creating the new child could not proceed while the referenced root resource still exists. Where this does matter, though, is when resources downstream of the failed destroy are also depended on by child. In that case, however, the existing dependency rules would still block Terraform from proceeding all the way to the child create step, so I don't think we would need to worry about breaking configurations.
To simplify your example, going from this initial config:
resource "terraform_data" "old_root" {
}
resource "terraform_data" "child" {
triggers_replace = terraform_data.old_root.id
}
to this config
removed {
  from = terraform_data.old_root

  provisioner "local-exec" {
    when    = destroy
    command = "false"
  }
}

resource "terraform_data" "child" {
}
should not block creating the new child resource.
In order to do that, it would take a very specific set of conditions. The create side of a resource replacement, when create_before_destroy is not being used, would not need to depend on any destroy action other than its own. This could be tricky to implement, however, because the graph building process tries to be generalized for all combinations of operations, and detailed inspections like this are not always convenient. This is something we can look into, however, both to verify correctness and for feasibility. Thanks!
Use ignore_changes: add a lifecycle block with ignore_changes covering triggers_replace to your terraform_data.child resource definition. This tells Terraform to ignore the changes flowing in from old_root.id and treat the stored value as constant, so child is not scheduled for replacement in the first place, regardless of old_root's destruction status.
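A minimal sketch of that suggestion against the reproduction config (note this involves an assumption about intent: ignore_changes entries must name attributes of the resource itself, so it is applied here to the child's own triggers_replace argument):

resource "terraform_data" "child" {
  input            = local.target
  triggers_replace = [local.target]

  lifecycle {
    # Ignore changes to the stored trigger value, so re-rooting
    # local.target no longer schedules child for replacement.
    ignore_changes = [triggers_replace]
  }
}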
@Rishav-Roushan-Infrrd (and others in the future), ignore_changes is one such option to "fix" the problem, as is create_before_destroy. Indeed, the real "fix" (for us) is making sure Terraform has the appropriate IAM permissions.
The purpose of this issue was to raise the fact that, from a DAG perspective, there seemed to be a bug in refusing to create a resource when its previous parent failed to destroy. @jbardin outlined above that, while the DAG is important, Terraform also follows ordering based on what happened on the last Terraform apply:
The order of operations you see here is working as designed, though it is a bit of an awkward case to handle. Destroy actions are still strictly ordered according to the dependencies recorded during the last apply operation, so as far as Terraform is concerned, it must try to delete old_root before replacing child. This is also meant to remain consistent with the ordering if child were updated rather than replaced: the action taken on the final resource is meant to happen after the destroy has completed.
Using other Terraform features to mask the underlying issue only complicates the discussion. 😅 I was looking for an answer as to whether the behavior was a bug or intentional. It is intentional.
I leave it to Hashicorp to close/tag this issue appropriately. 👍
Thanks for that clarification, @KetchupBomb. I'll go ahead and close this issue.
Sorry @crw, I wasn't completely clear. While it is the designed behavior, I'm going to reevaluate this particular detail. Trying to change and replace dependencies at the same time is tricky, so smoothing out the process when possible can help make the changes easier for users without requiring a deep understanding of the details.