Resource state not refreshed before destroy
We maintain a terraform provider that is failing to delete a "primary IP" resource in certain circumstances.
Before we can delete a primary IP resource, we have to unassign it from a server instance in our API.
During our tests of the delete call, we noticed that the state of the primary IP resource references an invalid server instance. This led to try to unassign the primary IP from an instance that was not assigned. Which made the primary IP resource deletion fail.
That bug is only happening in terraform >=1.9, so we ran a git bisect from the last known terraform version to work (1.8.5). We found the problematic commit to be 460c7f3933115c3edf670caacd2ffa489ef4eeb8 https://github.com/hashicorp/terraform/pull/35467
Reverting that commit on top of the v1.9 branch fixes our issue.
Terraform Version
Terraform v1.9.4
on linux_amd64
Terraform Configuration Files
On https://github.com/hetznercloud/terraform-provider-hcloud/tree/tf-1.9-primary-ip-delete
Debug Output
TF_ACC=1 go test ./internal/server -run TestServerResource_PrimaryIPTests -v -timeout=30m -parallel=8
=== RUN TestServerResource_PrimaryIPTests
resource_test.go:908:
HCL:
1: resource "hcloud_ssh_key" "server" {
2: name = "server--665592047586110587"
3: public_key = "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAgQCs9urdjWxJqCVELlPhuQBrCsTJ0XdF3j2+VETM59knH2crDtaAK2omx1cX5MFwjHKEea9fZEel/w0Vj4NhfgsGMe7JSr4Cj5bubcVGI7rJ12Ohl3QFLOZ6azwd13gT6K6o2g3OZtGQagxso4u4BOp9KyLy6wcxO4DVDhyt2Le38w== hcloud@ssh-acceptance-test"
4: labels = {
5: key = "4999487623275307476"
6: }
7: }
8:
9: resource "hcloud_primary_ip" "primary-ip-v4" {
10: name = "primaryip-v4-test--665592047586110587"
11: type = "ipv4"
12: datacenter = "nbg1-dc3"
13:
14: assignee_type = "server"
15:
16: auto_delete = false
17: }
18:
19: resource "hcloud_server" "server-primaryIP-test" {
20: name = "server-primaryIP-test--665592047586110587"
21: server_type = "cpx11"
22: image = "ubuntu-24.04"
23: datacenter = "nbg1-dc3"
24:
25: ssh_keys = [hcloud_ssh_key.server.id]
26:
27: public_net {
28: ipv4 = hcloud_primary_ip.primary-ip-v4.id
29: ipv4_enabled = true
30: ipv6_enabled = false
31: }
32:
33:
34:
35: }
resource_test.go:927:
HCL:
1: resource "hcloud_ssh_key" "server" {
2: name = "server--665592047586110587"
3: public_key = "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAgQCs9urdjWxJqCVELlPhuQBrCsTJ0XdF3j2+VETM59knH2crDtaAK2omx1cX5MFwjHKEea9fZEel/w0Vj4NhfgsGMe7JSr4Cj5bubcVGI7rJ12Ohl3QFLOZ6azwd13gT6K6o2g3OZtGQagxso4u4BOp9KyLy6wcxO4DVDhyt2Le38w== hcloud@ssh-acceptance-test"
4: labels = {
5: key = "4999487623275307476"
6: }
7: }
8:
9: resource "hcloud_primary_ip" "primary-ip-v4" {
10: name = "primaryip-v4-test--665592047586110587"
11: type = "ipv4"
12: datacenter = "nbg1-dc3"
13:
14: assignee_type = "server"
15:
16: auto_delete = false
17: }
18:
19: resource "hcloud_server" "server-primaryIP-test" {
20: name = "server-primaryIP-test--665592047586110587"
21: server_type = "cpx11"
22: image = "ubuntu-24.04"
23: datacenter = "nbg1-dc3"
24:
25: ssh_keys = [hcloud_ssh_key.server.id]
26:
27:
28:
29: }
resource_test.go:946:
HCL:
1: resource "hcloud_ssh_key" "server" {
2: name = "server--665592047586110587"
3: public_key = "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAgQCs9urdjWxJqCVELlPhuQBrCsTJ0XdF3j2+VETM59knH2crDtaAK2omx1cX5MFwjHKEea9fZEel/w0Vj4NhfgsGMe7JSr4Cj5bubcVGI7rJ12Ohl3QFLOZ6azwd13gT6K6o2g3OZtGQagxso4u4BOp9KyLy6wcxO4DVDhyt2Le38w== hcloud@ssh-acceptance-test"
4: labels = {
5: key = "4999487623275307476"
6: }
7: }
8:
9: resource "hcloud_primary_ip" "primary-ip-v6" {
10: name = "primaryip-v6-test--665592047586110587"
11: type = "ipv6"
12: datacenter = "nbg1-dc3"
13:
14: assignee_type = "server"
15:
16: auto_delete = false
17: }
18:
19: resource "hcloud_server" "server-primaryIP-test" {
20: name = "server-primaryIP-test--665592047586110587"
21: server_type = "cpx11"
22: image = "ubuntu-24.04"
23: datacenter = "nbg1-dc3"
24:
25: ssh_keys = [hcloud_ssh_key.server.id]
26:
27: public_net {
28: ipv4_enabled = false
29: ipv6 = hcloud_primary_ip.primary-ip-v6.id
30: ipv6_enabled = true
31: }
32:
33:
34:
35: }
=== PAUSE TestServerResource_PrimaryIPTests
=== CONT TestServerResource_PrimaryIPTests
resource_test.go:901: Step 3/3 error: Error running apply: exit status 1
Error: unexpected assignee id: state_assignee_id=51802107 primary_ip_id=67207997 server_ipv4_id=67208025 server_ipv6_id=67208029
--- FAIL: TestServerResource_PrimaryIPTests (77.92s)
FAIL
FAIL github.com/hetznercloud/terraform-provider-hcloud/internal/server 77.934s
FAIL
make: *** [GNUmakefile:65: testacc] Error 1
Traces: https://github.com/hetznercloud/terraform-provider-hcloud/blob/tf-1.9-primary-ip-delete/debug/test-TestServerResource_PrimaryIPTests.log
Actual Behavior
The state of the primary IP resource references an invalid server instance. Which led to try to unassign the primary IP from an instance that was not assigned. Which made the primary IP resource deletion fail.
Expected Behavior
We expect the state of the primary IP resource to be refreshed before we can destroy it.
Steps to Reproduce
git clone --branch tf-1.9-primary-ip-delete https://github.com/hetznercloud/terraform-provider-hcloud
cd terraform-provider-hcloud
export TF_LOG="trace"
export TF_LOG_PATH_MASK="test-%s.log"
make TEST="./internal/server" TESTARGS="-run TestServerResource_PrimaryIPTests" testacc
Additional Context
References
- 460c7f3933115c3edf670caacd2ffa489ef4eeb8
- https://github.com/hashicorp/terraform/pull/35467
Hi @jooola,
Thanks for filing the issue. I have not been able to replicate the issue from your config yet, and unfortunately the logs don't have any of the core trace output either. I'm wondering if this is something specific to the acceptance tests themselves, can you run a standalone example from the command line? (that would also give easier access to the core logs, especially if you use TF_LOG_CORE=trace to avoid all the plugin logging).
Sadly, we are having some trouble to make this reproducible outside our provider.
We were able to reproduce the bug only running from the CLI.
I pushed the core trace logs and the backed up states for each steps in our debug branch. The failing step is the step 5 https://github.com/hetznercloud/terraform-provider-hcloud/blob/tf-1.9-primary-ip-delete/debug/step5/step5.log
The error is ocurring at this line: https://github.com/hetznercloud/terraform-provider-hcloud/blob/tf-1.9-primary-ip-delete/debug/step5/step5.log#L1300
The previous steps can be found in https://github.com/hetznercloud/terraform-provider-hcloud/blob/tf-1.9-primary-ip-delete/debug/
To re-state our problem, we are running a delete resource using a state that is not up-to-date. Previously, the state was refreshed using the read resource call before performing the delete resource call.
We are not sure if it is now our responsibility to pull fresh data ourselves in the delete call, or if terraform is the one responsible for refreshing the state before deleting the resource.
I think I see a similar issue.
On Terraform 1.9.5, I see this inconsistency using one of my providers.
I tried running a destroy plan in 2 ways:
- I ran
terraform destroy
root@lglap049:~/terraform-ome-qe/test# terraform destroy
ome_firmware_catalog.firmware_catalog_example1: Refreshing state... [name=rounak_1]
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
- destroy
Terraform will perform the following actions:
# ome_firmware_catalog.firmware_catalog_example1 will be destroyed
- resource "ome_firmware_catalog" "firmware_catalog_example1" {
- associated_baselines = [] -> null
...
- share_password = (sensitive value) -> null
- share_type = "HTTPS" -> null
- share_user = "example-user" -> null
- source_path = "catalogs" -> null
- status = "Failed" -> null
- task_id = 11226 -> null
}
Plan: 0 to add, 0 to change, 1 to destroy.
Do you really want to destroy all resources?
Terraform will destroy all your managed infrastructure, as shown above.
There is no undo. Only 'yes' will be accepted to confirm.
Enter a value: no
Destroy cancelled.
root@lglap049:~/terraform-ome-qe/test#
- I commented the resource block and ran
terraform apply.
root@lglap049:~/terraform-ome-qe/test# terraform apply
ome_firmware_catalog.firmware_catalog_example1: Refreshing state... [name=rounak_1]
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
- destroy
Terraform will perform the following actions:
# ome_firmware_catalog.firmware_catalog_example1 will be destroyed
# (because ome_firmware_catalog.firmware_catalog_example1 is not in configuration)
- resource "ome_firmware_catalog" "firmware_catalog_example1" {
- associated_baselines = [] -> null
...
- share_password = (sensitive value) -> null
- share_type = "HTTPS" -> null
- share_user = "example-user" -> null
- source_path = "catalogs" -> null
- status = "Running" -> null
- task_id = 11226 -> null
}
Plan: 0 to add, 0 to change, 1 to destroy.
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: no
Apply cancelled.
root@lglap049:~/terraform-ome-qe/test#
Now, the true state of the resource's status field is Failed, which is what we should see if refresh is happening correctly.
But when running terraform apply, u can see that the field is still the old value Running. This is surprising since the logs show that a refresh is happening.
Thank you @rounak-adhikary, that explains the difference! The changes for deferred actions were not done in the correct order for orphaned resource instances, so the refreshed state was not refreshed in the change.
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.