
Drift detection does not account for resource state upgrades

Open treyhendon opened this issue 2 years ago • 4 comments

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform (and AzureRM Provider) Version

Terraform v1.1.7
on linux_amd64
+ provider registry.terraform.io/hashicorp/azuread v2.19.1
+ provider registry.terraform.io/hashicorp/azurerm v3.0.1
+ provider registry.terraform.io/hashicorp/external v2.2.2
+ provider registry.terraform.io/hashicorp/random v3.1.2
+ provider registry.terraform.io/hashicorp/tls v3.1.0

Affected Resource(s)

  • azurerm_kubernetes_cluster
  • azurerm_kubernetes_cluster_node_pool

Debug Output

Error: Failed to decode resource from state

Error decoding
"module.redsail_aks[\"eastus\"].azurerm_kubernetes_cluster_node_pool.node_pool[\"apppool\"]"
from previous state: unsupported attribute "availability_zones"

Expected Behaviour

When planning and applying, 3.x should read the previous state created by 2.x (containing availability_zones) and write the attribute back to the tfstate as zones on apply.

Actual Behaviour

3.0.0 and 3.0.1 are unable to plan or apply against an existing asset because availability_zones is no longer a supported attribute.

Steps to Reproduce

  1. Create an AKS instance with 2.x provider in multiple zones.
  2. Attempt to run 3.x provider against that AKS instance.
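For context, a minimal sketch of the kind of 2.x configuration involved (resource names and values here are illustrative, not taken from the original report):

```hcl
# azurerm 2.x schema: the attribute is named availability_zones.
# azurerm 3.x renames this attribute to zones, which is why state
# written by 2.x can fail to decode under the 3.x provider.
resource "azurerm_kubernetes_cluster_node_pool" "node_pool" {
  name                  = "apppool"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.example.id
  vm_size               = "Standard_DS2_v2"
  node_count            = 3
  availability_zones    = ["1", "2", "3"]
}
```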

treyhendon avatar Mar 25 '22 19:03 treyhendon

Hey @treyhendon, we're seeing this issue quite a bit today, but I haven't been able to reproduce it yet. I followed your steps without hitting the error. Here are the steps I took:

  1. Create an azurerm_kubernetes_cluster_node_pool with multiple availability_zones with Terraform 1.1.7 and Azure Provider 2.99.0
  2. Swap availability_zones with zones in the config file
  3. Run terraform plan with Azure 3.0.1

Everything worked ok and I didn't see any errors. Are you able to check that the steps you provided above can consistently reproduce this issue?
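For anyone following along, step 3 amounts to bumping the provider version constraint and re-initializing before planning. A hedged sketch (block and values are illustrative):

```hcl
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      # Upgraded from "~> 2.99" to pick up the 3.x schema,
      # where availability_zones has been renamed to zones.
      version = "3.0.1"
    }
  }
}
```

After changing the constraint, `terraform init -upgrade` fetches the new provider before `terraform plan` is run.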

mbfrahry avatar Mar 25 '22 20:03 mbfrahry

Odd, @mbfrahry. Wouldn't you know, I'm not getting that error consistently now either, now that I fixed some casing issues for other assets ( #16076 ). I'm scratching my head too.

Here's the output from our initial build (last week) of the environment that has been giving us so much trouble (this week).

Terraform v1.1.6
on linux_amd64
+ provider registry.terraform.io/hashicorp/azuread v2.19.1
+ provider registry.terraform.io/hashicorp/azurerm v2.99.0
+ provider registry.terraform.io/hashicorp/external v2.2.2
+ provider registry.terraform.io/hashicorp/random v3.1.2
+ provider registry.terraform.io/hashicorp/tls v3.1.0

treyhendon avatar Mar 25 '22 21:03 treyhendon

Hi @treyhendon,

Would it be possible to reproduce this and capture the trace logs with TF_LOG_CORE=trace? My initial hunch that we missed the normal upgrade path when calculating drift detection doesn't seem to be the case, so I'm not sure exactly where this error is coming from.
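For anyone capturing the requested logs, a typical invocation looks something like this (the log file path is illustrative):

```sh
# Emit Terraform core trace logs and direct them to a file
# that can be attached to the issue.
export TF_LOG_CORE=trace
export TF_LOG_PATH=./terraform-trace.log
terraform plan
```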

Thanks!

jbardin avatar Mar 28 '22 19:03 jbardin

Sure, I'll give it a try. We were able to get our production stuff out the door after 3.0 updates, so I'll try one of our non-prod environments tomorrow starting at 2.x provider and then rerun. Fingers crossed I'll get the error again, lol.

treyhendon avatar Mar 28 '22 20:03 treyhendon

Just wanted to chime in and state that I've been running into this problem lately, as mentioned in the other GitHub issue #31052. I'm in the process of upgrading my azurerm provider, and many attributes have become deprecated or renamed. Because I use multi-region infrastructure, Terraform is often executed with the -target option.

What is frustrating is that I am not getting consistent behavior between environments. I use Azure's state storage, so each environment has its own state file. When upgrading Terraform and AzureRM in my CI, DEV, and QA environments, I had no issues with deprecated fields. Once I got to my staging environment, which is a multi-region environment where I often use the -target option, I ran into this problem, as I mentioned in the AzureRM GitHub issue.

I thought that maybe this was a problem related to using the -target option, which I don't use for CI, DEV, and QA. But even when I used -target against those environments, I still didn't get the error about decoding state. So Terraform seemed tolerant of the deprecated attributes everywhere except my staging environment.

ajklotz avatar Mar 16 '23 17:03 ajklotz

@jbardin I'm at the point where I want to deploy to our production environment, and the only way around this is to edit the tfstate file directly, which is discouraged by documentation. Do you still need TF_LOG_CORE debug output to fix this issue?

The annoying problem is that there are many deprecated attributes, and I have to run deployments over and over again (through an approval process, mind you) just to find out which set of deprecated attributes needs to be taken out of the tfstate file next. I'm afraid my release pipeline approver will get upset with me lol
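For what it's worth, a somewhat safer alternative to hand-editing the backend state in place is to pull a local copy, edit that, and push it back. The commands below are real Terraform CLI subcommands, though the workflow is still risky and the file name is illustrative:

```sh
# Download the remote state to a local file for inspection/editing.
terraform state pull > state.json

# ... edit state.json, e.g. renaming availability_zones to zones ...

# Upload the edited copy. Terraform validates the serial and lineage
# before overwriting the remote state, which guards against clobbering
# a newer state version.
terraform state push state.json
```

Backing up the pulled file before editing is advisable, since a malformed push can still leave the state unusable.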

ajklotz avatar Mar 21 '23 15:03 ajklotz

@ajklotz, upgrading a provider with resources which require schema changes in conjunction with using -target is a known issue and tracked in #31052. This issue does not mention -target and is waiting for a reproduction of that case. If you have a situation where -target is not involved, the trace logs would definitely help.

Thanks!

jbardin avatar Mar 21 '23 15:03 jbardin

Thanks @jbardin. I got turned around and forgot I had posted in #31052. You're right, this issue is not quite what I'm running into. I'll post in #31052.

ajklotz avatar Mar 21 '23 15:03 ajklotz

Since this has not been reproduced in the past year, I'm going to close it for now on the assumption that it's a duplicate of #31052.

jbardin avatar Mar 21 '23 17:03 jbardin

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions[bot] avatar Apr 21 '23 02:04 github-actions[bot]