0.13.0 upgrade from 0.12.28 now seems to require some state locking parameters I can't find any information on.

Open jwshive opened this issue 5 years ago • 13 comments

Terraform Version

Terraform v0.13.0

Terraform Configuration Files

I am running everything through Azure DevOps pipelines and using replacetokens to fill in the #{...}# placeholders.

terraform {
  backend "#{cloud-provider}#" {
    resource_group_name  = "#{agency}#-#{department}#-#{environment}#-tfrg"
    storage_account_name = "#{agency}##{department}##{application}##{environment}#tfsa"
    container_name       = "#{application}#-terraform"
    key                  = "#{tfstatestoragekey}#"
  }
}

Debug Output

Crash Output

Expected Behavior

terraform plan should have run without issue.

Actual Behavior

2020-08-12T12:21:13.1149001Z [command]/opt/hostedtoolcache/terraform/0.13.0/x64/terraform plan
2020-08-12T12:21:14.5272154Z
2020-08-12T12:21:14.5276813Z Error: Error locking state: Error acquiring the state lock: 2 errors occurred:
2020-08-12T12:21:14.5277758Z 	* state blob is already locked
2020-08-12T12:21:14.5278392Z 	* blob metadata "terraformlockid" was empty
2020-08-12T12:21:14.5278780Z 
2020-08-12T12:21:14.5278995Z 
2020-08-12T12:21:14.5279172Z 
2020-08-12T12:21:14.5279579Z Terraform acquires a state lock to protect the state from being written
2020-08-12T12:21:14.5280226Z by multiple users at the same time. Please resolve the issue above and try
2020-08-12T12:21:14.5281317Z again. For most commands, you can disable locking with the "-lock=false"
2020-08-12T12:21:14.5282137Z flag, but this is not recommended.
2020-08-12T12:21:14.5282432Z
2020-08-12T12:21:14.5282906Z
2020-08-12T12:21:14.5420354Z ##[error]Error: The process '/opt/hostedtoolcache/terraform/0.13.0/x64/terraform' failed with exit code 1
2020-08-12T12:21:14.5440293Z ##[section]Finishing: Terraform Plan

Steps to Reproduce

  1. terraform init
  2. terraform validate
  3. terraform plan

Additional Context

This runs via an Azure DevOps pipeline. I see many links talking about state locking if your backend supports it, but I don't see any document telling me how to implement a fix for this in my Terraform code or pipeline. Am I supposed to manually break the lease every time I run code? That seems like more work than it should be. This same code ran yesterday on 0.12.28 and runs again when I change the version back to 0.12.28.

References

jwshive • Aug 12 '20 12:08

I assume you are using an Azure Storage Account here. This won't explain what caused the lock, but you can force the existing lock to be released with the following command:

az storage blob lease break -b FILE_NAME -c CONTAINER_NAME --account-name STORAGEACCOUNT_NAME --account-key ACCESS_KEY
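
Before breaking a lease, it can help to see what Azure currently reports for the state blob. The following is a minimal sketch, assuming the Azure CLI is installed; the function name `show_lease` is hypothetical and its arguments are the same placeholders used in the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the lease properties (state, status, duration)
# that Azure reports for a blob. All arguments are placeholders.
show_lease() {
  local account="$1" container="$2" blob="$3" key="$4"
  az storage blob show \
    --account-name "$account" \
    --account-key "$key" \
    --container-name "$container" \
    --name "$blob" \
    --query "properties.lease" \
    --output json
}

# Example call (not executed here):
# show_lease STORAGEACCOUNT_NAME CONTAINER_NAME FILE_NAME ACCESS_KEY
```

If the reported state is already `available`, the break command may fail because there is no lease to break.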

brenak • Aug 12 '20 15:08

Thanks for the reply, I figured this would be the easiest solution.

I knew you could do it with another command execution, but my bigger question is why the behavior is so different between 0.12 and 0.13, and whether I can handle this somewhere in the TF files instead of changing all my pipelines to add an additional step.

jwshive • Aug 12 '20 15:08

I don't have good answers on that. You shouldn't have to do this every time. I've only ever encountered this locking issue when Terraform was in the middle of updating the state and it somehow lost connection, or my system crashed, leaving the state locked. It's really rare that this happens.

brenak • Aug 12 '20 15:08

You should not need to deal with locking each time. The point of the lock is to prevent two terraform runs from happening at once with the same state. Are you able to reproduce this outside of the azure pipeline, on a local workstation?

danieldreier • Aug 12 '20 18:08

Thanks for your question. I did try this outside of Azure Pipelines and received the same error. I've created my own quick TF file to test, and I can reproduce my results.

This is the debug output I get from 0.13.0

2020/08/12 17:20:24 [DEBUG] Azure Backend Response for https://storageaccountname.blob.core.windows.net/impact-terraform/terraform.tfstate:
HTTP/1.1 200 OK
Content-Length: 43976
Accept-Ranges: bytes
Content-Type: application/json

Date: Wed, 12 Aug 2020 21:20:23 GMT
Etag: "0x8D83EDF202DA166"
Last-Modified: Wed, 12 Aug 2020 16:45:31 GMT
Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
X-Ms-Access-Tier: Hot
X-Ms-Access-Tier-Inferred: true
X-Ms-Blob-Type: BlockBlob
X-Ms-Creation-Time: Mon, 10 Aug 2020 17:25:27 GMT
X-Ms-Lease-State: available
X-Ms-Lease-Status: locked
X-Ms-Request-Id: d918551a-a01e-00dc-6fee-70e661000000
X-Ms-Server-Encrypted: true
X-Ms-Version: 2018-11-09


Error: Error locking state: Error acquiring the state lock: 2 errors occurred:
        * state blob is already locked
        * blob metadata "terraformlockid" was empty



Terraform acquires a state lock to protect the state from being written
by multiple users at the same time. Please resolve the issue above and try
again. For most commands, you can disable locking with the "-lock=false"
flag, but this is not recommended.

But when I run the same code with my Terraform 0.12.29 binary, it blows right past all that and starts with the actual plan. I see where it says in the output

X-Ms-Lease-State: available
X-Ms-Lease-Status: locked

and I figure that must be what terraform is now reading, but this works every time with the previous version.
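
To make the oddity in those two headers concrete, here is a small sketch that checks whether a lease state and lease status pair is one of the usual combinations. The pairing table is an assumption based on how the Azure Blob Storage lease documentation is commonly read (status `locked` goes with `leased` or `breaking`, `unlocked` with the rest), and the function name `lease_consistent` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a (lease state, lease status) pair.
# The expected pairings below are an assumption, not taken from this thread.
lease_consistent() {
  local state="$1" status="$2"
  case "$state:$status" in
    leased:locked|breaking:locked)
      echo "consistent" ;;
    available:unlocked|expired:unlocked|broken:unlocked)
      echo "consistent" ;;
    *)
      echo "inconsistent" ;;
  esac
}

# The pair from the debug output above:
lease_consistent available locked
```

Under that assumption, the `available` plus `locked` pair from the debug output is not one of the expected combinations.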

Interestingly enough, if I do not use my working remote backend in Azure and instead create a brand new remote backend from scratch, this works without issue. I could have missed them, but I didn't see any instructions on patching the remote backends for an upgrade.

When I try to use the command above to break the lease, I get an error saying there is currently no lease on the blob.

What I have found now is that when I create a storage account WITHOUT Hierarchical namespace, the status of the blob once the write is finished is available and unlocked. When I create the storage account WITH Hierarchical namespace, the default state seems to be locked and available. The first run with a new state file always works, but all the jobs after that fail. It seems to be an issue with how the Hierarchical namespace works with storage accounts and lease states.
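
For anyone wanting to check an existing account, here is a minimal sketch that asks the Azure CLI whether Hierarchical namespace is enabled. It assumes the CLI is installed and logged in; the function name `hns_enabled` and both arguments are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the storage account's isHnsEnabled property.
# $1 = storage account name, $2 = resource group name (placeholders).
hns_enabled() {
  local account="$1" resource_group="$2"
  az storage account show \
    --name "$account" \
    --resource-group "$resource_group" \
    --query "isHnsEnabled" \
    --output tsv
}

# Example call (not executed here):
# hns_enabled STORAGEACCOUNT_NAME RESOURCE_GROUP_NAME
```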


jwshive • Aug 12 '20 21:08

Do you have a way to check whether this happens exclusively with the azure state backend, or are you also seeing this with any other state backend? The AzureRM provider team maintains the state backend, and so I'm trying to triage which team needs to troubleshoot this. If it's common to all backends, it's a core issue, and if it's specific to that backend I'll send it to the azure team.

danieldreier • Aug 12 '20 22:08

We only use Azure here, so I don't have anything easy to test with AWS. I don't know whether AWS has a hierarchical namespace; that's just my unfamiliarity with their service.

Looking at some of my other storage accounts, I see blobs in there that are unlocked and available. It seems to just be something happening with my terraform state file where it's available but remains locked.

jwshive • Aug 12 '20 22:08

I'm really glad I found this page. I've been having the same issue all week. I had thought it was because of the unique way this particular environment was set up. So I copied the code onto my own machine, which was running 0.12.24 at the time, and it worked fine. I then upgraded to 0.13.0, ran a TF init, which was fine, and then ran a plan, and got this exact issue. As with other people in this post, I viewed the lease state in Azure, and it was 'Available'. So I manually leased the blob, then released it. It was then in a 'Broken' state. If I then run plan/apply/destroy, it works without issue. There is definitely some issue between the new TF 0.13.0 binary and the Azure storage account. I'm really hopeful this gets fixed very quickly.
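
The lease-then-release workaround described above could be scripted roughly as follows. This is only a sketch under assumptions: it uses the Azure CLI instead of the portal, it assumes credentials are already available to the CLI, and the function name `reset_lease` and its arguments are hypothetical. The thread does not confirm that this produces the same result as the manual steps.

```shell
#!/usr/bin/env bash
# Hypothetical helper: acquire a lease on the state blob, then break it.
# $1 = storage account, $2 = container, $3 = blob name (placeholders).
reset_lease() {
  local account="$1" container="$2" blob="$3"
  # Take a lease so that the blob has a real lease to break.
  az storage blob lease acquire \
    --account-name "$account" \
    --container-name "$container" \
    --blob-name "$blob"
  # Break the lease again, which should leave the lease state as "broken".
  az storage blob lease break \
    --account-name "$account" \
    --container-name "$container" \
    --blob-name "$blob"
}

# Example call (not executed here):
# reset_lease STORAGEACCOUNT_NAME CONTAINER_NAME FILE_NAME
```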

ajlancaster • Aug 14 '20 08:08

I am just poking around commits for 0.13.0 and ran across this one. I’m not 100% sure what it’s trying to do but it involves local and remote state unlocking.

https://github.com/hashicorp/terraform/commit/86e9ba3d659176cd7ea969434e37cb064f23bb43

jwshive • Aug 14 '20 09:08

Also getting this with azurerm backend. It's a blocker right now for us to upgrade. Breaking the lease manually seems to help briefly but the issue recurs.

n2qz • Aug 17 '20 23:08

Same problem here after importing local state to an azure storage account

gdubya • Oct 09 '20 14:10

> What I have found now is that when I create a storage account WITHOUT Hierarchical namespace, the status of the blob once the write is finished is available and unlocked. When I create the storage account WITH Hierarchical namespace, the default state seems to be locked and available. The first run with a new state file always works, but all the jobs after that fail. It seems to be an issue with how the Hierarchical namespace works with storage accounts and lease states.

Thank you for this tip. I was able to move past this here, by creating a new storage account with hierarchical namespace disabled, and migrating state files to it before upgrading to 0.13.

n2qz • Nov 05 '20 13:11

Also seeing this issue on Terraform v1.9.5 w/ azurerm v3.0.2. I manually created a new storage account/container via the Azure CLI, configured the back-end locally, and it never releases the lock after an operation completes.

For example:

  • I had to manually break the lease on the tfstate file to be able to run terraform apply
  • I run terraform apply and it's able to run the plan stage and print the output; it seems to lock the file at this stage and doesn't unlock it afterwards
  • I type 'yes' to apply the changes, then I get the following error. I can break the lease again to make progress, but it seems to lock the file after every single action and never unlocks it.

Error: Error acquiring the state lock
│
│ Error message: state blob is already locked
│ Lock Info:
│ ID: e9eb9ed7-a3bc-ae67-640d-ad4cb06fae7a
│ Path: tfstate/terraform-dev.tfstate
│ Operation: OperationTypePlan
│ Who: MYCOMPUTER\MYNAME@MYCOMPUTER
│ Version: 1.9.5
│ Created: 2024-08-26 22:42:07.991415 +0000 UTC
│ Info:
│
│
│ Terraform acquires a state lock to protect the state from being written
│ by multiple users at the same time. Please resolve the issue above and try
│ again. For most commands, you can disable locking with the "-lock=false"
│ flag, but this is not recommended.
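
When the error includes a lock ID, as it does here, Terraform's `force-unlock` command is an alternative to breaking the lease by hand. The sketch below pulls the ID out of saved error text; the function name `extract_lock_id` and the file name in the usage comment are hypothetical. Nothing in this thread confirms that it resolves the azurerm behavior, and it would not apply to the 0.13.0 reports above, where the "terraformlockid" metadata was empty.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read Terraform error text on stdin and print the
# first UUID-shaped value that follows "ID:".
extract_lock_id() {
  grep -Eo 'ID:[[:space:]]*[0-9a-fA-F-]{36}' \
    | head -n 1 \
    | sed -E 's/^ID:[[:space:]]*//'
}

# Example usage (not executed here; error.txt is a hypothetical file
# holding the error output):
# lock_id="$(extract_lock_id < error.txt)"
# terraform force-unlock "$lock_id"
```

`terraform force-unlock` asks for confirmation before releasing the lock, so it should only be run when no other Terraform operation is using the state.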

ChuckkNorris • Aug 26 '24 22:08