terraform icon indicating copy to clipboard operation
terraform copied to clipboard

GCS backend: Detect and force-unlock stale locks.

Open ubschmidt2 opened this issue 7 years ago • 11 comments

This change fixes the issue of orphaned lock files, which Terraform leaves behind in some scenarios. Such lock files had to be removed manually, either by means of a force-unlock or by deleting the file in GCS.

The "updated" timestamp of the lock file is used as an indicator of staleness. This timestamp is updated once per minute as long as the lock is held. A lock file is considered stale and is force-unlocked if its timestamp hasn't been updated for several minutes.

ubschmidt2 avatar Mar 01 '18 00:03 ubschmidt2

@octo @danawillow

ubschmidt2 avatar Mar 01 '18 00:03 ubschmidt2

Thanks for the PR @ubschmidt2!

This seems like a reasonable feature to me, but it does require a little more care around the lock management than in the "fail locked" case that we have now, and for some cases the architecture of terraform itself prevents us from handling it very well at all.

The main situation to consider, is that now that we have locks available, users are putting this into automation and blocking sometimes concurrent operations using the lock. If the client holding the lock loses connectivity temporarily, this can cause a waiting instance to possible incorrectly obtain the lock (granted, losing connectivity for over 4min, then regaining it and completing the run would be a very rare case).

Optimally terraform core could detect this and abort from the unsafe situation, but that isn't really possible with the current architecture. The consul backend (in which the lock always requires active connectivity) does 2 things to try and protect the user however -- it knows when the lock was lost and can report an error to the user, and it uses a CAS operation to ensure that the final state being overwritten is the the correct version.

These things are tradeoff's of course, as if there are two instances applying at the same time, there is technically no "correct" state to write at the end, but I think at least reporting the situation to the user will alert them that there could be inconsistencies.

So, after that background from my adventures in state locking, I'll add the rest of the comments inline in the review.

jbardin avatar Mar 06 '18 00:03 jbardin

Thanks, James, your review comments are very helpful. I'm working on the suggested improvements.

ubschmidt2 avatar Mar 07 '18 23:03 ubschmidt2

I've updated the pull request.

ubschmidt2 avatar Jun 20 '18 15:06 ubschmidt2

FWIW, the current Travis failure is unrelated to this PR.

# github.com/hashicorp/terraform/command
command/output_test.go:253:15: unknown field 'ContextOpts' in struct literal of type Meta
command/output_test.go:281:15: unknown field 'ContextOpts' in struct literal of type Meta

ubschmidt2 avatar Jun 21 '18 09:06 ubschmidt2

PTAL, if you have a chance.

ubschmidt2 avatar Sep 11 '18 12:09 ubschmidt2

PTAL, if you have a chance.

ubschmidt2 avatar Oct 29 '18 12:10 ubschmidt2

Hi @ubschmidt2,

Sorry about the long delay here, but we're nearing completion on the long 0.12 road and I'm going over PRs that were blocked by that work.

Can you rebase on master and re-run the tests?

jbardin avatar Feb 21 '19 21:02 jbardin

Ah, sorry, James, missed your comment. Yes, I'll do that. Also, still busy here with adoption of 0.12. ;-)

ubschmidt2 avatar Jun 26 '19 17:06 ubschmidt2

CLA assistant check
All committers have signed the CLA.

hashicorp-cla avatar Mar 12 '22 17:03 hashicorp-cla

Do we know if this is still an issue with the current backend? I have added it to the list of PRs for the GCS team to triage. Thanks!

crw avatar Aug 28 '24 17:08 crw

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active contributions. If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions[bot] avatar Sep 20 '25 02:09 github-actions[bot]