One stack failure when deploying multiple stacks in parallel causes other stacks' state to remain locked

Open fathom-parth opened this issue 2 years ago • 5 comments

Expected Behavior

The stacks that did not fail should continue and run to completion. Once they have finished, the error message for the failed stack should be printed at the end, and no new stacks should be started. After a failure, the state file should not be locked for any of the stacks.
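
To make the expectation concrete, here is a toy Python model of the behavior described above. It is only a sketch: it does not use cdktf or Terraform, `FakeStack` and `deploy_in_batches` are hypothetical names, the lock is a plain boolean standing in for the remote state lock, and "batches" are an assumed simplification of stacks that can run in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


class FakeStack:
    """Hypothetical stand-in for one stack; not a cdktf class."""

    def __init__(self, name, should_fail=False):
        self.name = name
        self.should_fail = should_fail
        self.locked = False    # simulated remote state lock
        self.deployed = False

    def deploy(self):
        self.locked = True  # acquire the simulated lock
        try:
            if self.should_fail:
                raise RuntimeError(f"{self.name}: Invoking Terraform CLI failed with exit code 1")
            self.deployed = True
        finally:
            self.locked = False  # release the lock on success and on failure


def deploy_in_batches(batches, max_workers=4):
    """Deploy each batch in parallel; schedule no new batch after a failure."""
    errors, skipped = [], []
    for batch in batches:
        if errors:
            skipped.extend(stack.name for stack in batch)
            continue
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            futures = [pool.submit(stack.deploy) for stack in batch]
            # Wait for every in-flight stack instead of aborting on the first failure.
            for future in futures:
                exc = future.exception()
                if exc is not None:
                    errors.append(exc)
    return errors, skipped


stacks = [FakeStack("a"), FakeStack("b", should_fail=True), FakeStack("c"), FakeStack("d")]
errors, skipped = deploy_in_batches([stacks[:3], stacks[3:]])

for exc in errors:  # failures are reported only after the in-flight stacks finish
    print(exc)
print("skipped:", skipped)
print("still locked:", [s.name for s in stacks if s.locked])
```

In this model, `a` and `c` finish even though `b` fails alongside them, the error for `b` is reported afterwards, `d` is never started, and no lock is left held.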

Actual Behavior

The remote state file stays locked for the stacks that were running in parallel but did not fail. I would have expected the debug logs linked below to end with:

Step #2: 1 Stack deploying     4 Stacks done     0 Stacks waiting
--
Step #2: Invoking Terraform CLI failed with exit code 1

Steps to Reproduce

  1. Set up a cdktf project with multiple stacks that can run in parallel
  2. Make sure one of the stacks fails while the other stacks are still running
  3. Run the stacks in parallel
  4. Look at the state file of the stacks that did not fail

Versions

cdktf debug
language: python
cdktf-cli: 0.18.0
node: v18.17.1
cdktf: 0.18.0
constructs: 10.3.0
jsii: 1.91.0
terraform: 1.4.6
arch: x64
os: linux 5.15.49-linuxkit
python: Python 3.7.16
pip: pip 21.3.1 from /usr/local/lib/python3.7/site-packages/pip (python 3.7)
pipenv: null

Providers

┌───────────────┬──────────────────┬─────────┬────────────┬──────────────────────────────────┬─────────────────┐
│ Provider Name │ Provider Version │ CDKTF   │ Constraint │ Package Name                     │ Package Version │
├───────────────┼──────────────────┼─────────┼────────────┼──────────────────────────────────┼─────────────────┤
│ google        │ 4.83.0           │ ^0.18.0 │            │ cdktf-cdktf-provider-google      │ 9.0.4           │
├───────────────┼──────────────────┼─────────┼────────────┼──────────────────────────────────┼─────────────────┤
│ google-beta   │ 4.83.0           │ ^0.18.0 │            │ cdktf-cdktf-provider-google-beta │ 9.0.4           │
├───────────────┼──────────────────┼─────────┼────────────┼──────────────────────────────────┼─────────────────┤
│ kubernetes    │ 2.23.0           │ ^0.18.0 │            │ cdktf-cdktf-provider-kubernetes  │ 9.0.0           │
└───────────────┴──────────────────┴─────────┴────────────┴──────────────────────────────────┴─────────────────┘

Gist

https://gist.github.com/fathom-parth/9c14c7e54028560c06d300a5484f0f7b

Possible Solutions

Not sure :(

Workarounds

None that we know of

Anything Else?

No response

References

This was supposedly fixed in https://github.com/hashicorp/terraform-cdk/issues/1836, but we're still seeing this issue (most of the text above for expected and actual behavior is copied from that issue).

Help Wanted

  • [ ] I'm interested in contributing a fix myself

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

fathom-parth • Oct 27 '23 20:10

I think what's happening is that we have cross-stack dependencies here that are running into this issue: https://github.com/hashicorp/terraform-cdk/issues/3023

Because that surfaces as an unhandled exception, it causes all the stacks to fail and leaves all of them locked.

fathom-parth • Oct 27 '23 20:10

Hey, would you be so kind as to create a minimal example for this? The tests we have for this behaviour work just fine, so we must be missing a detail when building the reproduction case.

DanielMSchmidt • Nov 17 '23 12:11

Hi there! 👋 We haven't heard from you in 15 days and would like to know if the problem has been resolved or if you still need help. If we don't hear from you before then, I'll auto-close this issue in 30 days.

github-actions[bot] • Dec 03 '23 02:12

Hi there! 👋 We haven't heard from you in 15 days and would like to know if the problem has been resolved or if you still need help. If we don't hear from you before then, I'll auto-close this issue in 30 days.

github-actions[bot] • Feb 03 '24 02:02

Just wanted to add that our team experienced this issue as well.

loozhengyuan • Feb 05 '24 03:02