cdktf: Large `terraform.<stack>.tfstate` file is cleaned when `destroy` is cancelled

Expected Behavior

I am running `cdktf destroy` from a Python script using `subprocess.run`. The exact command is:

import subprocess
subprocess.run(["cdktf", "destroy", "--auto-approve"], check=True)

While this is destroying instances, pressing Ctrl+C should stop the destroy and leave the `terraform.<stack>.tfstate` file with an accurate record of the current state.

Actual Behavior

The `terraform.<stack>.tfstate` file is empty.

Steps to Reproduce

I did a fair bit of testing to narrow down what exactly is happening and reduce it to the smallest reproducible case. The Python subprocess call seems to be required for the state to be emptied permanently. However, when running `cdktf destroy` directly, I noticed that the state file is briefly emptied and then recreated. This seems to depend on the size of the state.
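To see whether the state file was left empty or truncated after an interrupt, it helps to inspect it right away. The following is a minimal sketch, assuming the JSON layout of Terraform state format version 4 (a top-level `resources` array whose entries each carry an `instances` list); `describe_state` is a hypothetical helper, not part of cdktf or Terraform.

```python
import json
import os


def describe_state(path):
    """Summarize a local state file after an interrupted run.

    Hypothetical helper. Assumes Terraform state format version 4:
    a top-level "resources" list, each entry holding an "instances" list.
    """
    if not os.path.exists(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"  # the symptom reported in this issue
    try:
        with open(path) as f:
            state = json.load(f)
    except ValueError:  # JSONDecodeError and UnicodeDecodeError both subclass ValueError
        return "corrupt"  # e.g. a partially written file
    count = sum(len(r.get("instances", [])) for r in state.get("resources", []))
    return f"{count} resource instances"
```

Pass it the path of your `terraform.<stack>.tfstate` file immediately after pressing Ctrl+C, and again a few seconds later, to distinguish a temporarily emptied file from a permanently emptied one.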

I suspect that when the subprocess is cancelled, the cdktf process isn't given enough time to recreate the state file. When run directly, the cdktf process would normally be given about 10 seconds before the kill is escalated.
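A likely contributor on the Python side, assuming CPython 3.7 or later: when `subprocess.run` is interrupted by `KeyboardInterrupt`, it waits only briefly (about a quarter of a second) for the child to exit and then calls `Popen.kill()`, which sends SIGKILL on POSIX and gives cdktf no chance to rewrite the state. The sketch below uses a hypothetical `run_with_grace` helper that waits for a configurable grace period instead; the 10-second default simply mirrors the escalation delay mentioned above. It also assumes the child shares the terminal's process group and therefore already received the SIGINT generated by Ctrl+C, which is why it does not send a second one (Terraform treats a second interrupt as a request to exit immediately).

```python
import subprocess


def run_with_grace(cmd, grace_seconds=10.0):
    """Run cmd; on Ctrl+C, give the child time to shut down cleanly before killing it.

    Hypothetical helper. Assumes the child is in the terminal's process group,
    so it already got the Ctrl+C SIGINT; sending another could force a hard exit.
    """
    proc = subprocess.Popen(cmd)
    try:
        returncode = proc.wait()
    except KeyboardInterrupt:
        try:
            # Let the child finish its own shutdown, e.g. persisting state.
            proc.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            proc.kill()  # last resort: SIGKILL on POSIX
            proc.wait()
        raise  # re-raise so the caller still sees the interrupt
    if returncode != 0:
        # Mirror subprocess.run(..., check=True).
        raise subprocess.CalledProcessError(returncode, cmd)
    return returncode
```

Usage mirrors the original call: `run_with_grace(["cdktf", "destroy", "--auto-approve"])`. Like `check=True`, it raises `CalledProcessError` on a non-zero exit code.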

1. Deploy a stack with a large number of resources, for example many files.
2. Run `cdktf destroy` from Python:

import subprocess
subprocess.run(["cdktf", "destroy", "--auto-approve"], check=True)

3. While the resources are being destroyed, press Ctrl+C.

I can reliably reproduce this with a ~5MB state file.

Versions

$ cdktf debug
language: python
cdktf-cli: 0.18.0
node: v18.12.0
cdktf: 0.18.0
constructs: 10.2.70
jsii: 1.89.0
terraform: 1.6.4
arch: x64
os: linux 5.15.0-88-generic
python: Python 3.9.6
pip: pip 21.1.3 from /home/michael/.pyenv/versions/3.9.6/envs/locus3.9/lib/python3.9/site-packages/pip (python 3.9)
pipenv: null

Providers

┌───────────────┬──────────────────┬─────────┬────────────┬─────────────────────────────┬─────────────────┐
│ Provider Name │ Provider Version │ CDKTF   │ Constraint │ Package Name                │ Package Version │
├───────────────┼──────────────────┼─────────┼────────────┼─────────────────────────────┼─────────────────┤
│ aws           │ 5.19.0           │ ^0.18.0 │            │ cdktf-cdktf-provider-aws    │ 17.0.8          │
├───────────────┼──────────────────┼─────────┼────────────┼─────────────────────────────┼─────────────────┤
│ random        │ 3.5.1            │ ^0.18.0 │            │ cdktf-cdktf-provider-random │ 9.0.0           │
├───────────────┼──────────────────┼─────────┼────────────┼─────────────────────────────┼─────────────────┤
│ tls           │ 4.0.4            │ ^0.18.0 │            │ cdktf-cdktf-provider-tls    │ 8.0.0           │
└───────────────┴──────────────────┴─────────┴────────────┴─────────────────────────────┴─────────────────┘

Gist

No response

Possible Solutions

It might be possible to write the new state to a temporary file and then swap it into place with a rename, instead of emptying the existing state file and rewriting it in place.
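A minimal Python sketch of that technique follows. It only illustrates write-to-temp-then-rename; Terraform's own state manager is written in Go, and `write_state_atomically` is a hypothetical name. The temporary file is created in the same directory as the target so that `os.replace` is a single rename on one filesystem, which POSIX requires to be atomic: a reader, or a process interrupted mid-write, sees either the old file or the new one, never an empty one.

```python
import json
import os
import tempfile


def write_state_atomically(path, state):
    """Write `state` (a dict) to `path` so the file is never seen empty or partial.

    Hypothetical helper illustrating write-to-temp-then-rename.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory => same filesystem => os.replace is a plain atomic rename.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tfstate-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(state, tmp, indent=2)
            tmp.flush()
            os.fsync(tmp.fileno())  # ensure the bytes are on disk before the swap
        os.replace(tmp_path, path)  # old contents or new contents, nothing in between
    except BaseException:
        # Includes KeyboardInterrupt: drop the temp file, leave the old state intact.
        os.unlink(tmp_path)
        raise
```

For durability across a power loss, you would also fsync the containing directory after the rename; that step is omitted here.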

Workarounds

No response

Anything Else?

No response

References

No response

Help Wanted

[ ] I'm interested in contributing a fix myself

Community Note

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
If you are interested in working on this issue or have submitted a pull request, please leave a comment

Nov 16 '23 17:11 MJohnson459

Hey, could you try running this outside of Python, directly through the CDKTF CLI, and see if the problem persists? We use the AbortController API under the hood and forward the abort signal to the Terraform CLI. If you abort the CDKTF CLI directly, I would assume it takes a few seconds for Terraform CLI to process the abort. If the Python package kills the process hard, either immediately or after a certain grace period, a corrupted state file seems possible.

Nov 17 '23 12:11 DanielMSchmidt

Hey. Running cdktf directly doesn't permanently remove the state, but the file is still temporarily left in a corrupted state. Do you know why the state file is cleaned at all during the process? I had a look through the code but couldn't find where that happens. Is the state file managed by Terraform directly or by the CDK?

Given that the state is not updated atomically, it seems likely that other workflows outside this specific Python case could also corrupt the state. I'm happy to look into a fix if you can point me in the right direction.

Nov 17 '23 13:11 MJohnson459

The state file is managed by Terraform directly, so I think the cause is most likely somewhere in Terraform.

Nov 17 '23 13:11 DanielMSchmidt

It looks like there is already a TODO comment in terraform for this issue https://github.com/hashicorp/terraform/blob/1f9734619f953ecc7252d7b98a6d40d751b4ea1e/internal/states/statemgr/filesystem.go#L124C1-L129C18

Nov 17 '23 13:11 MJohnson459
