terraform-cdk
cdktf: Large `terraform.<stack>.tfstate` file is cleaned when `destroy` is cancelled
Expected Behavior
I am running cdktf destroy from Python using subprocess.run. The exact command is:
import subprocess
subprocess.run(["cdktf", "destroy", "--auto-approve"], check=True)
While this is destroying instances, pressing Ctrl+C should stop the destroy and leave the terraform.<stack>.tfstate file with an accurate record of the remaining state.
Actual Behavior
The terraform.<stack>.tfstate file is empty.
Steps to Reproduce
I did a fair bit of testing to narrow down what exactly is happening here and reduce it to the smallest repeatable unit. The Python subprocess call seems to be required for the state to be emptied permanently. However, when running a raw cdktf destroy I noticed that the state file is briefly emptied and then recreated. This seems to depend on the size of the state.
I suspect that when the subprocess is cancelled, the cdktf process is not given enough time to recreate the state file. Normally the cdktf process would be given about 10 seconds before the kill is escalated.
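To make the brief emptying easier to observe, a small poller can record the size of the state file while the destroy runs in another terminal. This is only a sketch for observation: `sample_file_sizes` and its parameters are hypothetical names, and a missing file is recorded as -1.

```python
import os
import time


def sample_file_sizes(path, samples=50, interval=0.1):
    """Poll `path` and return its size in bytes at each sample.

    A missing file is recorded as -1 so that "deleted" and "emptied"
    (size 0) can be told apart in the result.
    """
    sizes = []
    for _ in range(samples):
        try:
            sizes.append(os.path.getsize(path))
        except FileNotFoundError:
            sizes.append(-1)
        time.sleep(interval)
    return sizes


# Example (path is a placeholder for the real state file):
# print(sample_file_sizes("terraform.<stack>.tfstate"))
```

A run of zeros or -1 values in the middle of otherwise large sizes would correspond to the window in which the state is temporarily gone.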
- Start an instance with a lot of resources, for example many files.
- Run cdktf destroy via Python:
import subprocess
subprocess.run(["cdktf", "destroy", "--auto-approve"], check=True)
- While the resources are being destroyed, press Ctrl+C.
I can reliably reproduce this with a ~5MB state file.
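If the suspicion about the short grace period is right, one way to test it is to launch cdktf with `subprocess.Popen` and keep waiting for it after Ctrl+C instead of letting `subprocess.run` clean up. As far as I can tell from CPython's subprocess module, `run()` waits only a fraction of a second after a `KeyboardInterrupt` before killing the child. The sketch below assumes the script is started from a terminal, so Ctrl+C reaches cdktf as well because it is in the same foreground process group. `run_and_wait_on_interrupt` and `grace_seconds` are hypothetical names, and this has not been verified against cdktf.

```python
import subprocess


def run_and_wait_on_interrupt(cmd, grace_seconds=60):
    """Run `cmd` and return its exit code.

    On Ctrl+C, keep waiting up to `grace_seconds` so the child can finish
    its own shutdown; only kill it if it is still running after that.
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait()
    except KeyboardInterrupt:
        # Assumption: the terminal already delivered SIGINT to the child,
        # so nothing is forwarded here; we just give it time to exit.
        try:
            return proc.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            proc.kill()  # escalate only after the grace period
            return proc.wait()


# Example:
# run_and_wait_on_interrupt(["cdktf", "destroy", "--auto-approve"])
```

If the state file survives a cancelled destroy with this wrapper but not with `subprocess.run`, that would support the grace-period explanation.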
Versions
$ cdktf debug
language: python
cdktf-cli: 0.18.0
node: v18.12.0
cdktf: 0.18.0
constructs: 10.2.70
jsii: 1.89.0
terraform: 1.6.4
arch: x64
os: linux 5.15.0-88-generic
python: Python 3.9.6
pip: pip 21.1.3 from /home/michael/.pyenv/versions/3.9.6/envs/locus3.9/lib/python3.9/site-packages/pip (python 3.9)
pipenv: null
Providers
┌───────────────┬──────────────────┬─────────┬────────────┬─────────────────────────────┬─────────────────┐
│ Provider Name │ Provider Version │ CDKTF   │ Constraint │ Package Name                │ Package Version │
├───────────────┼──────────────────┼─────────┼────────────┼─────────────────────────────┼─────────────────┤
│ aws           │ 5.19.0           │ ^0.18.0 │            │ cdktf-cdktf-provider-aws    │ 17.0.8          │
├───────────────┼──────────────────┼─────────┼────────────┼─────────────────────────────┼─────────────────┤
│ random        │ 3.5.1            │ ^0.18.0 │            │ cdktf-cdktf-provider-random │ 9.0.0           │
├───────────────┼──────────────────┼─────────┼────────────┼─────────────────────────────┼─────────────────┤
│ tls           │ 4.0.4            │ ^0.18.0 │            │ cdktf-cdktf-provider-tls    │ 8.0.0           │
└───────────────┴──────────────────┴─────────┴────────────┴─────────────────────────────┴─────────────────┘
Gist
No response
Possible Solutions
It might be possible to write the state to a temporary file and then swap instead of emptying the existing state and rewriting it.
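The write-then-swap idea can be sketched in Python as follows. This only illustrates the general technique, not how cdktf or Terraform write state; `write_atomically` is a hypothetical helper. It relies on `os.replace`, which replaces the destination in a single step when the temporary file is on the same filesystem, so a reader or an interrupted run sees either the old contents or the new contents, never an empty file.

```python
import os
import tempfile


def write_atomically(path, data: bytes):
    """Write `data` to `path` via a temporary file and an atomic rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so the rename stays on the
    # same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes are on disk first
        os.replace(tmp_path, path)  # swap in the new file in one step
    except BaseException:
        # On any failure (including Ctrl+C) remove the temp file and leave
        # the existing file at `path` untouched.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

With this approach an interruption before the rename leaves the previous state file intact, at the cost of briefly needing disk space for two copies.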
Workarounds
No response
Anything Else?
No response
References
No response
Help Wanted
- [ ] I'm interested in contributing a fix myself
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Hey, could you try to run this outside of Python, directly through the CDKTF CLI, and see if the problem persists? We are using the AbortController API under the hood and are forwarding the abort signal to the Terraform CLI. If you abort on the CDKTF CLI directly, I would assume it takes a few seconds for the abort inside the Terraform CLI to be processed. If the Python package aborts hard, either directly or after a certain grace period, I think a corrupted state file is a possibility.
Hey. Running cdktf directly doesn't permanently remove the state; however, the state file is still temporarily corrupted. Do you know why the state file is cleaned at all during the process? I had a look through the code but couldn't find where that happens. Is the state file managed by Terraform directly or by the CDK?
Given that the state is not being updated atomically, it seems likely that other workflows outside the specific Python case could also corrupt the state. I'm happy to look into a fix if you can point me in the right direction.
The state file is managed by Terraform directly, so I think the cause is most likely somewhere in Terraform.
It looks like there is already a TODO comment in Terraform for this issue: https://github.com/hashicorp/terraform/blob/1f9734619f953ecc7252d7b98a6d40d751b4ea1e/internal/states/statemgr/filesystem.go#L124C1-L129C18