terraformize
concurrency issue
Expected/Wanted Behavior
As a user I expect to run a single module in parallel, but with different workspaces.
Actual Behavior
There is a race condition when making multiple curl calls to a single module with different workspaces. If workspaceA (wA) performs a workspace select and a second API call is then triggered for the same module but for workspaceB (wB), the select for wB can overwrite the workspace select previously performed for wA. This causes wA to write into the state file for wB during its apply.
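The interleaving can be illustrated without Terraform at all. Below is a minimal Python sketch that mimics the shared workspace file with a plain temp file. The functions select_workspace, apply and simulate_race are hypothetical stand-ins written for this illustration, not terraformize or Terraform code.

```python
import os
import tempfile


def select_workspace(env_file, workspace):
    # Stand-in for `terraform workspace select`: record the active workspace
    # in a file shared by every request that uses this module directory.
    with open(env_file, "w") as f:
        f.write(workspace)


def apply(env_file):
    # Stand-in for `terraform apply`: it trusts whatever the shared file says.
    with open(env_file) as f:
        return f.read()


def simulate_race():
    with tempfile.TemporaryDirectory() as module_dir:
        env_file = os.path.join(module_dir, "environment")
        select_workspace(env_file, "wA")  # request A selects its workspace
        select_workspace(env_file, "wB")  # request B arrives before A applies
        return apply(env_file)            # request A now applies against wB


if __name__ == "__main__":
    print(simulate_race())
```

Request A asked for wA, yet the workspace it ends up applying against is the one request B selected last.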
Steps to Reproduce the Problem
Run two curl calls in the background
curl -X POST \
http://127.0.0.1/v1/test-module/client1 \
-H 'Content-Type: application/json' \
-H 'cache-control: no-cache' \
-d '{}' | jq & \
curl -X POST \
http://127.0.0.1/v1/test-module/client2 \
-H 'Content-Type: application/json' \
-H 'cache-control: no-cache' \
-d '{}' | jq &
Do this enough times and you will find the state file becomes incorrect.
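For repeated attempts, the same two requests can be fired from a small script. This is only a sketch: the URL and workspace names come from the curl calls above, while post_apply, fire_concurrently and the repeat count of 20 are assumptions made for illustration.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

BASE_URL = "http://127.0.0.1/v1/test-module"  # taken from the curl calls above


def post_apply(workspace):
    # Same request as the curl calls: POST an empty JSON body.
    req = urllib.request.Request(
        f"{BASE_URL}/{workspace}",
        data=json.dumps({}).encode(),
        headers={"Content-Type": "application/json", "cache-control": "no-cache"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def fire_concurrently(workspaces, send=post_apply):
    # Submit all requests at once so they overlap on the server.
    with ThreadPoolExecutor(max_workers=len(workspaces)) as pool:
        return list(pool.map(send, workspaces))


if __name__ == "__main__":
    for attempt in range(20):  # arbitrary repeat count
        print(attempt, fire_concurrently(["client1", "client2"]))
```

The send parameter exists only so the helper can be exercised without a running server.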
Also, when switching workspaces, the file below is updated with the workspace name and is then used by subsequent terraform commands.
$ cat /www/terraform_modules/test-module/.terraform/environment
client1
Specifications
docker terraformize version: v151
Three solutions I have come up with so far:
- Set a concurrency max of 1 for a container (if using cloud run for example)
- We copy the requested module into a temporary directory to work out of for every API call - making sure to clean up after the call is complete.
- There is a TF_WORKSPACE env variable we can set which prevents writing the local environment file on disk. But this can only be set after an init and removes the steps workspace create and workspace select - terraform will auto-create a missing workspace when using this env variable. From my testing right now, this seems to be the best option.
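The third option could look roughly like the sketch below, which passes TF_WORKSPACE through a per-request environment instead of writing the workspace to disk. It assumes terraform init has already been run in the module directory. The helper names workspace_env and apply_in_workspace are hypothetical, and this calls the terraform CLI directly rather than going through whatever wrapper terraformize uses.

```python
import os
import subprocess


def workspace_env(workspace, base_env=None):
    # Copy the environment so concurrent requests never share mutable state.
    env = dict(os.environ if base_env is None else base_env)
    env["TF_WORKSPACE"] = workspace
    return env


def apply_in_workspace(module_dir, workspace, run=subprocess.run):
    # Assumption: `terraform init` was already run in module_dir.
    # No `workspace select` step, so nothing is written to a shared file.
    return run(
        ["terraform", "apply", "-auto-approve", "-input=false"],
        cwd=module_dir,
        env=workspace_env(workspace),
        check=True,
    )
```

Because each subprocess gets its own copy of the environment, two requests for client1 and client2 no longer depend on a shared on-disk workspace selection.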
There is still an issue with concurrency and init on the same module. I have been skirting around it by passing lock=False to the init method, though I am unsure how safe this is atm.
https://www.terraform.io/docs/commands/init.html#copy-a-source-module
So far the best option has been to make a tmp directory and run terraform init -from-module=/www/terraform_modules/<module>. This will copy the module into the tmp directory and give every API call its own scratch pad.
There doesn't seem to be a way to prevent terraform from creating a .terraform/terraform.tfstate in the module directory.
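A minimal sketch of the scratch-directory approach follows, assuming the terraform CLI is on PATH. The modules root is the path mentioned in this thread. The names init_from_module_cmd and run_in_scratch_dir and the steps argument are hypothetical, chosen for illustration.

```python
import os
import subprocess
import tempfile

MODULES_ROOT = "/www/terraform_modules"  # path taken from this thread


def init_from_module_cmd(module):
    # `-from-module` copies the source module into the empty working directory.
    source = os.path.join(MODULES_ROOT, module)
    return ["terraform", "init", f"-from-module={source}"]


def run_in_scratch_dir(module, steps, run=subprocess.run):
    # Each call gets a private directory that is deleted when the block exits,
    # even if one of the commands raises.
    with tempfile.TemporaryDirectory(prefix=f"{module}-") as workdir:
        run(init_from_module_cmd(module), cwd=workdir, check=True)
        for cmd in steps:
            run(cmd, cwd=workdir, check=True)
```

A call such as run_in_scratch_dir("test-module", [["terraform", "apply", "-auto-approve"]]) would then run init and apply inside a directory no other request can see.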
A temp directory won't work, as it would make local state storage impossible to use. However, this does seem like a problem that only happens if you're using remote executors and/or multiple gunicorn workers/threads to parallelize requests; otherwise the 2nd request won't be processed until the first has fully completed, so there won't be a race condition. If that's the case, another option is to scale the service out to handle load and use option 1 of the solutions you proposed until we figure out a way to make it thread safe that also preserves the ability to use local folder state for whoever wants it.
Currently I am using the example docker container and config with 1 worker and 1 thread, and it does not appear to execute sequentially.
resolving local state issue:
- create env var boolean for remote state usage
- if remote_state == true create tmp directory
- if remote_state == true garbage collect (or leave this up to the user to do)
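The steps above can be sketched as follows. REMOTE_STATE is a hypothetical variable name, as are the helpers use_remote_state, pick_workdir and cleanup; none of them exist in terraformize as far as this thread shows.

```python
import os
import shutil
import tempfile

TRUE_VALUES = {"1", "true", "yes"}


def use_remote_state(environ=os.environ):
    # REMOTE_STATE is a hypothetical variable name used for this sketch.
    return environ.get("REMOTE_STATE", "false").strip().lower() in TRUE_VALUES


def pick_workdir(module_dir, environ=os.environ):
    # Remote state: isolate each request in a fresh temp directory.
    # Local state: stay in the module directory so the state file persists.
    if use_remote_state(environ):
        return tempfile.mkdtemp(prefix="terraformize-"), True
    return module_dir, False


def cleanup(workdir, is_temp):
    # Garbage collect only directories this code created.
    if is_temp:
        shutil.rmtree(workdir, ignore_errors=True)
```

Returning the is_temp flag alongside the path keeps the cleanup step from ever deleting the real module directory.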
Issue seemed resolved with the gunicorn sync worker, as I tried to replicate your test and couldn't reproduce the issue with it.