tfmigrate

"compute a new state" takes a very long time

Open jbg opened this issue 2 years ago • 1 comments

Planning a state migration with ~150 operations (about 50% import and 50% rm) takes a very long time (~2 hours in a 4 vCPU / 16 GiB container). A bash script running the same operations takes about 25 minutes, and that's a very naive script which locks and unlocks remote state for every operation! I would expect tfmigrate to be much faster since it works on a local copy of the state.

I set TF_CLI_ARGS_plan='-refresh=false -parallelism=32' but it didn't seem to help.

Here's the log of a run which ended up failing after about 2 hours due to the provider documentation about the import ID being wrong. 🤦🏻

exit status 1: running "cd $(git rev-parse --show-toplevel) && TF_CLI_ARGS_plan='-refresh=false -parallelism=32' tfmigrate plan && touch $PLANFILE" in "/var/lib/atlantis/repos/REDACTED/default/tfmigrate": 
2022/03/01 01:31:19 [INFO] Attempting to use session-derived credentials
2022/03/01 01:31:19 [INFO] Successfully derived credentials from session
2022/03/01 01:31:19 [INFO] AWS Auth provider used: "CredentialsEndpointProvider"
2022/03/01 01:31:20 [INFO] [runner] unapplied migration files: [redacted.hcl]
2022/03/01 01:31:20 [INFO] [runner] load migration file: tfmigrate/redacted.hcl
2022/03/01 01:31:20 [INFO] [migrator] start state migrator plan
2022/03/01 01:31:20 [INFO] [migrator@.] terraform version: 1.1.6
2022/03/01 01:31:20 [INFO] [migrator@.] initialize work dir
2022/03/01 01:32:50 [INFO] [migrator@.] get the current remote state
2022/03/01 01:33:12 [INFO] [migrator@.] override backend to local
2022/03/01 01:33:12 [INFO] [executor@.] create an override file
2022/03/01 01:33:12 [INFO] [migrator@.] creating local workspace folder in: terraform.tfstate.d/default
2022/03/01 01:33:12 [INFO] [executor@.] switch backend to local
2022/03/01 01:33:18 [INFO] [migrator@.] compute a new state
2022/03/01 03:31:53 [INFO] [migrator@.] check diffs
2022/03/01 03:35:13 [INFO] [executor@.] remove the override file
2022/03/01 03:35:13 [INFO] [executor@.] remove the workspace state folder
2022/03/01 03:35:13 [INFO] [executor@.] switch back to remote

Timestamps of the slow part emphasised:

2022/03/01 01:33:18 [INFO] [migrator@.] compute a new state
2022/03/01 03:31:53 [INFO] [migrator@.] check diffs
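
For anyone who wants to put numbers on each step when sharing a similar log, here is a minimal sketch that computes the time elapsed between consecutive log lines. It assumes the `YYYY/MM/DD HH:MM:SS [LEVEL] message` layout seen in the log above; the function name `step_durations` and the `SAMPLE` excerpt are only illustrative and not part of tfmigrate.

```python
import re
from datetime import datetime

# Assumed line layout, taken from the log above:
#   2022/03/01 01:33:18 [INFO] [migrator@.] compute a new state
LINE_RE = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[\w+\] (.*)$")


def step_durations(log_text):
    """Return (message, seconds until the next log line) for each timestamped line."""
    entries = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:  # skip lines without a leading timestamp
            ts = datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S")
            entries.append((ts, match.group(2)))
    # Pair every entry with its successor; the last line has no duration.
    return [
        (msg, int((nxt - ts).total_seconds()))
        for (ts, msg), (nxt, _) in zip(entries, entries[1:])
    ]


# Excerpt copied from the log above.
SAMPLE = """\
2022/03/01 01:33:18 [INFO] [migrator@.] compute a new state
2022/03/01 03:31:53 [INFO] [migrator@.] check diffs
2022/03/01 03:35:13 [INFO] [executor@.] remove the override file
"""

for message, seconds in step_durations(SAMPLE):
    print(f"{seconds:>6}s  {message}")
```

On the excerpt above this gives 7115 s (about 1 h 58 min) for "compute a new state" and 200 s for "check diffs".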

I am currently running a plan with TFMIGRATE_LOG=DEBUG and will update the ticket when it completes in a few hours.

jbg, Mar 01 '22 04:03

Hi @jbg, thank you for reporting this.

I wasn't aware of a performance issue because most of my typical use cases have fewer than 10 operations. At the same time, I'm aware of the breaking changes in AWS provider v4, and I expect to need hundreds of imports myself in the near future. If I run into the performance issue as well, I'll investigate further.

For those who are already facing performance issues, sharing your benchmark will help with debugging.

minamijoyo, Mar 10 '22 01:03