terragrunt
terragrunt copied to clipboard
Cache terraform configurations used to fetch dependency outputs
I noticed one of my configurations being very slow to actually start running terraform apply
. After inspecting terragrunt's debug output, I traced it back to a call to terraform init -get=false
of a dependency configuration taking up most of the init time. I noticed on my system activity monitor that this command was pulling down quite a few MiBs from the internet (assumingly terraform backend plugins?) and this was slowing down everything.
I then looked for why terragrunt was doing this and discovered this recent discussion - https://github.com/gruntwork-io/terragrunt/issues/2119 introduced new behavior to optimize fetching the state of dependency blocks by bypassing terraform alltogether and fetching directly from an S3 bucket.
However, I'm using GCS backend and that optimization isn't implemented for GCS yet. The issue also discusses the challenges of that optimization and the robust fallback behavior.
Now, based on this comment:
But this cache only works for the current process, so it is only used when you do a
run-all
.
Originally posted by @yorinasub17 in https://github.com/gruntwork-io/terragrunt/issues/2119#issuecomment-1135291831
I wonder if not a "naive" but more robust implementation could (persistently) cache the terraform configuration that's used to get the terraform output
of the dependency configuration? This way terragrunt would still use terraform output
to fetch the dependency but can avoid expensive terraform init
calls on that dependency configuration.
This optimization could easily shave of 10s and up to 60s of "init" time from some of my more complex terragrunt configurations and yield a huge boost for interactive development.
We are experiencing a similar "lag" when running plans on modules with 4-5 dependency blocks, and we're using AWS s3.
More specifically, planning with mock_ouptuts
and skip_outputs
flag on each dependency block results in a 26s plan, while not using these leads to a 1min40secs plan.
That may not sound a lot but often you're trying to iterate on the main module (not on the dependencies) and this slowness builds up.
To mitigate this I had in mind a similar suggestion to @JohannesRudolph
More specifically, it would be great if there was an optional TG CLI flag for caching dependency outputs so they're available for subsequent plans. In this way plan speeds could be improved significantly instead of constantly re-calculating ouputs that may never change (and even if they did we wouldn't care) while testing changes in the main module.
I have the exact same issue, terragrunt is taking a minute to run and iterating on it is frustrating 😢
@fenos maybe add a 👍 on this issue descr
We are experiencing a similar "lag" when running plans on modules with 4-5 dependency blocks, and we're using AWS s3. More specifically, planning with
mock_ouptuts
andskip_outputs
flag on each dependency block results in a 26s plan, while not using these leads to a 1min40secs plan. That may not sound a lot but often you're trying to iterate on the main module (not on the dependencies) and this slowness builds up.To mitigate this I had in mind a similar suggestion to @JohannesRudolph
More specifically, it would be great if there was an optional TG CLI flag for caching dependency outputs so they're available for subsequent plans. In this way plan speeds could be improved significantly instead of constantly re-calculating ouputs that may never change (and even if they did we wouldn't care) while testing changes in the main module.
@parmouraly have you tried the new CLI flag? https://terragrunt.gruntwork.io/docs/reference/cli-options/#terragrunt-fetch-dependency-output-from-state
Since this one was added, we are using it to speed up all of our commands without any issues. It is currently only for AWS S3 but, if this is your backend, it should work.
And, BTW, as originally discussed in the issue #2119 Terragrunt has cache for dependencies included in the same run-all
, I think that caching the output locally could be risky since the dependency output could change by someone else between runs, especially when knowing that fetching the dependency output from the state directly only takes a few milliseconds.
@JohannesRudolph I think that it might be better to expand this feature to GCS instead.
@Ido-DY Could you please clarify whether you're suggesting you suspect this new terragrunt-fetch-dependency-output-from-state
flag is related to OP's original issue, or if it is a new non-causal feature that you think could resolve the slow Terragrunt commands from a different angle?
@stevenpitts If I got that right, the issue is all about the slowness of Terragrunt mostly when handling dependencies. I actually investigated it myself before opening #2119, and I found out that it actually terraform init
and terraform output -json
that takes 99% of the time, then I compared the output of the Terraform command with the output stored within the state file and confirmed it's identical. At this point I wonder why it takes Terraform more than 10 seconds to generate the exact output that can be fetched from the state directly with a few milliseconds, and I'm still not sure.
Caching the Terragrunt configuration is constructed from a few components:
- The wrapping configuration (such as the backend and credentials preparations)
- The Terraform plugins
- The output of packages
The 1st component is generated super fast and there is no point to my mind to cache it.
The seconed one can be cached using TF_CACHE_DIR
but it'll lead to other issues when having concurrent executions.
And the last one is cached by default only within the memory of the same run. Caching the last layer locally without knowing if there was another execution on the same resource is not safe.
It takes a few seconds for Terragrunt to run on a single component and most of the time is spent on the Terraform commands themselves, however when resolving dependencies throughout Terraform commands this time usually multiplied.
Is this answer your question?
@parmouraly have you find this flag useful for your case?
@stevenpitts If I got that right, the issue is all about the slowness of Terragrunt mostly when handling dependencies. I actually investigated it myself before opening #2119, and I found out that it actually
terraform init
andterraform output -json
that takes 99% of the time, then I compared the output of the Terraform command with the output stored within the state file and confirmed it's identical. At this point I wonder why it takes Terraform more than 10 seconds to generate the exact output that can be fetched from the state directly with a few milliseconds, and I'm still not sure. Caching the Terragrunt configuration is constructed from a few components:
- The wrapping configuration (such as the backend and credentials preparations)
- The Terraform plugins
- The output of packages
The 1st component is generated super fast and there is no point to my mind to cache it. The seconed one can be cached using
TF_CACHE_DIR
but it'll lead to other issues when having concurrent executions.And the last one is cached by default only within the memory of the same run. Caching the last layer locally without knowing if there was another execution on the same resource is not safe.
It takes a few seconds for Terragrunt to run on a single component and most of the time is spent on the Terraform commands themselves, however when resolving dependencies throughout Terraform commands this time usually multiplied.
Is this answer your question?
I think this makes sense, thank you! It's felt like the time it takes to "start up" Terragrunt (number 3 in your post) had increased recently, which was the reason I originally searched for this issue, but that may just be my imagination or something unrelated.
But doesn't it make sense that generating output of all dependent modules would take longer than fetching the output from state, since it involves many provider API calls, rather than a single call to the authoritative state file (local or remote)? I get the sense that this is a different conversation, and I don't want to derail this issue, so I'll leave it as something I don't quite understand yet about the Terragrunt flow :grin:
Thanks for the clarification!
Thanks everyone who joined in on the discussion so far. I'll summarize my understanding of the issue and will try to clarify my original enhancement suggestion.
The key problem is that when terragrunt
encounters a dependency
it needs to resolve the output of the dependency terraform configuration. The correct approach is to perform a terraform init -get=false
(we don't need modules, only providers) followed by terraform output -json
on the dependency terraform configuration.
Since this is slow, various forms of caching have been proposed here. Any approach needs to maintain correctness though. The solutions and their respective limitations are
- Bypassing terraform alltogether and fetching state directly from the state backend via
terragrunt-fetch-dependency-output-from-state
cli flag. Only supported for S3. - Speed up
terraform init
usingTF_CACHE_DIR
. This is a terraform feature, but is not safe with concurrent execution and can cause issues with lockfiles - Cache the initialized dependency terraform configuration inside the
.terragrunt-cache
dir, speeding upterraform init
because the configuration is already initialized (except on the first run). This could use the same terragrunt behavior already present for "auto init". - Cache the output of
terraform output -json
. This is already done inside a single execution ofterragrunt run-all
and helps speeding up executions with common shared dependency configurations.
@Ido-DY suggested extending 1) to GCS. While that's possible, my next feature request would be to extend that to Azure as well as we have some state backends there. I expect sooner or later there will be further requests to add first-party support in terragrunt for more state backends.
Another variant is extending 4) to persist the cache between different terragrunt run-all configurations and somehow maintain correctness.
My personal favorite would be 3) as this builds on terragrunt behavior that is already there i.e. maintaining the initialized terraform configuration inside a .terragrunt-cache
dir and auto-init. While this will still incur the cost of running terraform output -json
, it will save the terraform init
step which is super expensive without TF_CACHE_DIR
.