terragrunt
Terragrunt run-all init --upgrade is inconsistent regarding use of the shared-cache.
We have two use cases where we run `terragrunt init --upgrade`:
- we topo-sort our layers and run `terragrunt init --upgrade --terragrunt-working-dir ${LAYER}` (and other commands)
- we run `terragrunt run-all init --upgrade`
For both use cases, we preload the shared cache with the set of providers we've tested and are now deploying.
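The two use cases can be sketched as shell functions. This is a minimal sketch rather than the actual pipeline: the function names and the cache path are hypothetical, and it assumes the shared cache is configured through Terraform's `TF_PLUGIN_CACHE_DIR` environment variable.

```shell
#!/usr/bin/env bash
set -eu

# Assumption: the shared cache is wired up via Terraform's plugin cache setting.
# The default path below is hypothetical.
export TF_PLUGIN_CACHE_DIR="${TF_PLUGIN_CACHE_DIR:-/tmp/terraform-plugin-cache}"

# Use case 1: init one layer at a time, in the (already topo-sorted) order given.
init_layers_serially() {
  local layer
  for layer in "$@"; do
    terragrunt init --upgrade --terragrunt-working-dir "$layer"
  done
}

# Use case 2: let Terragrunt walk all layers itself; inits may overlap in time.
init_all_layers() {
  terragrunt run-all init --upgrade
}
```

With hypothetical layer names, use case 1 would be invoked as `init_layers_serially network database app`, and use case 2 as `init_all_layers`.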
Use case 1 is rock solid: it finds every module in the shared cache and completes successfully (only one layer is ever in flight). Use case 2 is flaky. Snippets from the logs are below.
Initializing the backend...
Initializing modules... <REDACTED>
Initializing the backend...
Successfully configured the backend "s3"! Terraform will automatically
use this backend unless the backend configuration changes.
Successfully configured the backend "s3"! Terraform will automatically
use this backend unless the backend configuration changes.
Initializing provider plugins...
- Reusing previous version of hashicorp/tls from the dependency lock file
- Reusing previous version of hashicorp/local from the dependency lock file
- Reusing previous version of hashicorp/null from the dependency lock file
- Reusing previous version of hashicorp/random from the dependency lock file
- Reusing previous version of hashicorp/template from the dependency lock file
- Reusing previous version of hashicorp/aws from the dependency lock file
- Reusing previous version of hashicorp/dns from the dependency lock file
- Reusing previous version of grafana/grafana from the dependency lock file
- Reusing previous version of hashicorp/external from the dependency lock file
- Installing hashicorp/local v2.1.0...
Initializing provider plugins...
- Reusing previous version of grafana/grafana from the dependency lock file
- Reusing previous version of hashicorp/random from the dependency lock file
- Reusing previous version of hashicorp/null from the dependency lock file
- Reusing previous version of hashicorp/external from the dependency lock file
- Reusing previous version of hashicorp/aws from the dependency lock file
- Reusing previous version of hashicorp/dns from the dependency lock file
- Reusing previous version of hashicorp/template from the dependency lock file
- Reusing previous version of hashicorp/tls from the dependency lock file
- Reusing previous version of hashicorp/local from the dependency lock file
- Installing hashicorp/template v2.2.0...
- Installed hashicorp/local v2.1.0 (unauthenticated)
- Installing hashicorp/aws v3.63.0...
- Installed hashicorp/template v2.2.0 (unauthenticated)
- Installing hashicorp/tls v3.1.0...
- Installed hashicorp/tls v3.1.0 (unauthenticated)
- Installing grafana/grafana v1.13.4...
- Installed grafana/grafana v1.13.4 (unauthenticated)
- Installing hashicorp/random v3.1.0...
- Installed hashicorp/random v3.1.0 (unauthenticated)
- Installing hashicorp/null v3.1.0...
- Installing hashicorp/dns v3.2.1...
- Installed hashicorp/null v3.1.0 (unauthenticated)
- Installing hashicorp/external v2.1.0...
- Installed hashicorp/dns v3.2.1 (unauthenticated)
- Using grafana/grafana v1.13.4 from the shared cache directory
- Installed hashicorp/external v2.1.0 (unauthenticated)
- Installing hashicorp/aws v3.63.0...
- Using hashicorp/external v2.1.0 from the shared cache directory
- Using hashicorp/tls v3.1.0 from the shared cache directory
- Using hashicorp/null v3.1.0 from the shared cache directory
- Using hashicorp/random v3.1.0 from the shared cache directory
- Using hashicorp/template v2.2.0 from the shared cache directory
╷
│ Error: Failed to install provider
│
│ Error while installing hashicorp/aws v3.63.0: the current package for
│ registry.terraform.io/hashicorp/aws 3.63.0 doesn't match any of the
│ checksums previously recorded in the dependency lock file
╵
The AWS provider was downloaded instead of being used from the cache. This is just one example of the failure; which provider(s) are reused from the cache and which are downloaded is random.
Version info: Terragrunt v0.35.3, Terraform v1.0.9
Suffering from a very similar problem.
I believe this is a Terraform bug: there is a race condition between two `terraform init` processes trying to install the same version of the same provider. First it calls `installFromHTTPURL`, which downloads to a temporary file with a random name, but then it calls `installFromLocalArchive`, which unpacks directly into the global plugin cache directory. This is where the race condition occurs.
As per this comment, this is expected behavior, so maybe `terraform init` should not be run in parallel by Terragrunt.
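For illustration only, the usual way to avoid this kind of race is to stage the data in a private temporary file and publish it with a single rename, so that other processes never read a half-written copy. The sketch below shows that generic pattern for a single file; `publish_atomically` is a hypothetical helper, not Terraform's code, and it assumes the staging file and the destination are on the same filesystem.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper (not Terraform's implementation): copy a file into a
# shared directory so concurrent readers never observe a partial copy.
publish_atomically() {
  local src=$1 dest=$2
  local dest_dir tmp
  dest_dir=$(dirname "$dest")
  mkdir -p "$dest_dir"
  # Stage next to the destination so the final mv is a same-filesystem rename.
  tmp=$(mktemp "$dest_dir/.staging.XXXXXX")
  # -p keeps the source's mode (for example the executable bit).
  cp -p "$src" "$tmp"
  # The rename replaces the destination in one step; if two processes publish
  # the same content, the last one wins and readers still see a complete file.
  mv -f "$tmp" "$dest"
}
```

Unpacking straight into the shared directory, by contrast, leaves a window in which another process can checksum a partially written package.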
If a change is made to run init serially, it would be nice to put that behind a cli-arg instead of making it the default. We pre-populate the plugin_cache_dir when we run terragrunt, using a provider mirror, so the plugins are already present and parallel init calls do not step on each other.
This issue should already be solved. Can someone who has encountered this issue check this?
@levkohimins I'm still seeing this. As soon as I put back `--terragrunt-parallelism 1` it's solved. It's killing me, we could have our init run so much faster :(
╷
│ Error: Failed to install provider from shared cache
│
│ Error while importing hashicorp/google v5.9.0 from the shared cache
│ directory: the provider cache at .terraform/providers has a copy of
│ registry.terraform.io/hashicorp/google 5.9.0 that doesn't match any of the
│ checksums recorded in the dependency lock file.
╵
@davidgwps, the only way at the moment is to run two commands:
- `run-all init`, which will be automatically executed sequentially for all modules, just like `--terragrunt-parallelism 1`
- Any other command that can be executed in parallel.
We are working on a better solution in #2920.
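The two-step workaround could be wrapped roughly as follows. This is a sketch under assumptions: `init_then_run` is a hypothetical name, and it passes `--terragrunt-parallelism 1` explicitly for the init step, since that is the form reported as reliable in the previous comment.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical wrapper for the two-command workaround.
init_then_run() {
  # Step 1: init every module one at a time so inits never race on the cache.
  terragrunt run-all init --terragrunt-parallelism 1
  # Step 2: run the requested command with Terragrunt's normal parallelism.
  terragrunt run-all "$@"
}
```

For example, `init_then_run plan` would run the serial init first and then `terragrunt run-all plan`.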
Resolved in v0.56.4 release. Make sure to read Provider Caching.
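A minimal sketch of opting in to the provider cache, assuming the `--terragrunt-provider-cache` flag as described in the Provider Caching docs for this release line (check the docs for your version, as the flag name is an assumption here). `run_all_with_provider_cache` is a hypothetical wrapper name.

```shell
#!/usr/bin/env bash
set -eu

# Assumption: --terragrunt-provider-cache is the opt-in flag for the provider
# cache in this release line; verify against the docs for your version.
run_all_with_provider_cache() {
  terragrunt run-all "$@" --terragrunt-provider-cache
}
```

For example, `run_all_with_provider_cache init --upgrade` would run the parallel init with the provider cache enabled.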