terragrunt
terragrunt copied to clipboard
Issues with *-all commands and terraform plugin cache directory
If I set TF_PLUGIN_CACHE_DIR
to any directory and use any terragrunt *-all
commands fail with Could not satisfy plugin requirements
errors. My repro case bellow:
terragrunt.hcl:
remote_state {
backend = "s3"
generate = {
path = "backend.tf"
if_exists = "overwrite_terragrunt"
}
config = {
bucket = "my-terraform-test-state-test"
key = "${path_relative_to_include()}/terraform.tfstate"
region = "us-west-2"
encrypt = true
}
}
a/terragrunt.hcl:
terraform {
source = "../example-tf-module"
}
include {
path = find_in_parent_folders()
}
example-tf-module/main.tf
data "aws_region" "current" {}
output "aws_region" {
value = data.aws_region.current.name
}
then I create directories b
through m
and copy a/terragrunt.hcl
to them. My final directory tree is:
.
├── a
│ └── terragrunt.hcl
├── b
│ └── terragrunt.hcl
├── c
│ └── terragrunt.hcl
├── d
│ └── terragrunt.hcl
├── e
│ └── terragrunt.hcl
├── example-tf-module
│ └── main.tf
├── f
│ └── terragrunt.hcl
├── g
│ └── terragrunt.hcl
├── h
│ └── terragrunt.hcl
├── i
│ └── terragrunt.hcl
├── j
│ └── terragrunt.hcl
├── k
│ └── terragrunt.hcl
├── l
│ └── terragrunt.hcl
├── m
│ └── terragrunt.hcl
└── terragrunt.hcl
If cd to any of the one letter directories terragrunt validate
works fine. From the root dir, with both TF_LOG
and TG_LOG
set to debug, terragrunt validate-all
will fail some of the modules. TF/TG log and TF/TG putput from failed modules:
[terragrunt] [/Users/pietro/tmp/i] 2020/06/05 15:50:40 Running command: terraform validate
2020/06/05 15:50:40 [INFO] Terraform version: 0.12.26
2020/06/05 15:50:40 [INFO] Go runtime version: go1.13.11
2020/06/05 15:50:40 [INFO] CLI args: []string{"/usr/local/bin/terraform", "validate"}
2020/06/05 15:50:40 [DEBUG] Attempting to open CLI config file: /Users/pietro_monteiro/.terraformrc
2020/06/05 15:50:40 [DEBUG] File doesn't exist, but doesn't need to. Ignoring.
2020/06/05 15:50:40 [INFO] CLI command args: []string{"validate"}
2020/06/05 15:50:40 [DEBUG] checking for provider in "."
2020/06/05 15:50:40 [DEBUG] checking for provider in "/usr/local/bin"
2020/06/05 15:50:40 [DEBUG] checking for provider in ".terraform/plugins/darwin_amd64"
2020/06/05 15:50:40 [DEBUG] found provider "terraform-provider-aws_v2.65.0_x4"
2020/06/05 15:50:40 [DEBUG] found valid plugin: "aws", "2.65.0", "/Users/pietro/tmp/i/.terragrunt-cache/6gc2TEE1zxjCqaeQgfJsqVbaD4E/DfnLP98YzbykP1vtM_BhaFk10FU/.terraform/plugins/darwin_amd64/terraform-provider-aws_v2.65.0_x4"
2020/06/05 15:50:40 [DEBUG] checking for provisioner in "."
2020/06/05 15:50:40 [DEBUG] checking for provisioner in "/usr/local/bin"
2020/06/05 15:50:40 [DEBUG] checking for provisioner in ".terraform/plugins/darwin_amd64"
2020/06/05 15:50:40 [TRACE] terraform.NewContext: starting
2020/06/05 15:50:40 [TRACE] terraform.NewContext: resolving provider version selections
Error: Could not satisfy plugin requirements
Plugin reinitialization required. Please run "terraform init".
Plugins are external binaries that Terraform uses to access and manipulate
resources. The configuration provided requires plugins which can't be located,
don't satisfy the version constraints, or are otherwise incompatible.
Terraform automatically discovers provider requirements from your
configuration, including providers used in child modules. To see the
requirements and constraints from each module, run "terraform providers".
Error: provider.aws: new or changed plugin executable
[terragrunt] [/Users/pietro/tmp/h] 2020/06/05 15:50:41 Module /Users/pietro/tmp/h has finished with an error: Hit multiple errors:
exit status 1
Error: Could not satisfy plugin requirements
Plugin reinitialization required. Please run "terraform init".
Plugins are external binaries that Terraform uses to access and manipulate
resources. The configuration provided requires plugins which can't be located,
don't satisfy the version constraints, or are otherwise incompatible.
Terraform automatically discovers provider requirements from your
configuration, including providers used in child modules. To see the
requirements and constraints from each module, run "terraform providers".
Error: provider.aws: new or changed plugin executable
[terragrunt] [/Users/pietro/tmp/i] 2020/06/05 15:50:41 Module /Users/pietro/tmp/i has finished with an error: Hit multiple errors:
exit status 1
Using --terragrunt-parallelism 1
fixes this but it makes my real code super slow to validate/plan/apply. My workaround is to emulate terragrunt init-all --terragrunt-parallelism 1
using bash to terragrunt init
each module sequentially.
This is because terraform isn't really designed to handle multiple concurrent calls to the binary at once. This leads to issues when all the terraform processes are trying to initialize the plugin cache and download the same versions of the provider (overwrite each other). With that said, this should work as expected once the plugin directory is sufficiently seeded.
Here are two other workarounds for this:
-
Continuously cycle between deleting the terragrunt cache (
find . -name ".terragrunt-cache" | xargs rm -r
) and runningterragrunt validate-all
until the plugin cache is seeded. -
Create a module for the sole purpose of seeding the plugin cache. This module should only have provider blocks with all the versions that you need to use. Then, you can run
terragrunt validate
just in that module to seed the cache.
Solving this is something we've been thinking about, but we don't have any design for a solution right now.
Terragrunt *-all commands run implicit terraform init
if no .terragrunt-cache
directory exists.
@yorinasub17 could additional parameter --terragrunt-init-parallelism
be implemented, so terragrunt would not run terraform init
in parallel avoiding this issue?
Very strange that I had pipeline with TF_PLUGIN_CACHE_DIR
and terragrunt plan-all
command on fresh-spawned VMs, and it worked well for a long time but stopped to work due to this issue a few weeks ago.
If there is a way to implement --terragrunt-init-parallelism
without overcomplicating the pipeline, then that could work. With that said, it could be confusing to have multiple parallelism
flags in that fashion.
Side note: I personally would rather invest in a proper dependency management solution. E.g., would be great if you could run terragrunt dep-retrieve
which would populate the plugin cache, and also some kind of module cache so reusable modules for the same versions are also shared. It would be more expensive to implement/design, but has high value.
As a quick and maybe quite clean workaround i followed another approach: i quickly wrote a bash wrapper script around terragrunt which basically does only the following:
- create a directory for caching of terraform plugins and export it as environment variable TF_PLUGIN_CACHE_DIR
- read a .tf file in which all needed plugins are specified
- run terraform init and clean up .terraform* files afterwards
- finally run terragrunt I think this could be implemented natively into terragrunt or am i wrong and this would solve the drawback of multiple parallel downloads of providers quite neatly by just creating and populating a caching directory beforehand.
Maybe as plan for implementation for the steps:
- To use this, a parameter --terragrunt-caching could be established which would "activate" all of this
- A parameter --terragrunt-cache-dir could let one specify the directory in which the cache will be stored. This cache dir could be purged before the run so one could always start with an empty cache. Also, this would shall be exported to the environment so terraform gets aware of all of this
- A parameter --terragrunt-cache-plugins could get a list of plugins to cache (for example as comma-separated string hashicorp/aws, hashicorp/template, ...). With the terragrunt native generate logic, one could generate a terraform file which only defines terraform { required_providers { ... stuff. Alternatively, the parameter --terragrunt-cache-plugins could be set directly to a terraform file.
- Then just run terraform init in a temporary directory (for example /tmp/terragrunt-init-dir) or maybe even in the directory specified in the --terragrunt-cache-dir directory. Afterwards clean up the additionally generated files .terraform*
- Continue as usual
Does this seem realistic? I think this shouldn't be too hard and is quite clean. If yes, the number of needed downloads could be reduced drastically when having dozens of modules if one knows beforehand which plugins are needed.
@Zyntogz your trick works because you have your terraform code locally.
But if terraform modules are used (source = github.com/...
) then *.tf files will not be populated until terragrunt init
executed
I worked around this by creating a cache-directory per plan. This does mean that each plan will have to download providers at least once, but on subsequent runs the cache can be used to fetch the providers. I configure my CI to cache the entire .terraform-plugin-cache
directory. I added the following to my top-level terragrunt.hcl
:
locals {
terraform_cache_dir = format("%s/%s", get_env("TF_PLUGIN_CACHE_DIR", "~/.terraform-plugin-cache"), path_relative_to_include())
}
terraform {
before_hook "provider_cache" {
commands = ["init", "validate", "plan", "apply"]
execute = ["mkdir", "-pv", local.terraform_cache_dir]
}
extra_arguments "provider_cache" {
commands = ["init", "validate", "plan", "apply"]
arguments = []
env_vars = {
TF_PLUGIN_CACHE_DIR = local.terraform_cache_dir
}
}
}
We have some projects with many terragrunt.hcl
files (e.g. infrastructure-live
repository), and terragrunt *-all
executions have started depleting the available disk space in our GitLab CI shared runners. As we don't maintain Terragrunt cache between jobs, a quick workaround for us has been to cleanup the generated Terragrunt cache as each module is processed:
# In the general terragrunt.hcl configuration file.
terraform {
after_hook "after_delete_terragrunt_cache" {
commands = ["validate", "plan", "apply"]
execute = ["rm", "-rf", ".terragrunt-cache"]
working_dir = "${get_terragrunt_dir()}"
run_on_error = true
}
}
Having a centralized TF_PLUGIN_CACHE_DIR
directory didn't work for us, when using Terragrunt parallelism, as many times concurrent module executions find partially downloaded providers, and fail.
@adamantike How are you managing your "output.tfplan"? After adding this section output is deleted every time and the execution is ending with:
Error: Failed to load "output.tfplan" as a plan file
│
│ Error: stat output.tfplan: no such file or directory
So we've also encountered this issue when using terragrunt run-all
commands together with plugin-dir
.
Here is how we've fixed it :
terraform {
source = "path/to/tf/module"
extra_arguments "terraform_args" {
commands = ["init"]
arguments = [
"-plugin-dir=/path/to/terraform/plugin-cache/"
]
}
}
Hope it'll help you.
@headincl0ud, you can either:
- Run the command specifying
-out ...
to a path that is not within the.terragrunt-cache
folder, so therm
command doesn't delete the generated plans, or - Replace
execute = ["rm", "-rf", ".terragrunt-cache"]
with a command that deletes.terragrunt-cache
content excluding*.tfplan
files (e.g. usingfind
).
@Xat59, take into account that the approach of centralizing the plugin directory is susceptible to the issue explained in this comment. With parallelism set, and a project with many terragrunt.hcl
files, chance for init
executions to fail by reading plugins partially downloaded by other parallel executions increase.
I am using terragrunt together with atlantis and terragrunt-config-generator and hit the same issue.
Plugins are already present in the TF_PLUGIN_CACHE_DIR
but still in most cases more than 50% of plans fail with errors like this:
Error: Required plugins are not installed
The installed provider plugins are not consistent with the packages
selected in the dependency lock file:
- registry.terraform.io/hashicorp/google-beta: the cached package for registry.terraform.io/hashicorp/google-beta 4.67.0 (in .terraform/providers) does not match any of the checksums recorded in the dependency lock file
What confuses me, is the versions it looks up in the cache.
Here it is google-beta 4.67.0
, but in the .terraform.lock.hcl
it is fixed to 4.48.0
:
provider "registry.terraform.io/hashicorp/google-beta" {
version = "4.48.0"
constraints = "4.48.0"
Is this a separate issue I am encountering here?
I fixed something like this in #2542. Are you using Terragrunt >=v0.45.12?
I was on v0.44.5. Upgrading to v0.45.18 fixed the issue. Thank you very much!
@geekofalltrades I'm still seeing this with Terragrunt v0.48.0
and Terraform v1.5.2
.
Seeding the plugin cache does not seem to help either, I'm running into this issue after running a terragrunt run-all plan
. It seems this for some reason starts re-downloading the same provider that is already installed in the plugin cache.
│ Error: Required plugins are not installed
│
│ The installed provider plugins are not consistent with the packages
│ selected in the dependency lock file:
│ - registry.terraform.io/hashicorp/aws: the cached package for registry.terraform.io/hashicorp/aws 5.6.2 (in .terraform/providers) does not match any of the checksums recorded in the dependency lock file
│
│ Terraform uses external plugins to integrate with a variety of different
│ infrastructure services. To download the plugins required for this
│ configuration, run:
│ terraform init
I can not find any way to use the plugin cache without terragrunt breaking completely so I guess I'll just have to commit to keeping tens of GB with copies of the same provider library.
run-all plan
is parallelized, and the cache still doesn't support parallel write. You could try deleting the current cache for and running again with --terragrunt-parallelism 1
(or whatever the flag is). We solve this by having a separate no-op module that requires the union of all the providers we use and just running init
on it to warm the cache.
run-all plan
is parallelized, and the cache still doesn't support parallel write. You could try deleting the current cache for and running again with--terragrunt-parallelism 1
(or whatever the flag is). We solve this by having a separate no-op module that requires the union of all the providers we use and just runninginit
on it to warm the cache.
Hi @geekofalltrades,
Terragrunt itself does not install providers, Terraform is responsible for that, and as stated in their official documentation, they do not guarantee safe operation if init happens in parallel.
Note: The plugin cache directory is not guaranteed to be concurrency safe. The provider installer's behavior in environments with multiple terraform init calls is undefined.
Thus, we cannot influence it in any way.
Resolved in v0.50.11 release.
I am still seeing this issue with 0.50.11
Hi @Fomiller,
The reason may be that Terragrunt does not correctly detect that the cache is used https://github.com/gruntwork-io/terragrunt/blob/eec362e708edf2ba94c76e67caffe0b68110821d/terraform/config.go#L10-L18
Since Terragrunt detects correctly in my test environment, please provide an example to reproduce the issue.
With the following Terraform Version : 1.5.0 Terragrunt Version: 0.50.11 Terragrunt parallelism: 3 TF_PLUGIN_CACHE_DIR = /tmp/.terraform.d/plugin-cache/
When running terragrunt run-all apply --terragrunt-non-interactive
I receive the following error
Error: Failed to install provider
Error while installing hashicorp/aws v5.30.0: open
/tmp/.terraform.d/plugin-cache/registry.terraform.io/hashicorp/aws/5.30.0/linux_amd64/terraform-provider-aws_v5.30.0_x5:
text file busy
my providers declared in my root terragrunt.hcl file look like
terraform {
required_version = ">=1.3.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = ">= 5.0.0"
}
template = {
source = "hashicorp/template"
version = "2.2.0"
}
random = {
source = "hashicorp/random"
version = "~> 2.3.0"
}
null = {
source = "hashicorp/null"
version = "3.2.1"
}
}
}
The overall file directory structure is very similar to the original issues.
Thank you @Fomiller! I will try to reproduce the issue locally and get back to you.
@Fomiller, I'm sorry to be late with the reply.
The only way at the moment is to run two commands:
-
run-all init
runsterraform init
sequentially for all modules, just like with--terragrunt-parallelism 1
- Any other command that can/should be executed in parallel.
We are working on the better solution #2920