"terraform providers mirror" skip downloading packages that are already present in the mirror directory
Terraform Version
v1.4.6
Use Cases
I would like to handle all the providers in a cached path on my machine or on the remote machine, which would allow me to:
- Save space on the disk as I would have all the providers cached in a specific path
- Not break concurrent terraform command executions while a provider is being downloaded
- Keep the cached providers updated every once in a while without needing to worry about it when running the CI
Attempted Solutions
I was not able to achieve this. For now, the only workaround would be to enter each module in my repository and run terraform init for every terraform block found with a different required_providers block.
Proposal
I think that there are two possibilities:
- Add a simple option named -download-providers-only which would simply download the providers, if not already present in the providers directory, without installing them in the .tf files' directory
- Create a new command plugin-download to simply do the above
References
No response
Hi @maonat! Thanks for sharing this use-case.
Have you seen the terraform providers mirror command? It seems like it's related to what you are asking about, though perhaps not exactly the same in the details.
Hi @apparentlymart, I've tried using terraform providers mirror and it's working ALMOST as I need: it is indeed doing just the download, but it does not skip the download if the binary already exists in the directory.
Is this expected, or should I say that it's a possible feature request for this command?
Hi @maonat,
Having it skip re-downloading if it can detect that the mirror packages already match the checksums reported by the registry does seem like a very reasonable feature request. Shall we transform this issue into that? :grinning:
@apparentlymart that would be great!
How do I need to proceed? Close this issue and open a new one or proceed and add an edit: below the original discussion?
PS: Do you think it would be possible to skip the checksums check?
For now I've just changed the summary to reflect what we discussed and we'll let the discussion above document how we got here. I think that'll be sufficient to give us something to use for prioritization and gathering related use-cases.
This command's purpose is to synchronize a mirror with the content it is mirroring and so the checksum part seems important to allow the tool to repair a mirror that has become corrupted somehow, so that there isn't a broken or maliciously modified package present indefinitely.
However, if you can say more about why you'd want to disable checksum verification then of course we could consider that while designing this new behavior.
Thanks!
@apparentlymart I understand the issue on the checksums. Let's skip this for now 👍
Thanks @maonat! Since this is a valid feature request, we will leave it open to gather support and use cases. We appreciate your feedback!
I think I have a use case as well.
At the company I work for, we use Terraform (1.7.0) with Terragrunt (0.54.17), and we share the same short list of providers (mostly related to AWS and Kubernetes) across all our terraform code.
Currently, we have 28 folders that we interact with at the same time using commands like terragrunt run-all plan, and in every single folder we would have to init and download those providers. To avoid spending extra time on terragrunt run-all init, we created a centralized plugin cache and, when required, we pull the plugin data from the cache.
This change would also reduce the time spent downloading the same plugin twice and would let us increase the number of folders that we interact with at the same time.
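For reference, the centralized cache described above is presumably Terraform's documented plugin cache, which can be enabled either via the TF_PLUGIN_CACHE_DIR environment variable or in the CLI configuration file (the path below is a placeholder):

```hcl
# ~/.terraformrc (or terraform.rc on Windows)
plugin_cache_dir = "$HOME/.terraform.d/plugin-cache"
```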
Same problem here, I have a stack with multiple components and each component has pretty much the same providers but the mirror downloads them all every time.
Without a mirror, if multiple terraform init commands run in parallel they will error, as they all try to write to the same file.
The mirror has to run serially, and downloading the same provider over and over is extremely slow.
I, too, have run into problems with concurrent terraform init commands.
This can leave providers half-downloaded, resulting in errors that look like:
❯ tf validate
╷
│ Error: Failed to load plugin schemas
│
│ Error while loading schemas for plugin components: 2 problems:
│
│ - Failed to obtain provider schema: Could not load the schema for provider registry.terraform.io/hashicorp/azurerm: failed to instantiate provider "registry.terraform.io/hashicorp/azurerm" to obtain schema: fork/exec
│ .terraform/providers/registry.terraform.io/hashicorp/azurerm/4.23.0/windows_amd64/terraform-provider-azurerm_v4.23.0_x5.exe: %1 is not a valid Win32 application..
│ - Failed to obtain provider schema: Could not load the schema for provider registry.terraform.io/microsoft/azuredevops: failed to instantiate provider "registry.terraform.io/microsoft/azuredevops" to obtain schema: fork/exec
│ .terraform/providers/registry.terraform.io/microsoft/azuredevops/1.8.0/windows_amd64/terraform-provider-azuredevops_v1.8.0: %1 is not a valid Win32 application...
╵
Using the terraform providers mirror command unfortunately seems to redownload the providers already present, as this issue describes.
Using terraform init -plugin-dir=some/path does succeed at avoiding downloading, but it succeeds too well and refuses to download any missing providers:
❯ tf init -plugin-dir "$Env:TF_PLUGIN_CACHE_DIR"
Initializing the backend...
Initializing provider plugins...
- Reusing previous version of hashicorp/azuread from the dependency lock file
- Reusing previous version of microsoft/azuredevops from the dependency lock file
- Reusing previous version of hashicorp/azurerm from the dependency lock file
╷
│ Error: Failed to query available provider packages
│
│ Could not retrieve the list of available versions for provider hashicorp/azuread: provider registry.terraform.io/hashicorp/azuread was not found in any of the
│ search locations
│
│ - C:/Users/redacted/.terraform.d/plugin-cache
╵
╷
│ Error: Failed to query available provider packages
│
│ Could not retrieve the list of available versions for provider microsoft/azuredevops: provider registry.terraform.io/microsoft/azuredevops was not found in any of
│ the search locations
│
│ - C:/Users/redacted/.terraform.d/plugin-cache
╵
╷
│ Error: Failed to query available provider packages
│
│ Could not retrieve the list of available versions for provider hashicorp/azurerm: provider registry.terraform.io/hashicorp/azurerm was not found in any of the
│ search locations
│
│ - C:/Users/redacted/.terraform.d/plugin-cache
╵
Note that setting TF_PLUGIN_CACHE_DIR is different from specifying -plugin-dir. Running terraform init with the TF_PLUGIN_CACHE_DIR env var set will still redownload the providers, while running terraform init -plugin-dir $Env:TF_PLUGIN_CACHE_DIR will refuse to download any missing providers and will fail.
Perhaps using the Explicit Installation Method Configuration it would be possible to tell terraform to try the local dir and fetch if missing, but I dislike a solution that requires me to edit files outside the scope of the immediate terraform workspace...
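For the record, the CLI configuration alluded to above looks roughly like the sketch below. As I read the docs, Terraform picks exactly one installation method per provider based on the include/exclude patterns, rather than falling back from the mirror to the registry, which is why it doesn't cover the "try local, fetch if missing" case (the path and patterns are placeholders):

```hcl
provider_installation {
  # Providers matching these patterns are only ever taken from the local mirror.
  filesystem_mirror {
    path    = "/usr/share/terraform/providers"
    include = ["registry.terraform.io/hashicorp/*"]
  }
  # All other providers are downloaded directly from their origin registries.
  direct {
    exclude = ["registry.terraform.io/hashicorp/*"]
  }
}
```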
Therefore, I think the workaround I will pursue is something like:
- Have 50+ terraform dirs you want to apply
- Identify the set of unique required_providers
- Identify the set of existing cached providers
- Let to_download = required - existing
- Create a new terraform dir with required_providers=to_download
- Run terraform providers mirror "$Env:TF_PLUGIN_CACHE_DIR" to download just the missing providers
- You can now use terraform init -plugin-dir "$Env:TF_PLUGIN_CACHE_DIR" in each terraform project since the cache has been fully populated
This is a lot of pain that would be avoided if the following were true:
- terraform init to become concurrency safe
- terraform init to avoid downloading providers that already exist
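For what it's worth, the diff-then-mirror workaround above can be sketched in a few lines. This is a naive regex scan over .tf files (not a real HCL parser, and it assumes each required_providers entry lists source before version), and it assumes the registry.terraform.io directory layout of the cache:

```python
import pathlib
import re

def required_providers(tf_text):
    """Naively extract (source, version) pairs from required_providers entries.
    Assumes each entry lists source before version, as in the docs examples."""
    return set(re.findall(
        r'source\s*=\s*"([^"]+)"\s*,?\s*version\s*=\s*"([^"]+)"', tf_text))

def cache_relpath(source, version):
    """Map a provider source address to its path inside a mirror/cache dir.
    Short addresses like "hashicorp/azurerm" default to registry.terraform.io."""
    if "." not in source.split("/")[0]:
        source = "registry.terraform.io/" + source
    return pathlib.Path(source) / version

def missing_providers(tf_dirs, cache_root):
    """Return the provider-version pairs required by tf_dirs but absent
    from the cache directory."""
    required = set()
    for d in tf_dirs:
        for tf_file in pathlib.Path(d).glob("*.tf"):
            required |= required_providers(tf_file.read_text())
    cache = pathlib.Path(cache_root)
    return {(s, v) for s, v in required
            if not (cache / cache_relpath(s, v)).is_dir()}
```

From there, you would write a dummy main.tf containing only the missing pairs and run terraform providers mirror once against it.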
It looks like having the lock file present will cause the downloaded providers to be reused instead of downloaded again. In my scenario, I've been creating new terraform projects programmatically, so I could update my approach to also include the lock file instead of doing the mirror-then-init-with-plugin-dir approach.
In my org, we have to be able to run terraform inside a closed network with no outside access. We have hundreds of different modules, each with unique terraform needs, so we end up caching 30+ different provider-versions.
I recently had to optimize the script we use to generate this provider cache because the issue described in the OP results in a ton of wasted time and bandwidth.
I wanted to share the solution I came up with, because it might help others in the same boat, and it might help inform a better implementation.
- find all directories that contain a main.tf
- in each directory:
  - run terraform init -backend=false so providers would be downloaded and transitive dependencies collected in the lock file
  - copy the resulting .terraform.lock.hcl to a temp directory, renaming it to avoid collisions
- iterate through each captured lock file and parse out the provider-version pairs used, reducing to a unique set
- create dummy terraform projects that refer to each provider-version and run terraform providers mirror on them, so each provider-version is downloaded only once
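The lock-file parsing step above can be sketched with a simple regex over the .terraform.lock.hcl format (a sketch, not a full HCL parser; it assumes version appears before hashes inside each provider block):

```python
import re

def lock_providers(lock_text):
    """Extract (provider_address, version) pairs from .terraform.lock.hcl text."""
    return set(re.findall(
        r'provider\s+"([^"]+)"\s*{[^}]*?version\s*=\s*"([^"]+)"',
        lock_text, re.S))

def unique_provider_versions(lock_texts):
    """Reduce a collection of lock-file contents to one set of pairs,
    so each provider-version is mirrored only once."""
    pairs = set()
    for text in lock_texts:
        pairs |= lock_providers(text)
    return pairs
```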
I feel like it should be possible to optimize things a little (e.g. resolve dependencies w/o doing a full init, avoid re-downloading files that already exist on disk) to make this process unnecessary.