
Module cache

Open tmccombs opened this issue 5 months ago • 3 comments

Terraform Version

Terraform v1.7.2
on linux_amd64

Use Cases

Currently, whenever you use a remote module, Terraform has to download it and put it in .terraform/modules.

In cases where you have the same module used in many different workspaces, this results in wasting a lot of time and disk space downloading the same module over and over again.

Attempted Solutions

I don't really know of any solutions besides just re-downloading the file every time.

Another option might be to pre-download the module and change its source to point to the local copy, but that doesn't work very well if you are using Terraform Cloud or Enterprise to run Terraform.

Proposal

Have a cache for modules similar to the provider cache.

References

No response

tmccombs avatar Feb 07 '24 22:02 tmccombs

Thanks for this feature request! If you are viewing this issue and would like to indicate your interest, please use the 👍 reaction on the issue description to upvote this issue. We also welcome additional use case descriptions. Thanks again!

crw avatar Feb 07 '24 23:02 crw

Thanks for suggesting this, @tmccombs!

I think we already have an issue somewhere that overlaps with this request, but I wasn't able to find it in quick searching.

The main thing I think of when I imagine doing this is the classic problem that today there are lots of modules in the world that modify their own source directory while doing their work, such as by using provisioner "local-exec" and redirecting output to disk, or by using the badly-behaved archive_file data source from hashicorp/archive, which performs side effects during the planning phase.

As long as that remains true I don't think we could implement something exactly like the provider plugin cache, because self-modifying modules would corrupt the cache when they run. The closest we could get is to deep-copy the module trees from the cache during terraform init and thus at least avoid retrieving them over the network, similar to what terraform init already does when it can detect that more than one module block in the configuration refers to the same module package.


We are planning to use the relatively clean slate that Terraform Stacks implies to impose a small number of new constraints that enable implementing some long-wanted features, and one of them is that modules are required to treat their source directories as read-only. Modules that self-modify won't work under Stacks until they are updated to use a different strategy.

As long as Terraform Stacks is only accessible through Terraform Cloud for private preview, this doesn't really help anything, because Terraform Cloud needs to retrieve the source code over the network each round anyway. But maybe this is something to keep in mind for the eventual CLI-driven Stacks workflow, so that whatever ends up being the equivalent of terraform init for a stack configuration can optionally install through a shared cache directory.

If we do that, I expect it would really mean adding cache directory support to package sourcebundle, since that's the mechanism responsible for building the dependency bundle for a stack configuration.

apparentlymart avatar Feb 08 '24 01:02 apparentlymart

@apparentlymart The issue you had in mind might be #16268.

> The closest we could get is to deep-copy the module trees from the cache during terraform init and thus at least avoid retrieving them over the network, similar to what terraform init already does when it can detect that more than one module block in the configuration refers to the same module package.

That would be a major improvement for larger codebases. The local copying is suboptimal, but nothing compared to the cost of redownloading the packages from the network every time, especially on CI. It would also enable the use of GitHub Actions caching for modules.

I started to look into this and I think it could be done just at the level of FetchPackage (https://github.com/hashicorp/terraform/blob/main/internal/getmodules/installer.go#L45), i.e. without touching any of the module resolution logic, only caching go-getter results (and only for remote packages). This wouldn't be 100% offline yet, but caching the resolution in a lock file for modules could be done as the next step.

Does that sound like a viable approach?

lqc avatar Feb 17 '24 11:02 lqc