terragrunt
[WIP] Terragrunt as a preprocessor
Description
This is a WIP PR from a hackday project that implements the idea in https://github.com/gruntwork-io/terragrunt/issues/759#issuecomment-585124357 to turn Terragrunt into a preprocessor for Terraform (similar to how Sass and Less are preprocessors for CSS).
It is NOT yet ready for review and merge.
Video overview
https://user-images.githubusercontent.com/711908/210280455-6ed5ae36-62f9-49df-b5ca-a5c46a81a131.mp4
Principles
Key idea: this is Terraform the way it should work. You get to write code in a way that works well from a developer perspective (simple, DRY) and after preprocessing that code, you get to deploy it in a way that works well from an operational perspective (secure, isolated, reviewable).
Input: pure, normal, native Terraform code
- You write code in normal `.tf` files
- You create the code the "naive" way: one giant root module for all your infrastructure
- The root module uses sub-modules for each part of the infra: e.g., one sub-module for the VPC, one sub-module for the DB, one sub-module for each web service, etc.
- Because it's all normal TF code, you can use Terraform's native mechanisms to make everything DRY, manage the `backend` config in one place, handle dependencies between sub-modules, and so on.
Here's an example of what the code could look like: https://github.com/gruntwork-io/terragrunt/tree/enhancement/hackday-terragrunt-preprocessor/test/fixture-preprocessor/before
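As a rough illustration of that shape (a sketch, not copied from the linked fixture), a single root module might look something like the following. The module paths, variable names, and output names are all hypothetical, and the `local` backend is used only to keep the sketch self-contained.

```hcl
# main.tf: one root module for all infrastructure (hypothetical layout)

terraform {
  # Backend configured once, in one place. "local" is an assumption made
  # for this sketch; a real setup would more likely use a remote backend.
  backend "local" {
    path = "terraform.tfstate"
  }
}

variable "vpc_cidr" {
  type        = string
  description = "CIDR block for the VPC (hypothetical input)"
}

# One sub-module per part of the infrastructure.
module "vpc" {
  source     = "./modules/vpc"
  cidr_block = var.vpc_cidr
}

module "db" {
  source = "./modules/db"

  # Plain Terraform references express the dependency on the VPC.
  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnet_ids
}

module "web_service" {
  source = "./modules/web-service"

  vpc_id      = module.vpc.vpc_id
  db_endpoint = module.db.endpoint
}
```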
By itself, this code is great from a developer perspective, but it's terrible from an operational perspective: see below for all the problems you'll run into.
Command: terragrunt process
There's really only one command to run: terragrunt process.
Output: pure, normal, native Terraform code
After you run terragrunt process, you get:
- Normal `.tf` files again
- But now they are broken up across multiple environments: one top-level folder per environment
- And within each environment, they are broken up further by type of infra: a separate root module for each sub-module (e.g., VPC, EKS, web-service).
- The `backend` is configured properly for each module
- Dependencies across modules are automatically configured using `terraform_remote_state`
Here's an example of what the generated code could look like: https://github.com/gruntwork-io/terragrunt/tree/enhancement/hackday-terragrunt-preprocessor/test/fixture-preprocessor/after
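To make the `terraform_remote_state` wiring concrete, here is a minimal sketch of what one generated root module could plausibly look like. It is not the actual output of the preprocessor: the `dev/db` folder, the relative paths, the output names, and the `local` backend are assumptions chosen for illustration.

```hcl
# dev/db/main.tf: hypothetical generated root module for the DB in "dev"

terraform {
  # Each generated root module gets its own backend config and state.
  # The "local" backend and this path are assumptions for the sketch.
  backend "local" {
    path = "terraform.tfstate"
  }
}

# The direct module.vpc.* references from the original code are replaced
# by a lookup against the state of the separately deployed VPC module.
data "terraform_remote_state" "vpc" {
  backend = "local"

  config = {
    path = "../vpc/terraform.tfstate"
  }
}

module "db" {
  source = "../../modules/db"

  vpc_id     = data.terraform_remote_state.vpc.outputs.vpc_id
  subnet_ids = data.terraform_remote_state.vpc.outputs.private_subnet_ids
}
```

Note that `terraform_remote_state` only exposes root-level outputs, so in a layout like this the VPC root module would also need `output` blocks for `vpc_id` and `private_subnet_ids`.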
This generated code is optimized to work well from an operational perspective.
Deploy using TF
- Now you can go into each of the generated sub-folders and use Terraform as usual to deploy: e.g., `terraform plan`, `terraform apply`.
- Nothing new to learn! You write pure Terraform code, just as you'd expect. After preprocessing, you interact with it using standard Terraform commands, just as you'd expect. No weird Terragrunt concepts to grapple with: no `terragrunt.hcl`, no `_envcommon`, etc.
- If you check the generated code into Git, it works natively with TFC and TFE too!
- No issues with debugging Terragrunt problems, as you can see exactly what the output is!
- No lock in: it's pure TF code, so if you don't like Terragrunt, you can stop using it any time.
The operational problems this fixes
Although "one giant root module with all your infra" is wonderful from a developer perspective, as it's easy to learn and keeps your code DRY, it has a bunch of drawbacks from an operational perspective:
- Security: with everything in one module, to deploy anything, you need access to everything (everyone has to be an admin to run `plan` or `apply`).
- Speed: with a giant root module, `plan` and `apply` take forever (for a large infra, tens of minutes!).
- Code review: for a giant root module, the `plan` output is way too big to meaningfully read, so you blindly `apply` changes, rarely catching mistakes that slip through.
- Automated testing: there's no meaningful way to do automated testing for a giant infra.
- Risk: all your eggs are in one basket. A single typo or mistake anywhere could break everything.
- Isolation: all your envs end up on the same version of every sub-module. There's no way to do immutable infrastructure practices and use different versions in different environments.
By using Terragrunt to "pre-process" your code, you get all the developer benefits when writing and maintaining the code, as it's simple and DRY. And because the generated code breaks everything up into separate environments and modules, all the operational problems above (security, speed, code review, automated testing, risk, and isolation) are mitigated!
TODOs before a full review & merging
- [ ] Add an example of how to support versioned modules (different versions in the `source` URL in different environments) using override files.
- [ ] Figure out how to migrate existing Terragrunt users to this new pattern.
- [ ] Figure out how to support providers not from HashiCorp (i.e., remove hard-coded `registry.terraform.io/hashicorp` URL).
- [ ] Add additional automated tests:
  - Multiple files with output variables.
  - Resource and data source handling.
  - Remote backend (e.g., S3) handling.
This is incredible work. I love the approach. Much of my terraform code is a giant module in a single terraform state... It's so much easier to write. We've been in the process of migrating to terragrunt to get the benefits mentioned here, but it's been slow going. It's so nice to write terraform in this style, but as you said, it's a big headache operationally.
One question: How would this handle nested modules? One of the reasons why you might want to have nested modules is that there may be cross-communication between different regions. Say you're deploying to multiple regions and you want to have roughly the same configuration in each region, but then you need to set up some communication between those regions (eg. a multi-region consul deployment). You could imagine a "region" module that includes a bunch of sub-modules, where each of the sub-modules gets a terraform state. Does this pre-processor handle that or is it limited to one level of modules? If it's limited to one level of modules, how would one implement multiple regions per env with some cross relationship between those regions?
I guess you could have the tfvars split out into <env>_<region>, e.g., dev_us-east-1, prod_us-west-1, etc.? Or maybe you could imagine a folder hierarchy inside the tfvars folder, like dev/us-east-1/vpc, dev/us-west-2/vpc, prod/us-west-2/vpc, etc., that maps down to replication of the modules or something. Just throwing ideas out there.
This is incredible work. I firmly believe that this is the way terraform should work (or at least allow us to work). In fact, I've been working on something similar, but from a totally external angle. I LOVE the way this is implemented and how it rolls into terragrunt.
Specifically, I'm glad this is through terragrunt because terragrunt provides a number of conveniences and augmentations that terraform does not. It's become an indispensable tool for my work. I can't imagine building anything sufficiently complex without terragrunt.
I do, however, worry/wonder about the following:
Ever been working on some CSS in LESS or SASS, and everything looks good --> and then you process and generate the stylesheets for your bundle...and you're thinking, "why the heck does my generated CSS not work/look right?"
It's painful but workable to debug that and get to the bottom of the issue when you're able to run your code in a wannabe REPL (hot reloading vite/webpack server). In this case, I wonder what kind of pain or difficulty might emerge if/when the parsing/rendering produces an unexpected result, and you're trying to figure out what the root cause of that might be.
Would there be some way to either step through or debug the process command? If not, I feel like this would be a really valuable consideration. As it is, the development feedback loop for terraform is painful and slow (change code --> plan --> wait --> check --> repeat), especially with remote terragrunt state. Anything that might introduce further delay or complexity into that development loop would hurt, badly.
I hope that made sense...
Looks really great.
For next steps (possibly part of the paid-for product):
terragrunt process -migrate, which would output a terragrunt folder structure (dependencies nested and using dependency blocks) and an appropriate mix of aws s3 cp and terraform state rm or terraform import commands.
As I understand, there is still something pending for this to work?