
[WIP] Terragrunt as a preprocessor

Status: Open. brikis98 opened this issue 2 years ago; 4 comments.

Description

This is a WIP PR from a hackday project that implements the idea in https://github.com/gruntwork-io/terragrunt/issues/759#issuecomment-585124357 to turn Terragrunt into a preprocessor for Terraform (similar to how Sass and Less are preprocessors for CSS).

It is NOT yet ready for review and merge.

Video overview

https://user-images.githubusercontent.com/711908/210280455-6ed5ae36-62f9-49df-b5ca-a5c46a81a131.mp4

Principles

Key idea: this is Terraform the way it should work. You get to write code in a way that works well from a developer perspective (simple, DRY) and after preprocessing that code, you get to deploy it in a way that works well from an operational perspective (secure, isolated, reviewable).

Input: pure, normal, native Terraform code

  • You write code in normal .tf files
  • You create the code the "naive" way: one giant root module for all your infrastructure
  • The root module uses sub-modules for each part of the infra: e.g., one sub-module for the VPC, one sub-module for the DB, one sub-module for each web service, etc.
  • Because it's all normal TF code, you can use Terraform's native mechanisms to make everything DRY, manage the backend config in one place, handle dependencies between sub-modules, and so on.

Here's an example of what the code could look like: https://github.com/gruntwork-io/terragrunt/tree/enhancement/hackday-terragrunt-preprocessor/test/fixture-preprocessor/before
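As a rough illustration of this "naive" input style, here is a minimal sketch of a single root module. It is not the linked fixture: the module paths (`./modules/vpc`, `./modules/db`, `./modules/web-service`), the S3 bucket name, and the variable and output names are all hypothetical. It only shows the properties listed above: one backend block for everything, and dependencies between sub-modules expressed as ordinary module output references. How environments are declared to the preprocessor is not shown.

```hcl
# Hypothetical input: ONE root module for all infrastructure.
# All names below are illustrative assumptions, not the fixture's contents.

terraform {
  # Backend config lives in exactly one place.
  backend "s3" {
    bucket = "example-terraform-state" # hypothetical bucket
    key    = "all-infra/terraform.tfstate"
    region = "us-east-1"
  }
}

provider "aws" {
  region = "us-east-1"
}

module "vpc" {
  source     = "./modules/vpc"
  cidr_block = "10.0.0.0/16"
}

module "db" {
  source = "./modules/db"
  # Native dependency: a reference to another module's output.
  subnet_ids = module.vpc.private_subnet_ids
}

module "web_service" {
  source     = "./modules/web-service"
  vpc_id     = module.vpc.vpc_id
  db_address = module.db.address
}
```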

By itself, this code is great from a developer perspective, but it's terrible from an operational perspective: see below for all the problems you'll run into.

Command: terragrunt process

There's really only one command to run: terragrunt process.

Output: pure, normal, native Terraform code

After you run terragrunt process, you get:

  • Normal .tf files again
  • But now they are broken up across multiple environments: one top-level folder per environment
  • And within each environment, they are broken up further by type of infra: a separate root module for each sub-module (e.g., VPC, EKS, web-service).
  • The backend is configured properly for each module
  • Dependencies across modules are automatically configured using terraform_remote_state

Here's an example of what the generated code could look like: https://github.com/gruntwork-io/terragrunt/tree/enhancement/hackday-terragrunt-preprocessor/test/fixture-preprocessor/after

This generated code is optimized to work well from an operational perspective.
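To make the output bullets concrete, here is a hedged sketch of what one generated root module might contain, assuming a hypothetical input in which a `web_service` module consumes outputs of `vpc` and `db` modules and everything is stored in an S3 backend. The folder layout, file name, bucket, state keys, and output names are assumptions for illustration; the actual generated code in the linked fixture may differ. The key idea shown is that each module gets its own backend key, and a cross-module reference such as `module.vpc.vpc_id` becomes a `terraform_remote_state` lookup.

```hcl
# Hypothetical generated file: prod/web-service/main.tf
# Assumed layout (illustrative only):
#   prod/vpc/          root module wrapping module "vpc"
#   prod/db/           root module wrapping module "db"
#   prod/web-service/  this root module
#   stage/...          same structure for another environment

terraform {
  # Each generated root module gets its own, isolated state.
  backend "s3" {
    bucket = "example-terraform-state" # hypothetical bucket
    key    = "prod/web-service/terraform.tfstate"
    region = "us-east-1"
  }
}

provider "aws" {
  region = "us-east-1"
}

# Former module.vpc.* references become reads of the VPC module's state.
data "terraform_remote_state" "vpc" {
  backend = "s3"
  config = {
    bucket = "example-terraform-state"
    key    = "prod/vpc/terraform.tfstate"
    region = "us-east-1"
  }
}

# Former module.db.* references become reads of the DB module's state.
data "terraform_remote_state" "db" {
  backend = "s3"
  config = {
    bucket = "example-terraform-state"
    key    = "prod/db/terraform.tfstate"
    region = "us-east-1"
  }
}

module "web_service" {
  source     = "../../modules/web-service"
  vpc_id     = data.terraform_remote_state.vpc.outputs.vpc_id
  db_address = data.terraform_remote_state.db.outputs.address
}
```

Note that `terraform_remote_state` only exposes root-module outputs, so in this sketch the generated `prod/vpc` and `prod/db` root modules would also need to re-export the values being read, e.g. an `output "vpc_id"` block whose value is `module.vpc.vpc_id`.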

Deploy using TF

  • Now you can go into each of the generated sub-folders and use Terraform as usual to deploy: e.g., terraform plan, terraform apply.
  • Nothing new to learn! You write pure Terraform code, just as you'd expect. After preprocessing, you interact with it using standard Terraform commands, just as you'd expect. No weird Terragrunt concepts to grapple with: no terragrunt.hcl, no _envcommon, etc.
  • If you check the generated code into Git, it works natively with Terraform Cloud (TFC) and Terraform Enterprise (TFE) too!
  • Debugging Terragrunt problems is much easier, as you can see exactly what the output is!
  • No lock in: it's pure TF code, so if you don't like Terragrunt, you can stop using it any time.

The operational problems this fixes

Although "one giant root module with all your infra" is wonderful from a developer perspective, as it's easy to learn and keeps your code DRY, it has a bunch of drawbacks from an operational perspective:

  • Security: with everything in one module, to deploy anything, you need access to everything (everyone has to be an admin to run plan or apply).
  • Speed: with a giant root module, plan and apply take forever (for a large infra, tens of minutes!).
  • Code review: for a giant root module, the plan output is way too big to meaningfully read, so you blindly apply changes, rarely catching mistakes that slip through.
  • Automated testing: there's no meaningful way to do automated testing for a giant infra.
  • Risk: all your eggs are in one basket. A single typo or mistake anywhere could break everything.
  • Isolation: all your envs end up on the same version of every sub-module. There's no way to follow immutable infrastructure practices and use different versions in different environments.

By using Terragrunt to "pre-process" your code, you keep all the developer benefits when writing and maintaining the code, since it stays simple and DRY. And because the generated code is broken up into separate environments and modules, all the operational problems above (security, speed, code review, automated testing, risk, and isolation) are mitigated!

TODOs before a full review & merging

  • [ ] Add an example of how to support versioned modules (different versions in the source URL in different environments) using override files.
  • [ ] Figure out how to migrate existing Terragrunt users to this new pattern.
  • [ ] Figure out how to support providers not from HashiCorp (i.e., remove the hard-coded registry.terraform.io/hashicorp URL).
  • [ ] Add additional automated tests:
    • Multiple files with output variables.
    • Resource and data source handling.
    • Remote backend (e.g., S3) handling.

brikis98, Jan 02 '23 20:01

This is incredible work. I love the approach. Much of my terraform code is a giant module in a single terraform state... It's so much easier to write. We've been in the process of migrating to terragrunt to get the benefits mentioned here, but it's been slow going. It's so nice to write terraform in this style, but as you said, it's a big headache operationally.

One question: How would this handle nested modules? One of the reasons why you might want to have nested modules is that there may be cross-communication between different regions. Say you're deploying to multiple regions and you want to have roughly the same configuration in each region, but then you need to set up some communication between those regions (e.g., a multi-region Consul deployment). You could imagine a "region" module that includes a bunch of sub-modules, where each of the sub-modules gets its own Terraform state. Does this preprocessor handle that, or is it limited to one level of modules? If it's limited to one level of modules, how would one implement multiple regions per env with some cross-relationship between those regions?

I guess you could have the tfvars split out into <env>_<region>, e.g., dev_us-east-1, prod_us-west-1, etc.? Or maybe you could imagine a folder hierarchy inside the tfvars folder, like dev/us-east-1/vpc, dev/us-west-2/vpc, prod/us-west-2/vpc, etc., that maps down to replication of the modules or something. Just throwing ideas out there.
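To make the nested-module question concrete, here is a minimal sketch of the input pattern being described, using standard Terraform provider aliases passed into a reusable "region" module. The module paths (`./modules/region`, `./modules/consul-federation`) and the `consul_server_ips` output are hypothetical. Whether the preprocessor would split this at the `region_*` level or recurse into each region module's children is exactly the open question; the sketch does not answer it.

```hcl
# Hypothetical input: the same "region" module deployed twice,
# plus cross-region wiring. All names are illustrative assumptions.

provider "aws" {
  alias  = "use1"
  region = "us-east-1"
}

provider "aws" {
  alias  = "usw2"
  region = "us-west-2"
}

# Each region module internally calls vpc, consul, etc. sub-modules.
module "region_use1" {
  source    = "./modules/region"
  providers = { aws = aws.use1 }
}

module "region_usw2" {
  source    = "./modules/region"
  providers = { aws = aws.usw2 }
}

# Cross-region dependency that spans both nested module trees.
module "consul_federation" {
  source            = "./modules/consul-federation"
  primary_servers   = module.region_use1.consul_server_ips
  secondary_servers = module.region_usw2.consul_server_ips
}
```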

wraithm, Jan 05 '23 08:01

This is incredible work. I firmly believe that this is the way terraform should work (or at least allow us to work). In fact, I've been working on something similar, but from a totally external angle. I LOVE the way this is implemented and how it rolls into terragrunt.

Specifically, I'm glad this is through terragrunt because terragrunt provides a number of conveniences and augmentations that terraform does not. It's become an indispensable tool for my work. I can't imagine building anything even moderately complex without terragrunt.

I do, however, worry/wonder about the following:

Ever been working on some CSS in LESS or SASS, and everything looks good... and then you process and generate the stylesheets for your bundle, and you're thinking, "why the heck does my generated CSS not work/look right?"

It's painful but workable to debug that and get to the bottom of the issue when you're able to run your code in a wannabe REPL (hot reloading vite/webpack server). In this case, I wonder what kind of pain or difficulty might emerge if/when the parsing/rendering produces an unexpected result, and you're trying to figure out what the root cause of that might be.

Would there be some way to either step through or debug the process command? If not, I feel like this would be a really valuable consideration. As it is, the development feedback loop for terraform is painful and slow (change code --> plan --> wait --> check --> repeat), especially with remote terragrunt state. Anything that might introduce further delay or complexity into that development loop would hurt, badly.

I hope that made sense...

armenr, Jan 15 '23 09:01

Looks really great. For next steps (possibly part of the paid-for product): a terragrunt process -migrate mode that outputs a Terragrunt folder structure (with dependencies nested and wired up using dependency blocks), plus an appropriate mix of aws s3 cp and terraform state rm or terraform import commands.
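For illustration, here is a hedged sketch of the kind of `terragrunt.hcl` such a proposed (not existing) `-migrate` mode might emit for one module. The `terraform`, `include`, `dependency`, and `inputs` blocks are standard Terragrunt configuration; the folder layout, module path, and output names (`vpc_id`, `address`) are hypothetical.

```hcl
# Hypothetical prod/web-service/terragrunt.hcl emitted by a proposed
# "terragrunt process -migrate" mode. Paths and output names are assumptions.

terraform {
  source = "../../modules/web-service"
}

# Shared settings (e.g., remote_state) inherited from a parent folder.
include "root" {
  path = find_in_parent_folders()
}

# Cross-module references become Terragrunt dependency blocks.
dependency "vpc" {
  config_path = "../vpc"
}

dependency "db" {
  config_path = "../db"
}

inputs = {
  vpc_id     = dependency.vpc.outputs.vpc_id
  db_address = dependency.db.outputs.address
}
```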

timothyclarke, Aug 17 '23 13:08

As I understand it, is there still something pending for this to work?

tuxillo, Feb 07 '24 16:02