Parallel Terraspace Execution

Summary

Support running terraspace [all] init/plan/up in parallel within the same workspace.

Motivation

At the very least, I'd like to be able to run terraspace all plan in parallel within a single CI build. Imagine your terraspace project has multiple layers, separated per region. When raising a pull request against that project, your CI process should ideally run a plan for every region associated with the project, to see what effect your change has on each of those layers. Running a plan across all the regions gives the engineer the most feedback for validating that their change has the desired effect on the given infrastructure.

I don't think parallel execution is currently possible, because terraspace writes its log files to a flat folder structure rather than a layered one. Given a terraspace project that uses layering, terraspace build creates a per-layer directory structure for the resulting terraform root module, such as .terraspace-cache/<region>/<env>/[modules,stacks]. However, when you run an [all] plan or [all] up, the logs are stored in a flattened structure such as logs/plan/plan.log.

Guide-level explanation

I don't think there is anything to add here.

Reference-level explanation

Identify the various layers which need to have a plan run against them
Trigger a plan per-layer
Each plan should write to its own /log/<layer>/<env>/plan/plan.log
Any other non-layered disk access would also need to follow the same pattern, or be moved under .terraspace-cache/<layer>
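To make the collision concrete, here is a minimal Python sketch (terraspace itself is written in Ruby; this is not its code). The helper names flat_log_path and layered_log_path are hypothetical, and the root log directory is just a parameter. Under the flat layout three concurrent plans all map to one file; under the proposed per-layer/env layout each gets its own.

```python
from pathlib import Path


def flat_log_path(root: str, command: str) -> Path:
    # Flat layout described above: every run of `command` shares one file.
    return Path(root) / command / f"{command}.log"


def layered_log_path(root: str, layer: str, env: str, command: str) -> Path:
    # Proposed layout (hypothetical helper): <root>/<layer>/<env>/<command>/<command>.log
    return Path(root) / layer / env / command / f"{command}.log"


if __name__ == "__main__":
    runs = [("us-east-1", "dev"), ("us-west-2", "dev"), ("us-west-2", "prod")]
    flat = {flat_log_path("log", "plan") for _layer, _env in runs}
    layered = {layered_log_path("log", layer, env, "plan") for layer, env in runs}
    print(len(flat), len(layered))  # 1 3: flat paths collide, layered paths do not
```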

Drawbacks

Possible complexity issues?

Unresolved Questions

Not sure.

Apr 06 '21 14:04 blucas

👍 from me

Apr 06 '21 14:04 dgonzalez

Dug into this a while back. Though it may seem pretty simple, it's quite complex. When trying to create parallel processes to deploy multiple environments and/or regions at the same time, here are some of the complexities that came up.

When terraspace builds the terraform project, things like TS_ENV and AWS_REGION are set very early in the boot process. The boot process then loads the terraspace plugins for clouds like AWS, Azure and Google, and those plugins memoize values like region, account, etc. Tried editing the plugins so these memoized values could be changed. Even though it was hacky, ran with it.
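A minimal Python illustration of why memoized plugin values get in the way. RegionAwarePlugin is a hypothetical stand-in, not the actual terraspace plugin code: it reads AWS_REGION on first access and caches it, so changing the environment afterwards has no effect until the cache is explicitly invalidated, which is roughly the kind of hack described above.

```python
import os
from functools import cached_property


class RegionAwarePlugin:
    """Hypothetical stand-in for a cloud plugin that memoizes its region."""

    @cached_property
    def region(self) -> str:
        # Read once on first access, then cached on the instance.
        return os.environ.get("AWS_REGION", "us-east-1")


if __name__ == "__main__":
    os.environ["AWS_REGION"] = "us-east-1"
    plugin = RegionAwarePlugin()
    print(plugin.region)  # us-east-1
    os.environ["AWS_REGION"] = "eu-west-1"
    print(plugin.region)  # us-east-1: the memoized value is now stale
    del plugin.region     # manual cache invalidation, the "hacky" fix
    print(plugin.region)  # eu-west-1
```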

Another complication is the terraspace all dependency graph. Currently, the graph only has to be aware of one TS_ENV. With multiple environments running at the same time, TS_ENV can switch underneath them and the runs interfere with each other. Tried running each environment in a separate process and switching TS_ENV within that process. This requires extra coordination; for example, the way the build cache is cleared needs to be reworked. It got pretty messy, and the conclusion was that it's not worth the complexity.

Sometimes folks try to fit everything into one tool to come up with a “god” command, but it may be impossible for any command to ever meet that bar. As the Linux saying goes, “Use the right tool for the right job”. Did an interview with Anton B, and he explains it pretty clearly: “We still have makefiles, we still have shell”. Here’s the video at the specific time: https://youtu.be/J_-XPfFlsbU?t=6420

Some more thoughts here:

https://community.boltops.com/t/handling-multiple-providers-accounts-roles-regions/625/6
https://community.boltops.com/t/customized-layering-support/632/13
https://community.boltops.com/t/pass-gcp-credentials-to-tfc/631/7

So that's the current thought on this. 🧐 It adds too much complexity. Suggestion is to call terraspace multiple times for different envs or regions, and use a wrapper script or tools like make if you want them to happen together. This also decouples them. Noting this for posterity, but open to other attempts if someone can figure out a way to keep the complexity down.

Apr 15 '21 17:04 tongueroo

Hi @tongueroo thanks for the detailed explanation. It does make sense and would add waaay too much complexity to the tool. However, I think my ask is different from your interpretation.

Suggestion is to call terraspace multiple times for different envs or regions, and use a wrapper script or tools like make if you want them to happen together.

This is exactly what I am proposing. If I opened three shells in the same Terraspace project workspace and ran terraspace all plan in parallel in each shell, with a different TS_ENV value per shell, I would run into filesystem issues, as the shells would compete when writing their output to the logs/plan/ directory. I'm assuming there are other filesystem-related issues besides the logs directory, but that's the first one I can think of.

The simplistic solution would be to clone the project into three working directories, then in parallel, execute terraspace all plan in each working directory with a different TS_ENV value. I was just hoping that I could avoid doing that.
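The clone-per-environment workaround could be scripted roughly like this. It is a sketch only, assuming the project lives in the current directory and that .terraspace-cache and log/logs are the per-run state each copy should rebuild for itself; run_in_clones and the script name are hypothetical. Each copy gets its own TS_ENV, so the runs never share a build cache or log directory.

```python
import os
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path

# Assumed per-run state, skipped when copying. Note that shutil.ignore_patterns
# matches these names at every directory depth, not just the top level.
PER_RUN_STATE = shutil.ignore_patterns(".terraspace-cache", "log", "logs")


def run_in_clones(project, envs, command, workdir):
    """Copy `project` once per TS_ENV value under `workdir`, then run `command`
    in every copy concurrently. Returns {env: exit_code}."""
    clones = {}
    # Copy everything first so a failed copy never leaves processes running.
    for env in envs:
        clone = Path(workdir) / env
        shutil.copytree(project, clone, ignore=PER_RUN_STATE, dirs_exist_ok=True)
        clones[env] = clone
    procs = {
        env: subprocess.Popen(command, cwd=clone, env={**os.environ, "TS_ENV": env})
        for env, clone in clones.items()
    }
    # Popen.wait() blocks until that process exits and returns its exit code.
    return {env: proc.wait() for env, proc in procs.items()}


if __name__ == "__main__":
    # Example: python run_clones.py dev staging prod
    envs = sys.argv[1:]
    if envs:
        workdir = tempfile.mkdtemp(prefix="ts-parallel-")
        codes = run_in_clones(".", envs, ["terraspace", "all", "plan"], workdir)
        print(f"copies (with their logs) are under {workdir}: {codes}")
```

Output from the concurrent processes will interleave on the terminal, but each copy keeps its own logs. Copying up front keeps the script simple, at the cost of disk space and copy time for large projects.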

I hope that better explains my ask.

Apr 16 '21 19:04 blucas

@blucas I see. Misunderstood. Thanks for explaining again. Some thoughts:

Would need to identify all the places where multiple processes are writing to the same location and causing issues. You're probably right. There are probably other filesystem related issues. Might just have to brute force test it to find them. 💪
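One way to brute force it: snapshot the workspace before and after each run, collect what each run wrote, and flag the paths written by more than one run. A minimal Python sketch, assuming the runs are executed one after another from the same workspace; fake_run is a hypothetical stand-in for a real terraspace invocation that mimics a flat log file plus a simplified per-env cache directory. Files are compared by mtime plus a SHA-256 hash, since two rewrites in quick succession can land on the same timestamp.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def snapshot(root):
    """Map each file under root to (mtime_ns, sha256 of its contents)."""
    state = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
            state[os.path.relpath(path, root)] = (os.stat(path).st_mtime_ns, digest)
    return state


def written_paths(before, after):
    """Files created or changed between two snapshots."""
    return {p for p, sig in after.items() if before.get(p) != sig}


def shared_writes(runs):
    """Paths written by more than one run: candidates for contention."""
    seen, shared = set(), set()
    for paths in runs:
        shared |= seen & paths
        seen |= paths
    return shared


def fake_run(root, env):
    # Hypothetical stand-in for a real run: one shared log, one per-env cache file.
    for rel in ("log/plan/plan.log", f".terraspace-cache/{env}/main.tf"):
        target = Path(root, rel)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(f"{env}\n")


if __name__ == "__main__":
    project = tempfile.mkdtemp()
    runs = []
    for env in ("dev", "prod"):
        before = snapshot(project)
        fake_run(project, env)
        runs.append(written_paths(before, snapshot(project)))
    print(shared_writes(runs))
```

Swapping fake_run for a real run with a given TS_ENV would yield the list of paths that need to become configurable.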

Then would need to provide a way to customize the paths used, maybe something like the customizable build.cache_dir setting: https://terraspace.cloud/docs/config/reference/ Unsure when I'll be able to take a look at this. Will consider PRs for it. No sweat either way, of course.

May 03 '21 23:05 tongueroo
