Parallel Terraspace Execution
Summary
Support running `terraspace [all] init/plan/up` in parallel within the same workspace.
Motivation
I'd like to be able to, at the very least, execute a parallel `terraspace all plan` on a given CI build. Imagine your terraspace project has multiple layers, separated per region. When a pull request is raised against that project, ideally your CI process should execute a plan on all the regions associated with that project to see what effect your change has on each of those layers. Running a plan on every region gives the engineer the most feedback to validate that their change has the desired effect on the given infrastructure.
I don't think parallel execution is currently possible, as terraspace writes its log files to a flat folder structure rather than a layered one. Given a terraspace project that uses layering, `terraspace build` will create a per-layer directory structure with the resulting terraform root module, such as `.terraspace-cache/<region>/<env>/[modules,stacks]`. However, when you run an `[all] plan` or `[all] up`, the logs are stored in a flattened structure such as `logs/plan/plan.log`.
Guide-level explanation
I don't think there is anything to add here.
Reference-level explanation
- Identify the various layers which need a `plan` run against them
- Trigger a `plan` per layer
- Each plan should write to its own `/log/<layer>/<env>/plan/plan.log`
- Any other non-layered disk access would also need to follow the same pattern, or live under `.terraspace-cache/<layer>`
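The steps above can be sketched as a small shell fan-out. This is a hypothetical illustration of the proposal, not current terraspace behavior: the layer names, the `TS_CMD` override, and the `log/<layer>/<env>/plan/plan.log` layout are all assumptions.

```shell
#!/bin/sh
# Hypothetical sketch of the per-layer fan-out proposed above.
# Layer names and the per-layer log layout are assumptions.
set -e

TS_ENV="${TS_ENV:-dev}"
TS_CMD="${TS_CMD:-terraspace}"   # substitute e.g. `echo` for a dry run
LAYERS="us-east-1 eu-west-1"     # assumed per-region layers

for layer in $LAYERS; do
  mkdir -p "log/$layer/$TS_ENV/plan"
  # Run each layer's plan in the background so layers proceed in
  # parallel, each writing to its own per-layer log file.
  AWS_REGION="$layer" $TS_CMD all plan \
    > "log/$layer/$TS_ENV/plan/plan.log" 2>&1 &
done
wait  # block until every per-layer plan has finished
```

Whether a layer should be selected via `AWS_REGION` or some per-layer flag is an open question; the sketch only shows the fan-out and the per-layer log isolation.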
Drawbacks
- Possible complexity issues?
Unresolved Questions
Not sure.
👍 from me
Dug into this a while back. Though it may seem pretty simple, it's quite complex. When trying to create parallel processes to deploy multiple environments and/or regions at the same time, here are some of the complexities that came up.
When terraspace builds the terraform project, things like `TS_ENV` and `AWS_REGION` are set very early in the boot process. The boot process then loads terraspace plugins for clouds like AWS, Azure, and Google. The plugins memoize values like region, account, etc. Tried editing the plugins to allow these memoized values to be changed instead. Even though it was hacky, ran with it.
Another complication is the `terraspace all` dependency graph. Currently, the graph only has to be aware of one `TS_ENV`. With multiple environments running at the same time, the `TS_ENV` values can switch and interfere with each other. Tried running these in separate additional processes and switching within each process. This requires extra coordination and considerations. For example, the way the build cache is cleared needs to be reworked. It got pretty messy, and the conclusion was that it's not worth the complexity.
Sometimes, folks tend to try to fit everything into one tool and come up with a "god" command. It may be impossible to ever fit the god criteria. As the Linux saying goes, "Use the right tool for the right job". Did an interview with Anton B; he explains it pretty clearly: "We still have makefiles, we still have shell". Here's the video at the specific time: https://youtu.be/J_-XPfFlsbU?t=6420
Some more thoughts here:
- https://community.boltops.com/t/handling-multiple-providers-accounts-roles-regions/625/6
- https://community.boltops.com/t/customized-layering-support/632/13
- https://community.boltops.com/t/pass-gcp-credentials-to-tfc/631/7
So that's the current thought on this. 🧐 It adds too much complexity. The suggestion is to call terraspace multiple times for different envs or regions, and use a wrapper script or a tool like make if you want them to happen together. This also decouples them. Noting this for posterity, but am open to other attempts if a way can be found to keep the complexity down.
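The wrapper-script suggestion could look like the following minimal sketch, assuming a plain POSIX shell; the env names and the `TS_BIN` override are illustrative, not part of terraspace.

```shell
#!/bin/sh
# Minimal wrapper in the "right tool for the job" spirit: invoke
# terraspace once per env, sequentially, so each invocation owns the
# workspace and there is no shared-log contention.
TS_BIN="${TS_BIN:-terraspace}"   # substitute e.g. `echo` for a dry run

plan_all_envs() {
  for env in "$@"; do
    echo "==> terraspace all plan (TS_ENV=$env)"
    TS_ENV="$env" "$TS_BIN" all plan || return 1
  done
}

# Example (env names are illustrative):
# plan_all_envs dev staging prod
```

A Makefile with one target per env would achieve the same decoupling, and make can also run the targets concurrently with `-j`.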
Hi @tongueroo, thanks for the detailed explanation. It does make sense, and it would add waaay too much complexity to the tool. However, I think my ask is different from your interpretation.
> Suggestion is to call terraspace multiple times for different envs or regions, and use a wrapper script or tools like make if you want them to happen together.
This is exactly what I am proposing. If I opened three shells in the same Terraspace project workspace and executed `terraspace all plan` in parallel in each shell, but with different `TS_ENV` values, I would encounter issues with the filesystem, as those shells would compete when writing their output to the `logs/plan/` directory. I'm assuming there would be other filesystem-related issues beyond the `logs` directory, but that's the first I can think of.
The simplistic solution would be to clone the project into three working directories, then, in parallel, execute `terraspace all plan` in each working directory with a different `TS_ENV` value. I was just hoping I could avoid doing that.
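The clone-per-workdir workaround could be scripted roughly as below; `TS_BIN`, the env names, and the `work-<env>` layout are placeholder assumptions, not anything terraspace provides.

```shell
#!/bin/sh
# Sketch of the clone-per-workdir workaround: give each TS_ENV its own
# copy of the project so parallel runs never contend on logs/ or
# .terraspace-cache/.
TS_BIN="${TS_BIN:-terraspace}"   # substitute e.g. `echo` for a dry run

# plan_in_copies SRC_DIR ENV...: copy SRC_DIR once per env and run
# `all plan` in each copy concurrently.
plan_in_copies() {
  src="$1"; shift
  for env in "$@"; do
    rm -rf "work-$env"
    cp -R "$src" "work-$env"
    ( cd "work-$env" && TS_ENV="$env" "$TS_BIN" all plan > plan.log 2>&1 ) &
  done
  wait  # block until every copy's plan has finished
}

# Example (paths and env names are illustrative):
# plan_in_copies ~/my-terraspace-project dev staging prod
```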
I hope that better explains my ask.
@blucas I see. Misunderstood. Thanks for explaining again. Some thoughts:
Would need to identify all the places where multiple processes write to the same location and cause issues. You're probably right that there are other filesystem-related issues. Might just have to brute-force test it to find them. 💪
Then would need to provide a way to customize the paths used, maybe something like the customizable `build.cache_dir` setting: https://terraspace.cloud/docs/config/reference/ Unsure when will get a chance to look at this. Will consider PRs for it. No sweat either way of course.