fix: parallel plan and apply also in a single workspace

Open finnag opened this issue 1 year ago • 6 comments

what

  • Add more thorough locking around Clone() calls, covering all of these phases:
    • Am I on the right commit
    • Merge with upstream
    • Clone if necessary
  • Reduce the number of remote git operations when planning or applying in parallel
  • Clean up the Clone() method, split into Clone() and MergeAgain()
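
To illustrate the locking idea in the list above, here is a minimal sketch, not the code from this PR. It assumes one mutex per working-directory key, held across both the "am I on the right commit" check and the clone. The names workDirLocks, fakeRepo, ensureCheckout and runParallel are hypothetical, the key and commit values are made up, and the merge-with-upstream step is left out for brevity.

```go
package main

import (
	"fmt"
	"sync"
)

// workDirLocks hands out one mutex per working-directory key, so jobs for
// unrelated pull requests do not block each other.
// Hypothetical type for illustration, not taken from the Atlantis code base.
type workDirLocks struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func newWorkDirLocks() *workDirLocks {
	return &workDirLocks{locks: make(map[string]*sync.Mutex)}
}

// lock blocks until the caller owns the mutex for key and returns the
// function that releases it.
func (w *workDirLocks) lock(key string) func() {
	w.mu.Lock()
	l, ok := w.locks[key]
	if !ok {
		l = &sync.Mutex{}
		w.locks[key] = l
	}
	w.mu.Unlock()

	l.Lock()
	return l.Unlock
}

// fakeRepo stands in for a checkout on disk (an assumption of this sketch).
type fakeRepo struct {
	commit string // commit currently checked out, "" if not cloned yet
	clones int    // number of simulated clone operations
}

// ensureCheckout holds the per-directory lock across the commit check and
// the clone, so no other job can re-clone or delete the directory in between.
func ensureCheckout(locks *workDirLocks, key string, repo *fakeRepo, want string) {
	unlock := locks.lock(key)
	defer unlock()

	if repo.commit == want {
		return // already on the right commit, no remote git operation needed
	}
	repo.commit = want // simulated clone
	repo.clones++
}

// runParallel simulates n parallel plan jobs for the same pull request
// directory and returns how many clones were performed.
func runParallel(n int) int {
	locks := newWorkDirLocks()
	repo := &fakeRepo{}

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			ensureCheckout(locks, "owner/repo/42", repo, "abc123")
		}()
	}
	wg.Wait()
	return repo.clones
}

func main() {
	fmt.Println("clones performed:", runParallel(8)) // prints "clones performed: 1"
}
```

Because the check and the clone happen under the same lock, only the first job clones; the other seven find the directory already on the wanted commit and return without touching it.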

For parallel mode to work, you must either set the environment variable TF_PLUGIN_CACHE_MAY_BREAK_DEPENDENCY_LOCK_FILE to some value, or check in your .hcl dependency lock files. Otherwise Terraform cannot run in parallel.

why

The Clone() call had several race conditions: it could miss clones, or delete the working directory out from under running processes, causing failures.

tests

  • [X] I have tested my changes by running make test-all
  • [X] Run in production with several repos, large and small, including a monorepo multi-directory setup

references

finnag avatar Aug 14 '23 14:08 finnag

This has a lot of core changes, but they do make sense. I'm a workspace user and I need to make sure this works well with multiple workspaces, BUT right now we are at an inflection point with the work on Locks. I think I will have to defer this to @GenPage because it intersects with his work on https://github.com/runatlantis/atlantis/pull/3345 and can potentially conflict with it, so we need to coordinate carefully on this.

@finnag I want to set the right expectations that this might take a while to be reviewed/merged so please stick with us while we go through this process.

@finnag Thanks a lot for this contribution

jamengual avatar Aug 16 '23 16:08 jamengual

@GenPage I believe this is sanitized now, let me know if you want me to split it up or have some other reservations

finnag avatar Sep 29 '23 11:09 finnag

Thanks @finnag, let me take some more time to review this.

GenPage avatar Oct 06 '23 15:10 GenPage

@finnag I agree with @jamengual that we will hold on to this until we can properly fix the existing lock regressions plaguing Atlantis.

GenPage avatar Dec 11 '23 18:12 GenPage

Some time has passed; we've run this in production for more than a year now, happily planning and applying in parallel. Rebased to v0.30.0.

finnag avatar Oct 08 '24 16:10 finnag

@finnag

If we wanted to run this in our K8s cluster, are there any changes you are aware of that would be required? We already set TF_PLUGIN_CACHE_MAY_BREAK_DEPENDENCY_LOCK_FILE to True

jseiser avatar Oct 21 '24 13:10 jseiser