tofu-controller

Dependencies are on original objects, not the branch objects, causing lock clashes

Open darrylweaver opened this issue 2 years ago • 2 comments

User Story

As a Terraform user, I want to be able to complete my operation whenever I make a change in a branch whose objects have dependencies, so that I can have focused visibility into the changes I’m applying in that branch and not be blocked.

Implementation/Steps

  • [ ] Delete the dependency block ("easy fix")
  • [ ] Do the planning independently. We are not sure about the side effects of doing this or how it would affect the current chain of provisioning, so we will observe what happens.
  • [ ] Ask @darrylweaver to see if it resolves the rest of his problem. (The duplication will not go away right now.)
  • [ ] If it doesn’t go away, then we’ll iterate.

Acceptance Criteria

  • [ ] The user no longer experiences locking of their operation (i.e. when two objects plan against the same set of resources and the operation stalls, as in a race condition).
  • [ ] The plan works for the objects with dependencies. The original resources get paused until the planning process is done, then the user can merge and resume the object.
  • [ ] User no longer sees duplicate output in situations where multiple objects per cluster generate multiple TF outputs.

When using several terraform CRs, where:

  1. Target is used to target a particular resource for that terraform run
  2. Dependencies are declared between terraform CRs
  3. A shared state is used to deploy each target once only

The branch planner would create all the terraform CRs with the suffix -pr-XX, but the dependencies in the generated terraform CRs still point to the original resources.

This results in multiple terraform CRs running simultaneously and clashing over the lock. Eventually all the terraform CRs will run to completion, because each one eventually acquires the lock.

The terraform CRs should have the dependencies updated so that they run in the right order for the PR resources instead.
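The requested behaviour can be sketched as follows. This is a minimal illustration, not the controller's actual code: the `TerraformCR` type, the `cloneForPR` function and the `-pr-1` suffix are hypothetical names chosen for the example. A dependency is redirected to the PR copy only when the object it names is also cloned.

```go
package main

import "fmt"

// TerraformCR is a simplified, hypothetical stand-in for a Terraform custom
// resource; only the fields needed for this illustration are modelled.
type TerraformCR struct {
	Name      string
	DependsOn []string
}

// cloneForPR returns PR copies of the given objects. A dependency is
// redirected to the PR copy only when the object it names is cloned too;
// otherwise it keeps pointing at the original object.
func cloneForPR(originals []TerraformCR, prSuffix string) []TerraformCR {
	cloned := make(map[string]bool, len(originals))
	for _, o := range originals {
		cloned[o.Name] = true
	}

	clones := make([]TerraformCR, 0, len(originals))
	for _, o := range originals {
		c := TerraformCR{Name: o.Name + prSuffix}
		for _, dep := range o.DependsOn {
			if cloned[dep] {
				dep += prSuffix
			}
			c.DependsOn = append(c.DependsOn, dep)
		}
		clones = append(clones, c)
	}
	return clones
}

func main() {
	// Hypothetical chain: vpc runs first, then eks, then k8s.
	originals := []TerraformCR{
		{Name: "vpc"},
		{Name: "eks", DependsOn: []string{"vpc"}},
		{Name: "k8s", DependsOn: []string{"eks"}},
	}
	for _, c := range cloneForPR(originals, "-pr-1") {
		fmt.Println(c.Name, "depends on", c.DependsOn)
	}
}
```

With the dependencies rewritten this way, the PR copies would wait on each other rather than on the originals, which is the ordering this issue asks for.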

Note that this would still not produce a correct plan in all cases; it depends on which targets are used and on whether any CR has no targets. Typically, though, the result would contain duplicated changes rather than missing ones.

An example of this would be a single Terraform HCL directory defining a VPC, an EKS cluster and K8s config, with 3 Terraform controller CRs whose targets are module.vpc, module.eks and, for the last one, no targets at all. Dependencies are declared so that module.vpc runs first, then module.eks, then all the rest of the config for K8s.

For the branch planner:

  • When the module.vpc CR runs, it identifies any changes to the VPC.
  • When the module.eks CR runs, it identifies any changes to the EKS cluster.
  • When the CR with no target runs, it identifies all changes to the VPC, the EKS cluster and K8s.

This results in PR comments that show all the changes, but with the VPC changes shown more than once.
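The duplication can be illustrated with a small model. This is only a sketch under simplifying assumptions: the resource addresses and the `reportedChanges` and `countReports` helpers are hypothetical, and a targeted run is modelled as reporting only changes under its target, whereas a real targeted run also includes the target's dependencies.

```go
package main

import (
	"fmt"
	"strings"
)

// reportedChanges returns the changed resource addresses that a run with the
// given target would report in this simplified model. An empty target means
// the whole configuration is planned.
func reportedChanges(changed []string, target string) []string {
	var out []string
	for _, addr := range changed {
		if target == "" || addr == target || strings.HasPrefix(addr, target+".") {
			out = append(out, addr)
		}
	}
	return out
}

// countReports counts how many runs report each changed address.
func countReports(changed, targets []string) map[string]int {
	counts := make(map[string]int)
	for _, t := range targets {
		for _, addr := range reportedChanges(changed, t) {
			counts[addr]++
		}
	}
	return counts
}

func main() {
	// Hypothetical addresses, one changed resource per area.
	changed := []string{
		"module.vpc.aws_vpc.this",
		"module.eks.aws_eks_cluster.this",
		"kubernetes_namespace.apps",
	}
	// The three CRs from the example: two targeted, one with no target.
	targets := []string{"module.vpc", "module.eks", ""}

	for addr, n := range countReports(changed, targets) {
		fmt.Printf("%s appears in %d plan(s)\n", addr, n)
	}
}
```

In this model, any change under a targeted module is reported by both its own CR and the untargeted CR, which matches the repeated VPC changes seen in the PR comments.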

darrylweaver, Oct 02 '23 11:10

@chanwit Some things for us to clarify based on my chat with @madAndroid:

  • Re: deleting the dependency block: Is the dependency removal temporary? And do you reinstate them when the run is done?
  • In some contexts you will need the dependencies available, otherwise you will make the situation worse because the dependency is there for a reason. The dependencies are on the original objects. If you have shared state between resources you will see it.
  • How do we know this won’t cause bigger problems, what’s the impact?
  • Does this mirror TF behavior?

lasomethingsomething, Oct 05 '23 15:10

The technical details behind this scenario are quite complex, so I'm explaining them here again:

Initially, we have a chain of Terraform objects with dependencies:

  (A)
   |
  (B)
   |
  (C)

Then we'd like to create a PR for object (B), and the Branch Planner would do the following:

  1. Suspend (B), so that its state changes to suspended. We'll denote this as (B-s).
   (A)
    |
  (B-s)
    |
   (C)
  2. The branch planner then clones (B-s) to (B'), and we get this setup:
   (A)
    |  \
 (B-s)--(B')
    |  /
   (C)
  3. We then remove (B') from the dependency chain, because it will be planned as an independent object (of course, it is still able to read secrets from the original chain).
   (A)
    |
 (B-s)--(B')
    |
   (C)
  4. Once we are happy with the PR, we merge (B') into (B-s). After being resumed, (B-s) becomes (B) again and GitOps continues.
   (A)
    |
   (B)
    |
   (C)
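The same lifecycle can be sketched as a small in-memory model. This is an illustration only, not the Branch Planner's implementation: the `Object` type, its fields, and the `startPR` and `finishPR` helpers are hypothetical names introduced for this example.

```go
package main

import "fmt"

// Object is a hypothetical, simplified model of a Terraform object.
type Object struct {
	Name      string
	Suspended bool
	DependsOn []string
	Branch    string // branch the object plans from (illustrative field)
}

// startPR suspends the original object and returns a clone that plans from
// the PR branch. The clone carries no dependencies, so it is planned as an
// independent object outside the original chain.
func startPR(orig *Object, suffix, branch string) *Object {
	orig.Suspended = true
	return &Object{Name: orig.Name + suffix, Branch: branch}
}

// finishPR resumes the original object after the PR has been merged, so the
// original chain continues with its dependencies untouched.
func finishPR(orig *Object) {
	orig.Suspended = false
}

func main() {
	// Hypothetical chain (A) -> (B) -> (C).
	a := &Object{Name: "a", Branch: "main"}
	b := &Object{Name: "b", Branch: "main", DependsOn: []string{a.Name}}
	c := &Object{Name: "c", Branch: "main", DependsOn: []string{b.Name}}

	// Steps 1-3: suspend (B) and create the independent clone (B').
	bPrime := startPR(b, "-pr-1", "feature")
	fmt.Printf("%s suspended=%v, clone %s depends on %v\n",
		b.Name, b.Suspended, bPrime.Name, bPrime.DependsOn)

	// Step 4: after the merge, (B-s) is resumed and becomes (B) again.
	finishPR(b)
	fmt.Printf("%s suspended=%v depends on %v; %s still depends on %v\n",
		b.Name, b.Suspended, b.DependsOn, c.Name, c.DependsOn)
}
```

The point of the model is that the original chain is never rewritten: only the suspended flag on (B) changes, while the clone lives outside the chain for the duration of the PR.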

chanwit, Oct 06 '23 06:10