tofu-controller

Make it possible to run branch planner in HA configuration (replicas > 1)

Open yitsushi opened this issue 2 years ago • 4 comments

Right now, by design, branch planner can't be replicated. To make replication possible, we need a polling mechanism that can share work between instances.

=========

User Story:

As a Branch Planner developer, I'd like to make the branch planner scalable, so that it can run as multiple instances and be configured for HA with a replica count.

Acceptance Criteria:

  • [ ] Ensure branch planner can be replicated without affecting existing functionality and without instances fighting over branch planner resources.

yitsushi avatar Jul 18 '23 14:07 yitsushi

I think just being able to run more goroutines would be good enough for now (I'm not clear whether this is the suggestion). Making it scale by sharing work amongst several pods is much more involved. Running more worker goroutines would be simple if branch-planner is ported to run as a controller-runtime controller.

squaremo avatar Nov 02 '23 11:11 squaremo

For HA systems, they may want to run the branch planner controller with at least 2 instances, preferably on different nodes. It's not about scaling to manage more resources, but scaling for availability. Right now, if they set the controller to replica > 1 each controller will fight for each repo and branch: one will create the resources, the rest will error on them, and they all move on to the next PR at more or less the same time to check for changes.

yitsushi avatar Nov 05 '23 16:11 yitsushi

if they set the controller to replica > 1 each controller will fight for each repo and branch

It's a good argument for porting to a controller-runtime Manager, so it can use leader election conveniently. If a pod crashes and another takes over, is there any state lost that would stop the second pod working properly?

squaremo avatar Nov 06 '23 10:11 squaremo

I think you'll need to revisit the acceptance criteria, if the point was HA rather than horizontal scalability.

squaremo avatar Nov 06 '23 10:11 squaremo