Consider Making Scheduler Active-Passive HA Capable

Open: shin-go opened this issue 1 year ago • 1 comment

The documentation for Convoy's architecture currently notes that the Scheduler should be run as a single instance within a Convoy deployment: "To avoid duplicate jobs, there should only exist one scheduler across your deployment."

To prevent the Scheduler from being a single point of failure for Convoy, some consideration should be given to allowing multiple instances of it to run in an active-passive HA configuration. This is particularly important in a Kubernetes context: the worker nodes that Convoy pods (server, worker, etc.) are scheduled on may come and go as the cluster naturally expands and contracts, which can and will result in the Scheduler being evicted and rescheduled elsewhere. Rather than risk a period of time where no Scheduler component is running, whether as a Kubernetes pod or otherwise, a leader election mechanism should be considered for resilience.

Regardless of how Convoy is deployed, the Scheduler shouldn't be a single point of failure for the system.

shin-go avatar Apr 24 '23 19:04 shin-go

Hey @shin-go, we're considering re-architecting how the scheduler works so it can be scaled horizontally.

jirevwe avatar Apr 24 '23 19:04 jirevwe