Orchestrator rolling updates with job definition
Description
- This allows us to update the orchestrator job definition without causing downtime
- It also improves observability: we can see how many orchestrators are running the current version
How it works
We generate a new unique ID if the job definition, the secret, or the orchestrator binary changes
We use this ID to create a new job with the new job definition
We save the ID to Nomad as a variable, and there's a prestart check that compares the job's ID with the latest ID (the one saved in Nomad). If they don't match, the orchestrator is not started
This means there will be multiple orchestrator jobs, but new orchestrators will start only for the latest job
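A minimal sketch of the ID and prestart logic described above, for illustration only. It assumes the ID is derived as a hash of the three inputs (the PR may generate it differently), and the names `versionID`, `jobName`, and `prestartCheck` as well as the `orchestrator-` job name prefix are hypothetical. Reading the latest ID from the Nomad variable is left out; it is passed in as a plain string.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// versionID derives a deterministic ID from the job definition, the secret,
// and a checksum of the orchestrator binary. Hashing is an assumption made
// for this sketch: the ID changes whenever any of the three inputs changes.
// Each part is length-prefixed so that moving bytes between parts still
// produces a different ID.
func versionID(jobDefinition, secret, binaryChecksum string) string {
	h := sha256.New()
	for _, part := range []string{jobDefinition, secret, binaryChecksum} {
		fmt.Fprintf(h, "%d:%s;", len(part), part)
	}
	// Keep a short prefix of the hex digest so it fits in a job name.
	return hex.EncodeToString(h.Sum(nil))[:12]
}

// jobName builds a per-version job name (hypothetical naming scheme).
func jobName(id string) string {
	return "orchestrator-" + id
}

// prestartCheck refuses to start the orchestrator unless the ID baked into
// this job matches the latest ID stored in Nomad.
func prestartCheck(jobID, latestID string) error {
	if jobID != latestID {
		return fmt.Errorf("job ID %q is not the latest ID %q, not starting orchestrator", jobID, latestID)
	}
	return nil
}
```

With this shape, an old job placed on a new node fails its prestart check and never starts an orchestrator, while orchestrators that are already running are left untouched.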
I think the migration could be the following:
- Adjust the priority of the new job so it is evaluated before the old orchestrator job. This will block the old orchestrator job from being deployed to new nodes
- Deploy the new job once and roll all orchestrators
- Remove the old orchestrator job
The only question left is how to delete the old jobs that are unused.
@jakubno Couldn't the priority on the new job make it so that the new job is evaluated before the currently running orchestrator? I'm wondering whether we even need the wait at all.
> The only question left is how to delete the old jobs that are unused.
Also we need to solve this before merging.
> The only question left is how to delete the old jobs that are unused.
> Also we need to solve this before merging
There won't be that many of them; deploying a new orchestrator version is now a rather slow process
Wait until #647 is deployed