plan submit should not send serialized job version

Open tgross opened this issue 1 year ago • 0 comments

When a scheduler submits a plan to the leader, the Plan.Submit RPC includes the entire Job object from the state store of the server where the scheduler ran. When this was originally written, there were no deployments and no job version tracking, so the scheduler had to assume the job had been changed out from under it.

But this makes safely submitting plans during a cluster upgrade much more complicated. We canonicalize a job when we upsert it in the state store via Raft (ref fsm.go#L607-L614), and when we restore from snapshot (ref fsm.go#L1587), which happens when you upgrade a server. But a scheduler running on an older version of Nomad can submit a plan with a job that has not been canonicalized (so potentially including unexpected nil pointers), and the plan applier has to operate correctly on that job.
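To make the nil-pointer hazard concrete, here is a minimal sketch. The `Job`, `TaskGroup`, and `UpdateStrategy` types and the defaults below are simplified, hypothetical stand-ins, not Nomad's actual structs or its real canonicalization logic; they only illustrate why code that assumes a canonicalized job can break on one that skipped that step.

```go
package main

// UpdateStrategy, TaskGroup, and Job are toy stand-ins (assumptions for
// illustration), not Nomad's real types.
type UpdateStrategy struct {
	MaxParallel int
}

type TaskGroup struct {
	Name   string
	Count  *int            // nil until canonicalized
	Update *UpdateStrategy // nil until canonicalized
}

type Job struct {
	ID         string
	TaskGroups []*TaskGroup
}

// Canonicalize fills in defaults for optional pointer fields so that
// downstream code can dereference them without nil checks. The default
// values here are made up for the example.
func (j *Job) Canonicalize() {
	for _, tg := range j.TaskGroups {
		if tg == nil {
			continue
		}
		if tg.Count == nil {
			one := 1
			tg.Count = &one
		}
		if tg.Update == nil {
			tg.Update = &UpdateStrategy{MaxParallel: 1}
		}
	}
}

// TotalCount assumes the job has been canonicalized. Called on a job that
// was not, it dereferences a nil pointer and panics.
func TotalCount(j *Job) int {
	total := 0
	for _, tg := range j.TaskGroups {
		total += *tg.Count
	}
	return total
}
```

A job read from the leader's own state store has always passed through the canonicalization step, whereas a job serialized by an older scheduler may not have.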

We could instead have the plan include the (namespace, job_id, version) tuple, and then the leader would read the job from its own state store. This would reduce the size of the Plan.Submit request and improve safety across cluster upgrades. The plan applier could reject plans for versions earlier than the latest, and that would prevent applying a plan that was generated from an earlier version of the job. We'd only have to worry about drift of updates to the same version, which are minimal (stopping, scaling, etc.) and for which we already have logic to repair by sending new evals.

The upgrade path for this change would itself be a little hairy: we'd need to ensure we're still sending the job with the plan if the leader isn't of a minimum Nomad version with this change. Another option to consider is gating the change behind a scheduler configuration flag.
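The gating decision could be sketched roughly as follows. This combines both options (a minimum leader version and a configuration flag) in one function purely for illustration; `ShouldSendFullJob`, its parameters, and the hand-rolled version comparison are hypothetical and not how Nomad checks server versions. The minimum version is passed in as a parameter because no actual release is named here.

```go
package main

import (
	"strconv"
	"strings"
)

// parseVersion parses "MAJOR.MINOR.PATCH" into three integers. Any
// pre-release or build suffix (for example "-dev" or "+ent") is ignored,
// which is a simplification compared to full semver ordering.
func parseVersion(v string) ([3]int, bool) {
	var out [3]int
	if i := strings.IndexAny(v, "-+"); i >= 0 {
		v = v[:i]
	}
	parts := strings.Split(v, ".")
	if len(parts) != 3 {
		return out, false
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return out, false
		}
		out[i] = n
	}
	return out, true
}

// atLeast reports whether version v is >= min. Unparseable input is
// treated as "not at least", which keeps the caller on the legacy path.
func atLeast(v, min string) bool {
	a, okA := parseVersion(v)
	b, okB := parseVersion(min)
	if !okA || !okB {
		return false
	}
	for i := 0; i < 3; i++ {
		if a[i] != b[i] {
			return a[i] > b[i]
		}
	}
	return true
}

// ShouldSendFullJob reports whether the scheduler should keep embedding
// the full job in the plan. It falls back to the legacy behavior unless
// the (hypothetical) flag is enabled and the leader is new enough.
func ShouldSendFullJob(leaderVersion, minVersion string, flagEnabled bool) bool {
	if !flagEnabled {
		return true
	}
	return !atLeast(leaderVersion, minVersion)
}
```

Defaulting to the legacy path on any doubt (flag off, unparseable version) means a mixed-version cluster keeps working the way it does today.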

tgross, Aug 07 '24 19:08