vitess-operator Document rolling update process

We need to put up a doc on how rolling updates work with the operator. In the meantime, here is at least a dump of some discussion from Slack:

in general, all edits should be made only to the top-level vt (VitessCluster). the others (vtc/vtk/vts, i.e. VitessCell/VitessKeyspace/VitessShard) should be treated as read-only since they are actively managed by the top-level vt. this is similar to the pattern used in the core k8s Deployment API: one should not directly edit the underlying ReplicaSets it creates.

to create a new keyspace, you can just edit the vt spec to add it. it should get deployed immediately.

however, edits to an already-deployed keyspace will be intentionally held back until the next rollout.

The rollout can be managed with annotations as follows:

whenever there are changes held back, the pending changes will be listed in an annotation on the object called rollout.planetscale.com/scheduled

so for example, when you edit an already-deployed keyspace in the vt, that annotation will appear on the corresponding vtk

the way to say "ok; go ahead and release those scheduled changes on this object" is to add an annotation called rollout.planetscale.com/released with any string value (even empty string). once this annotation is present, the scheduled changes will be applied
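As a rough sketch of what adding or removing that annotation could look like, here is a small Python helper that builds a JSON merge patch body for an object's metadata. The annotation key comes from the discussion above; the helper names (release_patch, unrelease_patch) are made up for illustration, and how the patch gets sent (kubectl, a client library, etc.) is left open.

```python
import json

# Annotation key taken from the discussion above.
RELEASED = "rollout.planetscale.com/released"


def release_patch(value=""):
    """Hypothetical helper: merge patch that adds the released annotation.

    Any string value is accepted, including the empty string.
    """
    return {"metadata": {"annotations": {RELEASED: value}}}


def unrelease_patch():
    """Hypothetical helper: merge patch that removes the annotation.

    In a JSON merge patch, a null value deletes the key.
    """
    return {"metadata": {"annotations": {RELEASED: None}}}


if __name__ == "__main__":
    print(json.dumps(release_patch()))
    print(json.dumps(unrelease_patch()))
```

A body like this could be passed to something like kubectl patch with --type merge. The removal patch is shown only for completeness; when removal is actually needed is not covered by the discussion above.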

this needs to be done recursively: after releasing changes to the vtk, the vtk will schedule changes on its vts children which then need to be released; then each vts will schedule changes on the tablet Pods that will need to be released. the idea is that the operator handles actually doing the updates, but the process of deciding which things to update in which order and with what concurrency is left to an orthogonal system
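To make the "orthogonal system" part a bit more concrete, here is a toy Python sketch of the selection step such a system might run: walk the vtk, its vts children, and their tablet Pods, and list the objects that have scheduled changes but have not been released yet. Everything here is an in-memory model invented for illustration (the Obj class, next_to_release); it does not talk to a cluster and does not model what the operator does to the annotations once changes are applied.

```python
from dataclasses import dataclass, field

# Annotation keys taken from the discussion above.
SCHEDULED = "rollout.planetscale.com/scheduled"
RELEASED = "rollout.planetscale.com/released"


@dataclass
class Obj:
    """Hypothetical stand-in for a vtk, vts, or tablet Pod."""
    name: str
    annotations: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def next_to_release(root):
    """Return names of objects with scheduled changes that are not released.

    Objects are visited top-down, parents before children, in child order.
    """
    pending = []
    stack = [root]
    while stack:
        obj = stack.pop()
        if SCHEDULED in obj.annotations and RELEASED not in obj.annotations:
            pending.append(obj.name)
        # Reverse so that the first child is popped first.
        stack.extend(reversed(obj.children))
    return pending
```

An external rollout controller could call something like this in a loop, pick a subset of the returned names according to its own ordering and concurrency rules, add the released annotation to those objects, and then wait for the operator to schedule changes one level further down.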

btw, when the rollout.planetscale.com/released annotation is applied to a vttablet Pod with pending changes, the operator will first ensure that tablet is not a master for its shard, doing a PlannedReparentShard automatically if necessary, before recreating the Pod to update its spec

the exception is when there are no other master-eligible replicas provisioned: then there's nothing that can be done except restarting the master. in that case, you must manually delete the master Pod once the rollout.planetscale.com/scheduled annotation appears on the Pod. the operator will not delete a master Pod.
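The Pod-level behavior described in the last two messages boils down to a small decision table. Here is a toy Python rendering of it, not the operator's actual code; the function name and the returned labels are invented for this sketch.

```python
def tablet_update_action(is_master, other_master_eligible_replicas):
    """Hypothetical summary of what happens to a released tablet Pod.

    is_master: whether this tablet is currently the master of its shard.
    other_master_eligible_replicas: count of other master-eligible
        replicas provisioned in the shard.
    """
    if not is_master:
        # Not a master: the operator can recreate the Pod directly.
        return "recreate"
    if other_master_eligible_replicas > 0:
        # Master with somewhere to fail over: PlannedReparentShard first.
        return "reparent-then-recreate"
    # Lone master: the operator will not delete it; a human has to.
    return "manual-delete-required"
```

For example, tablet_update_action(True, 0) covers the single-master shard case where the Pod has to be deleted by hand.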

Feb 26 '20 19:02 enisoc

> this needs to be done recursively: after releasing changes to the vtk, the vtk will schedule changes on its vts children which then need to be released; then each vts will schedule changes on the tablet Pods that will need to be released. the idea is that the operator handles actually doing the updates, but the process of deciding which things to update in which order and with what concurrency is left to an orthogonal system

Can you mention how Deployments and ReplicaSets (and anything else) fall into this?

Do changes ever get queued in rollout.planetscale.com/scheduled for Services and other objects that are separate from deployments/pods? Or are such changes always applied immediately?

Can you document the usage of rollout.planetscale.com/cascade as well, along with the cases where one needs to manually remove rollout.planetscale.com/cascade or rollout.planetscale.com/released?

Can you show examples of adding/removing these annotations?

Is VitessClusterUpdateStrategyType: Immediate equivalent to immediately applying rollout.planetscale.com/released / rollout.planetscale.com/cascade recursively on everything? Or is there any other difference?

Jan 15 '21 03:01 jmoldow

Are there any kinds of alters that don't go through the rolling update process? (e.g. increasing the number of replicas?)

Jan 15 '21 03:01 jmoldow

We really need some official docs on update rollout.

Jan 22 '21 00:01 biostone

This is the issue to track adding rolling updates as a supported strategy in VTop and we should document that - https://github.com/planetscale/vitess-operator/issues/285

Jul 14 '22 06:07 GuptaManan100