Document rolling update process
We need to put up a doc on how rolling updates work with the operator. In the meantime, here is at least a dump of some discussion from Slack:
in general, all edits should be made only to the top-level vt. the others (vtc/vtk/vts) should be treated as read-only since they are actively managed by the top-level vt. this is similar to the pattern used in the core k8s Deployment API - one should not directly edit the underlying ReplicaSets it creates.
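Here is a minimal sketch of the "only edit the top-level vt" rule. It assumes the managed objects carry an ownerReference with controller: true that points at their parent, the same way a ReplicaSet points at its Deployment. It also takes vt, vtc, vtk and vts to be the short names for VitessCluster, VitessCell, VitessKeyspace and VitessShard. The helpers safe_to_edit and is_controller_managed and the object names are made up for illustration and are not part of the operator.

```python
from typing import Any, Dict


def is_controller_managed(obj: Dict[str, Any]) -> bool:
    """True if some ownerReference marks another object as this one's controller."""
    refs = obj.get("metadata", {}).get("ownerReferences") or []
    return any(ref.get("controller", False) for ref in refs)


def safe_to_edit(obj: Dict[str, Any]) -> bool:
    """Only objects that nothing controls (e.g. the top-level vt) should be edited by hand."""
    return not is_controller_managed(obj)


# Hypothetical objects: a top-level vt and a vtk that it controls.
vt = {"kind": "VitessCluster", "metadata": {"name": "example"}}
vtk = {
    "kind": "VitessKeyspace",
    "metadata": {
        "name": "example-commerce",
        "ownerReferences": [
            {"kind": "VitessCluster", "name": "example", "controller": True},
        ],
    },
}
print(safe_to_edit(vt))   # True
print(safe_to_edit(vtk))  # False
```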
to create a new keyspace, you can just edit the vt spec to add it. it should get deployed immediately.
however, edits to an already-deployed keyspace will be intentionally held back until the next rollout.
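This toy model covers the two cases above: keyspaces that are new in the vt spec get created right away, while changes to keyspaces that already exist are held back. classify_keyspace_edits is hypothetical, and the specs are placeholder dicts. The operator's actual comparison logic is not shown here, and keyspace removal is not modeled.

```python
from typing import Dict, List, Tuple

Spec = Dict[str, object]  # placeholder for a keyspace spec


def classify_keyspace_edits(
    deployed: Dict[str, Spec], desired: Dict[str, Spec]
) -> Tuple[List[str], List[str]]:
    """Split a vt edit into keyspaces created right away and edits held back."""
    create_now: List[str] = []
    held_back: List[str] = []
    for name, spec in desired.items():
        if name not in deployed:
            create_now.append(name)   # new keyspace: deployed immediately
        elif deployed[name] != spec:
            held_back.append(name)    # existing keyspace: waits for the next rollout
    return sorted(create_now), sorted(held_back)


deployed = {"commerce": {"version": 1}}
desired = {"commerce": {"version": 2}, "customer": {"version": 1}}
print(classify_keyspace_edits(deployed, desired))
# (['customer'], ['commerce'])
```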
The rollout can be managed with annotations as follows:
whenever there are changes held back, the pending changes will be listed in an annotation on the object called rollout.planetscale.com/scheduled. so, for example, when you edit an already-deployed keyspace in the vt, that annotation will appear on the corresponding vtk.
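Held-back changes show up in the rollout.planetscale.com/scheduled annotation, so finding objects with pending changes comes down to filtering on that key. This sketch assumes the input is the JSON printed by something like kubectl get vtk -o json. The annotation's value format isn't described here, so the code treats it as an opaque string, and the sample value is a placeholder. The function names are hypothetical.

```python
import json
from typing import Any, Dict, List, Optional

SCHEDULED = "rollout.planetscale.com/scheduled"


def scheduled_changes(obj: Dict[str, Any]) -> Optional[str]:
    """Raw value of the scheduled annotation, or None if nothing is pending."""
    annotations = obj.get("metadata", {}).get("annotations") or {}
    return annotations.get(SCHEDULED)


def names_with_pending_changes(listing_json: str) -> List[str]:
    """Names of objects in a `kubectl get ... -o json` listing with held-back changes."""
    listing = json.loads(listing_json)
    return [
        item["metadata"]["name"]
        for item in listing.get("items", [])
        if scheduled_changes(item) is not None
    ]


sample = json.dumps({
    "kind": "List",
    "items": [
        {"metadata": {"name": "example-commerce",
                      "annotations": {SCHEDULED: "<pending changes>"}}},
        {"metadata": {"name": "example-customer"}},
    ],
})
print(names_with_pending_changes(sample))  # ['example-commerce']
```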
the way to say "ok, go ahead and release those scheduled changes on this object" is to add an annotation called rollout.planetscale.com/released with any string value (even an empty string). once this annotation is present, the scheduled changes will be applied.
this needs to be done recursively: after releasing changes to the vtk, the vtk will schedule changes on its vts children, which then need to be released; then each vts will schedule changes on its tablet Pods, which will also need to be released. the idea is that the operator handles actually doing the updates, but deciding which things to update, in which order, and with what concurrency is left to an orthogonal system.
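This sketch covers the release step and the recursive walk that an orthogonal system would perform. release_patch builds a JSON merge patch (RFC 7386) that adds rollout.planetscale.com/released with an empty value. Everything else is a toy in-memory simulation, not the operator. Obj, simulated_operator_apply and roll_out are hypothetical, the simulated "operator" naively schedules changes on every child, and the breadth-first, one-at-a-time order is just one possible policy.

```python
import json
from collections import deque
from dataclasses import dataclass, field
from typing import List

RELEASED = "rollout.planetscale.com/released"


def release_patch() -> str:
    """JSON merge patch adding the released annotation (any value works; "" here)."""
    return json.dumps({"metadata": {"annotations": {RELEASED: ""}}})


@dataclass
class Obj:
    """Toy stand-in for a vtk, vts, or tablet Pod."""
    kind: str
    name: str
    children: List["Obj"] = field(default_factory=list)
    scheduled: bool = False


def simulated_operator_apply(obj: Obj) -> None:
    """Toy operator: applying released changes schedules changes on all children."""
    obj.scheduled = False
    for child in obj.children:
        child.scheduled = True


def roll_out(root: Obj) -> List[str]:
    """Release pending changes level by level, one object at a time."""
    released: List[str] = []
    queue = deque([root])
    while queue:
        obj = queue.popleft()
        if not obj.scheduled:
            continue  # nothing pending on this object
        # A real orchestrator would send release_patch() here and wait for
        # the object to finish updating before moving on.
        released.append(f"{obj.kind}/{obj.name}")
        simulated_operator_apply(obj)
        queue.extend(obj.children)
    return released


pods = [Obj("Pod", f"tablet-{i}") for i in range(4)]
vtk = Obj("VitessKeyspace", "commerce", scheduled=True, children=[
    Obj("VitessShard", "-80", children=pods[:2]),
    Obj("VitessShard", "80-", children=pods[2:]),
])
print(release_patch())
print(roll_out(vtk))
```

Against a real cluster, the patch could be sent with kubectl patch vtk <name> --type merge -p '<patch>', and likewise for vts objects and tablet Pods. In that setting, the orchestrator would also need to wait for rollout.planetscale.com/scheduled to appear on the children before releasing them.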
btw, when the rollout.planetscale.com/released annotation is applied to a vttablet Pod with pending changes, the operator will first ensure that tablet is not a master for its shard, doing a PlannedReparentShard automatically if necessary, before recreating the Pod to update its spec.
the exception is when there are no other master-eligible replicas provisioned, in which case nothing can be done except restarting the master. in that case, you must manually delete the master Pod once the rollout.planetscale.com/scheduled annotation appears on the Pod; the operator will not delete a master Pod.
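The master-handling rule above can be written as a decision function. This only restates the described behavior for clarity. It is not the operator's code, and Tablet, Action and action_on_release are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Action(Enum):
    RECREATE = "recreate the Pod"
    REPARENT_THEN_RECREATE = "PlannedReparentShard, then recreate the Pod"
    MANUAL_DELETE = "leave the Pod alone; a human must delete it"


@dataclass
class Tablet:
    name: str
    is_master: bool
    master_eligible: bool


def action_on_release(pod: Tablet, shard: List[Tablet]) -> Action:
    """What releasing changes on a tablet Pod leads to, per the description above."""
    if not pod.is_master:
        return Action.RECREATE
    # The master can only move if another master-eligible replica is provisioned.
    if any(t.master_eligible for t in shard if t.name != pod.name):
        return Action.REPARENT_THEN_RECREATE
    return Action.MANUAL_DELETE


shard = [Tablet("tablet-0", True, True), Tablet("tablet-1", False, True)]
print(action_on_release(shard[0], shard).value)
# PlannedReparentShard, then recreate the Pod
print(action_on_release(shard[1], shard).value)
# recreate the Pod
print(action_on_release(shard[0], shard[:1]).value)
# leave the Pod alone; a human must delete it
```

In the MANUAL_DELETE case, the manual step is a plain kubectl delete pod <master-pod-name> once rollout.planetscale.com/scheduled shows up on that Pod.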
Can you mention how Deployments and ReplicaSets (and anything else) fall into this?
Do changes ever get queued in rollout.planetscale.com/scheduled for Services and other objects that are separate from deployments/pods? Or are such changes always applied immediately?
Can you also document the usage of rollout.planetscale.com/cascade, as well as the cases where one needs to manually remove rollout.planetscale.com/cascade or rollout.planetscale.com/released?
Can you show examples of adding/removing these annotations?
Is VitessClusterUpdateStrategyType: Immediate equivalent to immediately applying rollout.planetscale.com/released / rollout.planetscale.com/cascade recursively on everything? Or is there any other difference?
Are there any kinds of alters that don't go through the rolling update process? (e.g. increasing the number of replicas?)
We really need some official docs on update rollout.
This is the issue tracking the addition of rolling updates as a supported strategy in VTop, and we should document that: https://github.com/planetscale/vitess-operator/issues/285