vmcluster maxUnavailable clashes with PDB

Open thejuan opened this issue 1 month ago • 1 comments

As part of the introduction of https://github.com/VictoriaMetrics/operator/pull/1458, evict was chosen over delete explicitly to honour PDBs.

This causes issues and confusion when distinct configurations are desired for distinct use cases, e.g. a PDB with a low maxUnavailable controls voluntary disruption during normal operation (Karpenter, node draining), while a high maxUnavailable on the rollout controls the desired deployment rate (usually in conjunction with a multi-AZ setup). In this case the PDB wins and prevents a fast roll-out.
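To make the clash concrete, here is a minimal sketch (not operator code) of how many pods can be evicted at once when both limits apply. It assumes the simplified model that the Eviction API allows at most the PDB's maxUnavailable minus the pods that are already unavailable; the function name and the example numbers are hypothetical.

```python
def effective_batch(rollout_max_unavailable: int,
                    pdb_max_unavailable: int,
                    currently_unavailable: int = 0) -> int:
    """Pods that can be evicted in one step when evictions honour a PDB.

    Simplified model (an assumption, not the operator's implementation):
    the PDB budget is its maxUnavailable minus pods already unavailable,
    and the rollout can never exceed that budget.
    """
    pdb_budget = max(pdb_max_unavailable - currently_unavailable, 0)
    return min(rollout_max_unavailable, pdb_budget)


# Rollout wants a whole AZ (3 pods) at a time, PDB only allows 1:
# the PDB wins and the rollout degrades to one pod at a time.
slow = effective_batch(rollout_max_unavailable=3, pdb_max_unavailable=1)

# If one pod is already down (e.g. a node drain), nothing can be evicted.
blocked = effective_batch(3, 1, currently_unavailable=1)
```

With delete instead of evict, the PDB term would not apply and the batch size would simply be the rollout's maxUnavailable.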

Kubernetes 1.35 is bringing native maxUnavailable for StatefulSets, which, if integrated here, will solve the issue. However, it could be a long time before users get to that version.

I'd like to float the idea of making the use of evict/delete configurable when maxUnavailable is set.

thejuan avatar Nov 26 '25 00:11 thejuan

The goal of this potential change is to speed up vmstorage rollout - by default it uses graceful shutdown, which means an update rollout restarts nodes one by one and thus takes quite a lot of time.

The alternative is described in https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#updating--reconfiguring-cluster-nodes - if read (not write!) disruption is acceptable for a short period of time, then we should update the operator to evict pods simultaneously. Instead of waiting for instances to update gracefully one at a time, new instances are created in the shortest amount of time possible, so we don't have to wait for graceful shutdown, and the vmagents accumulate the queue in the meantime. This avoids rows rerouting but has a significant drawback - the read path is disrupted until vmstorages are back (thus the focus on minimizing their downtime).
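A rough back-of-the-envelope sketch of the trade-off between the two strategies. It assumes every round of restarts takes the same time and that pods within a round restart in parallel; the replica count and per-pod restart time are made-up numbers for illustration.

```python
import math


def rollout_seconds(replicas: int, batch: int, restart_seconds: float) -> float:
    """Estimated rollout time when `batch` pods are restarted per round.

    Assumption: each round takes `restart_seconds` regardless of batch size.
    """
    if batch < 1:
        raise ValueError("batch must be at least 1")
    rounds = math.ceil(replicas / batch)
    return rounds * restart_seconds


# Hypothetical cluster: 9 vmstorage pods, 120 s per graceful restart.
one_by_one = rollout_seconds(replicas=9, batch=1, restart_seconds=120)   # 1080 s
per_az = rollout_seconds(replicas=9, batch=3, restart_seconds=120)       # 360 s
all_at_once = rollout_seconds(replicas=9, batch=9, restart_seconds=120)  # 120 s
```

The simultaneous strategy is the `batch == replicas` case: the shortest rollout, at the cost of the read path being down for that single round.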

If this solution is acceptable, we don't need PDBs or maxUnavailable changes at all - the operator can handle this change on any configuration and doesn't need new k8s features.

vrutkovs avatar Dec 03 '25 15:12 vrutkovs