StatefulSet OnDelete Limitation and Enhancement

mohammadkhavari opened this issue 2 years ago · 4 comments

While using VMCluster, we can't actually use the OnDelete policy when rolling out changes to vmstorage and vmselect: when updateStrategy is OnDelete, the VictoriaMetrics operator deletes the StatefulSet pods itself, in order. This behavior was added because of #344. The problem is that when we apply even a minor change to the vmstorage configuration, the operator deletes the vmstorage pods one by one, the vminsert nodes start rerouting, and every storage node accumulates pending indexdb items as soon as new time series arrive, which leads to metrics loss in the pipeline. Our vmcluster can tolerate one storage node being down without metric loss, but multiple storage nodes going down is troublesome.

Is there any way we can actually use the OnDelete strategy on the vmstorage StatefulSets? Or even a parameter like stsPodsRollingDelay, the time the operator waits between recreating each vmstorage pod, could save us from downtime. In fact, we can turn off the operator, update the StatefulSet and VMCluster objects, and delete pods ourselves, but if we introduce an inconsistency between the manually edited StatefulSet and the one the VMCluster would generate (causing different revisions), the pods may be recreated again as soon as the operator is turned back on.
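
For reference, a minimal sketch of the setup under discussion, assuming the operator exposes a rollingUpdateStrategy field on the vmstorage spec (the name `example` is illustrative, and field names may vary across operator versions):

    apiVersion: operator.victoriametrics.com/v1beta1
    kind: VMCluster
    metadata:
      name: example
    spec:
      vmstorage:
        replicaCount: 3
        # under OnDelete, the operator deletes pods itself, one by one,
        # instead of letting the StatefulSet controller roll them
        rollingUpdateStrategy: OnDelete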

mohammadkhavari · Sep 03 '23 12:09

Hello! That's a very good case to consider!

> Is there any way we can actually use the OnDelete strategy on the vmstorage StatefulSets?

From the commit, the operator takes over the rolling-update process when rollingUpdateStrategy is OnDelete, which should cover some corner cases, so I don't think using the plain Kubernetes OnDelete strategy is recommended.

> Or even a parameter like stsPodsRollingDelay, the time the operator waits between recreating each vmstorage pod, could save us from downtime.

Since both OnDelete and RollingUpdate wait for each pod to become ready before deleting the next one, you could increase successThreshold or periodSeconds under vmstorage's readinessProbe. That prolongs the time a pod takes to be marked Ready and gives the vmstorage cluster more room to stabilize. For example:

    readinessProbe:
      failureThreshold: 10
      httpGet:
        path: /health
        port: 8482
        scheme: HTTP
      periodSeconds: 10
      # with periodSeconds: 10, the pod needs 6 consecutive successful
      # checks (at least 60s) to become Ready, but vmstorage itself
      # starts working before that
      successThreshold: 6
      timeoutSeconds: 5
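
In case it's useful, a minimal sketch of where this could live in the VMCluster object, assuming the vmstorage spec accepts a readinessProbe override (field support may vary by operator version, so verify against your CRD):

    apiVersion: operator.victoriametrics.com/v1beta1
    kind: VMCluster
    metadata:
      name: example
    spec:
      vmstorage:
        # assumed override point; check your operator version's CRD
        readinessProbe:
          httpGet:
            path: /health
            port: 8482
          periodSeconds: 10
          successThreshold: 6
          failureThreshold: 10
          timeoutSeconds: 5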

Haleygo · Sep 13 '23 02:09

I think it's a global vmcluster issue. Changing the rolling-update strategy or increasing delays between component updates doesn't help much in this case.

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4922

f41gh7 · Sep 26 '23 08:09

Hey @mohammadkhavari, starting from v1.95.0 you can set the -storage.vminsertConnsShutdownDuration command-line flag on vmstorage to gradually close vminsert connections during vmstorage's graceful shutdown; see these docs for more details.
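
A minimal sketch of wiring that flag through the VMCluster spec, assuming the operator's extraArgs map (flag names are given without the leading dash; the duration value is illustrative):

    apiVersion: operator.victoriametrics.com/v1beta1
    kind: VMCluster
    metadata:
      name: example
    spec:
      vmstorage:
        extraArgs:
          # drain vminsert connections gradually over 25s during shutdown
          storage.vminsertConnsShutdownDuration: 25s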

Haleygo · Nov 22 '23 02:11

-storage.vminsertConnsShutdownDuration has no effect (when set longer than 25s) when using the OnDelete strategy, because the operator hard-codes a 30s pod deletion timeout: https://github.com/VictoriaMetrics/operator/blob/e261c37dc973154ff71073d7213000421416bd4e/controllers/factory/k8stools/sts.go#L194

The deletion grace period should probably be derived from the pod's TerminationGracePeriodSeconds instead, since a user will need to set that properly anyway; see the sketch below.
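
To illustrate the proposal, the drain duration and the grace period would need to line up roughly like this; a minimal sketch, assuming vmstorage exposes the standard terminationGracePeriodSeconds pod field (values are illustrative):

    apiVersion: operator.victoriametrics.com/v1beta1
    kind: VMCluster
    metadata:
      name: example
    spec:
      vmstorage:
        # grace period must exceed the drain duration, otherwise the
        # kubelet kills the pod before connection draining finishes
        terminationGracePeriodSeconds: 60
        extraArgs:
          storage.vminsertConnsShutdownDuration: 50s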

dctrwatson · Feb 16 '24 06:02