
Add enforcedRollingUpdate strategy to statefulSet

Open kerthcet opened this issue 3 years ago • 23 comments

  • One-line PR description: Add enforcedRollingUpdate strategy to statefulSet
  • Issue link: https://github.com/kubernetes/enhancements/issues/3541
  • Other comments:

kerthcet avatar Sep 28 '22 15:09 kerthcet

/sig apps

kerthcet avatar Sep 28 '22 15:09 kerthcet

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull request has been approved by: kerthcet. Once this PR has been reviewed and has the lgtm label, please assign johnbelamaric for approval by writing /assign @johnbelamaric in a comment. For more information see: The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot avatar Sep 28 '22 15:09 k8s-ci-robot

cc @smarterclayton do you have time to review this?

kerthcet avatar Sep 28 '22 15:09 kerthcet

cc @kubernetes/sig-apps-feature-requests

kerthcet avatar Sep 30 '22 02:09 kerthcet

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jan 04 '23 11:01 k8s-triage-robot


/lifecycle rotten

k8s-triage-robot avatar Feb 22 '23 23:02 k8s-triage-robot

/remove-lifecycle rotten

kerthcet avatar Feb 23 '23 07:02 kerthcet


/lifecycle stale

k8s-triage-robot avatar May 24 '23 08:05 k8s-triage-robot

/remove-lifecycle rotten

robert-gdv avatar May 25 '23 08:05 robert-gdv


/lifecycle rotten

k8s-triage-robot avatar Jun 24 '23 08:06 k8s-triage-robot

/remove-lifecycle rotten

kerthcet avatar Jun 25 '23 15:06 kerthcet

@soltysh Has any progress been made on this, or have any discussions taken place? Right now I've got StatefulSets stuck indefinitely until I delete the pods, because there's no way to roll out a new image if the pods aren't healthy.

danbopes avatar Jul 17 '23 15:07 danbopes

Any update? Why can't this be merged?

vl-kp avatar Aug 10 '23 00:08 vl-kp

Some updates here to clear up the confusion: this proposal has only just been initiated. As suggested, I'd like to see this topic discussed in the bi-weekly SIG Apps meeting to make sure we're on the right track. I'm out of bandwidth right now, so if anyone is interested, please bring this to the community meeting. Thanks.

kerthcet avatar Aug 17 '23 02:08 kerthcet

In my testing, podManagementPolicy: Parallel completely solves this issue. By default maxUnavailable appears to be 1, so Kubernetes restarts one pod at a time during updates (with true parallel startup/removal when scaling replicas).

vaskozl avatar Nov 08 '23 20:11 vaskozl
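For reference, the behavior described above can be reproduced with a StatefulSet manifest along these lines (a minimal sketch; the name, labels, and image are placeholders, not from the thread):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web                        # placeholder name
spec:
  serviceName: web
  replicas: 3
  podManagementPolicy: Parallel    # pods start/terminate in parallel when scaling,
                                   # so a stuck pod does not block scale operations
  updateStrategy:
    type: RollingUpdate            # rolling updates still proceed one pod at a time by default
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.25        # placeholder image
```

Note that podManagementPolicy only affects scaling and pod creation/deletion ordering; it does not change the ordered, one-at-a-time semantics of a rolling update, which is the gap this KEP targets.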

In my testing, podManagementPolicy: Parallel completely solves this issue. By default maxUnavailable appears to be 1, so Kubernetes restarts one pod at a time during updates (with true parallel startup/removal when scaling replicas).

Under Parallel mode, yes; see also the description: https://github.com/kubernetes/enhancements/pull/3562/files#diff-1151d1efc62d73a39635cf501e30510a004b6c7e67c09e554a9ad3fd7ca87a81R211-R212

What we want to solve here is the sequential rolling-update case.

kerthcet avatar Nov 09 '23 02:11 kerthcet

@vaskozl

Note: The maxUnavailable field is in Alpha stage and it is honored only by API servers that are running with the MaxUnavailableStatefulSet feature gate enabled.

Have you turned the feature gate on before testing?

okgolove avatar Nov 14 '23 10:11 okgolove

No. On 1.28, at least, maxUnavailable seems to default to 1? Granted, I only tested StatefulSets with a few pods, and they always restarted one by one.

vaskozl avatar Nov 15 '23 09:11 vaskozl
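To make the exchange above concrete: maxUnavailable on a StatefulSet is an alpha field and is honored only when the MaxUnavailableStatefulSet feature gate is enabled on the API server. A hedged sketch of the relevant updateStrategy stanza (the value 2 is illustrative):

```yaml
# Requires the MaxUnavailableStatefulSet feature gate on the API server,
# e.g. kube-apiserver --feature-gates=MaxUnavailableStatefulSet=true
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 2   # alpha field; without the gate it is ignored, and the
                          # controller behaves as if it were 1 (one pod at a time)
      partition: 0        # pods with ordinal >= partition are updated
```

This explains the observation above: without the gate, updates always appear to proceed one pod at a time regardless of what the manifest requests.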


/lifecycle stale

k8s-triage-robot avatar Feb 13 '24 10:02 k8s-triage-robot


/lifecycle rotten

k8s-triage-robot avatar Mar 14 '24 10:03 k8s-triage-robot

/remove-lifecycle rotten

flomedja avatar Mar 27 '24 15:03 flomedja

Hi @kerthcet, is there any update on this issue? I've encountered a similar issue in our k8s landscape, and it came as a surprise, since I had assumed the sts behaved similarly to a Deployment. It would be helpful if an sts could recover itself from a broken state.

reborn1867 avatar Apr 30 '24 03:04 reborn1867

Thanks for the interest @reborn1867, but this isn't planned for v1.31, as I have other KEPs with higher priority. Sorry about that.

kerthcet avatar Apr 30 '24 07:04 kerthcet