
Add enforcedRollingUpdate strategy to statefulSet

Open kerthcet opened this issue 3 years ago • 23 comments

  • One-line PR description: Add enforcedRollingUpdate strategy to statefulSet
  • Issue link: https://github.com/kubernetes/enhancements/issues/3541
  • Other comments:

kerthcet avatar Sep 28 '22 15:09 kerthcet

/sig apps

kerthcet avatar Sep 28 '22 15:09 kerthcet

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull request has been approved by: kerthcet. Once this PR has been reviewed and has the lgtm label, please assign johnbelamaric for approval by writing /assign @johnbelamaric in a comment. For more information see: The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot avatar Sep 28 '22 15:09 k8s-ci-robot

cc @smarterclayton do you have time to review this?

kerthcet avatar Sep 28 '22 15:09 kerthcet

cc @kubernetes/sig-apps-feature-requests

kerthcet avatar Sep 30 '22 02:09 kerthcet

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jan 04 '23 11:01 k8s-triage-robot


/lifecycle rotten

k8s-triage-robot avatar Feb 22 '23 23:02 k8s-triage-robot

/remove-lifecycle rotten

kerthcet avatar Feb 23 '23 07:02 kerthcet


/lifecycle stale

k8s-triage-robot avatar May 24 '23 08:05 k8s-triage-robot

/remove-lifecycle rotten

robert-gdv avatar May 25 '23 08:05 robert-gdv


/lifecycle rotten

k8s-triage-robot avatar Jun 24 '23 08:06 k8s-triage-robot

/remove-lifecycle rotten

kerthcet avatar Jun 25 '23 15:06 kerthcet

@soltysh Has any progress been made on this, or have any discussions taken place? Right now I've got StatefulSets stuck indefinitely until I delete the pods, because there's no way to roll out a new image if the pods aren't healthy.

danbopes avatar Jul 17 '23 15:07 danbopes

Any update? Why can't this be merged?

vl-kp avatar Aug 10 '23 00:08 vl-kp

Some updates here to clear up the confusion: this proposal has only just been initiated. As suggested, I'd like to see this topic discussed in the bi-weekly SIG Apps meeting to make sure we're on the right track. I'm out of bandwidth right now, so if anyone is interested, please bring this to the community meeting. Thanks.

kerthcet avatar Aug 17 '23 02:08 kerthcet

In my testing, podManagementPolicy: Parallel completely solves this issue. By default maxUnavailable appears to be 1, so Kubernetes restarts one pod at a time during updates (with true parallel startup/removal when scaling replicas).

vaskozl avatar Nov 08 '23 20:11 vaskozl
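For reference, the behavior described above can be reproduced with a StatefulSet manifest along these lines (a minimal sketch; the name, labels, and image are placeholders, not from the thread):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web                        # placeholder name
spec:
  serviceName: web
  replicas: 3
  podManagementPolicy: Parallel    # pods start/terminate in parallel when scaling,
                                   # so a stuck pod does not block scale operations
  updateStrategy:
    type: RollingUpdate            # rolling updates still proceed one pod at a time by default
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.25        # placeholder image
```

Note that podManagementPolicy only affects scaling and pod creation/deletion ordering; it does not change the ordered, one-at-a-time semantics of a rolling update, which is the gap this KEP targets.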

In my testing, podManagementPolicy: Parallel completely solves this issue. By default maxUnavailable appears to be 1, so Kubernetes restarts one pod at a time during updates (with true parallel startup/removal when scaling replicas).

Under Parallel mode, yes; see also the description: https://github.com/kubernetes/enhancements/pull/3562/files#diff-1151d1efc62d73a39635cf501e30510a004b6c7e67c09e554a9ad3fd7ca87a81R211-R212

What we want to solve here is the sequential rolling-update case.

kerthcet avatar Nov 09 '23 02:11 kerthcet

@vaskozl

Note: The maxUnavailable field is in Alpha stage and it is honored only by API servers that are running with the MaxUnavailableStatefulSet feature gate enabled.

Have you turned the feature gate on before testing?

okgolove avatar Nov 14 '23 10:11 okgolove

No. On 1.28, at least, maxUnavailable seems to default to 1? Granted, I only tested StatefulSets with a few pods, and they always restarted one by one.

vaskozl avatar Nov 15 '23 09:11 vaskozl
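To make the exchange above concrete: maxUnavailable on a StatefulSet is an alpha field and is honored only when the MaxUnavailableStatefulSet feature gate is enabled on the API server. A hedged sketch of the relevant updateStrategy stanza (the value 2 is illustrative):

```yaml
# Requires the MaxUnavailableStatefulSet feature gate on the API server,
# e.g. kube-apiserver --feature-gates=MaxUnavailableStatefulSet=true
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 2   # alpha field; without the gate it is ignored, and the
                          # controller behaves as if it were 1 (one pod at a time)
      partition: 0        # pods with ordinal >= partition are updated
```

This explains the observation above: without the gate, updates always appear to proceed one pod at a time regardless of what the manifest requests.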


/lifecycle stale

k8s-triage-robot avatar Feb 13 '24 10:02 k8s-triage-robot


/lifecycle rotten

k8s-triage-robot avatar Mar 14 '24 10:03 k8s-triage-robot

/remove-lifecycle rotten

flomedja avatar Mar 27 '24 15:03 flomedja

Hi @kerthcet, is there any update on this issue? I've encountered a similar issue in our k8s landscape, and it came as a surprise, since I had assumed the sts behaved similarly to a Deployment. It would be helpful if an sts could recover itself from a broken state.

reborn1867 avatar Apr 30 '24 03:04 reborn1867

Thanks for the interest @reborn1867, but this isn't planned for v1.31, as I have other KEPs with higher priority. Sorry about that.

kerthcet avatar Apr 30 '24 07:04 kerthcet