Non-graceful node shutdown

Open xing-yang opened this issue 4 years ago • 41 comments

Enhancement Description

  • One-line enhancement description (can be used as a release note): Add support to handle non-graceful node shutdown
  • Kubernetes Enhancement Proposal: https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/2268-non-graceful-shutdown
  • Discussion Link:
  • Primary contact (assignee): @yastij @xing-yang
  • Responsible SIGs: sig-storage, sig-node
  • Enhancement target (which target equals to which milestone):
    • Alpha release target (x.y): 1.24
    • Beta release target (x.y): 1.25
    • Stable release target (x.y): 1.26

xing-yang avatar Jan 14 '21 14:01 xing-yang

/sig storage

xing-yang avatar Jan 14 '21 14:01 xing-yang

/sig node

xing-yang avatar Jan 14 '21 14:01 xing-yang

Hi @xing-yang, 1.21 enhancements lead here. I see that you’ve opted this enhancement into 1.21, but I also see that it is tagged with participation from SIG Node. Is that accurate? If so, is there work that SIG Node must deliver in 1.21 as well?

annajung avatar Jan 25 '21 15:01 annajung

Hi @annajung - we're still in the process of seeing which changes are needed for sig-node

yastij avatar Jan 25 '21 17:01 yastij

Greetings @xing-yang, this is Joseph, v1.21 enhancement shadow, following up. For the enhancement to be included in the 1.21 milestone, it must meet the following criteria:

  • The KEP must be merged in an implementable state
  • The KEP must have test plans
  • The KEP must have graduation criteria
  • The KEP must have a production readiness review

Starting v1.21, all KEPs must include a production readiness review. Please make sure to take a look at the instructions and follow all steps.

Thank you!

jrsapi avatar Feb 05 '21 05:02 jrsapi

Greetings @xing-yang,

Enhancements Freeze is 2 days away, Feb 9th EOD PST

The Enhancements team is aware that a KEP update is currently in progress (PR #1116). Please make sure to work on the PRR questionnaire and requirements and get them merged before the freeze. For PRR-related questions, or to boost the PR for PRR review, please reach out in Slack at #prod-readiness.

Any enhancements that do not complete the following requirements by the freeze will require an exception.

  • [IN PROGRESS] The KEP must be merged in an implementable state
  • [IN PROGRESS] The KEP must have test plans
  • [IN PROGRESS] The KEP must have graduation criteria
  • [IN PROGRESS] The KEP must have a production readiness review

jrsapi avatar Feb 08 '21 03:02 jrsapi

Hi @jrsapi, thanks for the reminder! We still need more discussion to resolve some design issues, so this will not make it into 1.21.

xing-yang avatar Feb 08 '21 22:02 xing-yang

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

fejta-bot avatar May 09 '21 23:05 fejta-bot

/remove-lifecycle stale

YuikoTakada avatar May 24 '21 01:05 YuikoTakada

Thank you for this issue. It would be better to update this issue's description to reflect the following:

We are trying to get KEP merged as "Provisional" and continue with prototyping in 1.22. We want to do more testing before targeting Alpha as this is a complicated problem.

In 1.23, we'll target Alpha.

Thanks!

YuikoTakada avatar May 24 '21 01:05 YuikoTakada

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Aug 22 '21 01:08 k8s-triage-robot

/remove-lifecycle stale

YuikoTakada avatar Aug 26 '21 00:08 YuikoTakada

/milestone v1.23

xing-yang avatar Aug 30 '21 14:08 xing-yang

Hi @yastij, @xing-yang! 1.23 Enhancements team here. Just checking in as we approach enhancements freeze on Thursday 09/09.

If I understand correctly, you are currently targeting alpha for release 1.23.

Here's where this enhancement currently stands:

  • [ ] KEP file using the latest template has been merged into the k/enhancements repo.
  • [ ] KEP status is marked as implementable (marked in PR, not merged)
  • [ ] KEP has a test plan section filled out. (filled out in PR, not merged)
  • [ ] KEP has up to date graduation criteria. (filled out in PR, not merged)
  • [ ] KEP has a production readiness review that has been completed and merged into k/enhancements. (filled out in PR, not merged)

Starting with 1.23, we have implemented a soft freeze on production readiness reviews beginning on Thursday 09/02. It looks like you already have yours filled out and have a PRR reviewer selected, so you are on track for the soft freeze!

Thanks!

lauralorenz avatar Sep 01 '21 20:09 lauralorenz

Hi @lauralorenz, I have removed this from 1.23 milestone. Please help remove it from the enhancement tracking sheet. Thanks.

xing-yang avatar Sep 07 '21 22:09 xing-yang

Hi @xing-yang, got it! I've updated the enhancement tracking sheet.

lauralorenz avatar Sep 07 '21 22:09 lauralorenz

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Dec 06 '21 23:12 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Jan 05 '22 23:01 k8s-triage-robot

/remove-lifecycle rotten

xing-yang avatar Jan 11 '22 03:01 xing-yang

/milestone v1.24

xing-yang avatar Jan 11 '22 03:01 xing-yang

Hello @yastij, @xing-yang! 👋, 1.24 Enhancements team here.

Just checking in as we approach enhancements freeze at 18:00 PT on Thursday Feb 3rd, 2022.

This enhancement is targeting stage beta for 1.24, is that correct?

Here’s where this enhancement currently stands:

  • [x] Updated KEP file using the latest template has been merged into the k/enhancements repo.
  • [x] KEP status is marked as implementable for this release
  • [x] KEP has a test plan section filled out.
  • [x] KEP has up to date graduation criteria.
  • [x] KEP has a production readiness review that has been completed and merged into k/enhancements.

Looks like for this one, we would need to update the following:

  • confirm whether this enhancement is graduating to stage alpha or stage beta
  • update the open KEP PR https://github.com/kubernetes/enhancements/pull/1116:
    • to reflect the status as implementable in the kep.yaml file
    • (if graduating to stage beta) to reflect the current stage to beta in the kep.yaml file
    • (if graduating to stage beta) this enhancement requires a production readiness review (PRR) approval for stage beta.

At the moment, the status of this enhancement is marked as at-risk. Please keep the issue description up-to-date with appropriate stages as well. Thank you!

Priyankasaggu11929 avatar Jan 17 '22 07:01 Priyankasaggu11929

With #1116 merged, I've marked this as tracked for enhancement freeze.

rhockenbury avatar Feb 03 '22 22:02 rhockenbury

Hi @yastij, @xing-yang, & @rhockenbury, 1.24 Docs shadow here. 👋

This enhancement is marked as Needs Docs for the 1.24 release.

Please follow the steps detailed in the documentation to open a PR against the dev-1.24 branch in the k/website repo. This PR can be just a placeholder at this time and must be created before Thursday 31st March 2022, 18:00 PDT.

Also, if needed take a look at Documenting for a release to familiarize yourself with the docs requirement for the release.

Thank you! 🙌

didicodes avatar Feb 13 '22 19:02 didicodes

I think I've spotted a design defect. https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/2268-non-graceful-shutdown suggests using the private out-of-service taint; however, private taints are reserved for cluster operators, and Kubernetes should not attach a specific meaning to any private taint.

We should instead use a registered taint.
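To illustrate the distinction, here is a minimal Python sketch, not taken from the KEP or from any Kubernetes code. It assumes that a "registered" key means one whose prefix falls under the reserved `kubernetes.io` or `k8s.io` domains, and that an unprefixed key counts as private. The helper names and the sample keys (including the `node.kubernetes.io/` prefix) are hypothetical and used purely for illustration.

```python
# Assumption: keys whose prefix is (a subdomain of) one of these domains are
# treated as registered; everything else is left to cluster operators.
RESERVED_DOMAINS = ("kubernetes.io", "k8s.io")


def taint_key_prefix(key: str) -> str:
    """Return the part of a taint key before the last '/', or '' if there is none."""
    prefix, sep, _name = key.rpartition("/")
    return prefix if sep else ""


def is_registered_key(key: str) -> bool:
    """True if the key's prefix is a reserved domain or a subdomain of one."""
    prefix = taint_key_prefix(key)
    return any(
        prefix == domain or prefix.endswith("." + domain)
        for domain in RESERVED_DOMAINS
    )


if __name__ == "__main__":
    # Hypothetical sample keys: one unprefixed, one under a reserved domain,
    # and one under an operator-owned domain.
    for key in (
        "out-of-service",
        "node.kubernetes.io/out-of-service",
        "example.com/out-of-service",
    ):
        kind = "registered" if is_registered_key(key) else "private"
        print(f"{key}: {kind}")
```

Under these assumptions, only the key with a reserved prefix is classified as registered; the bare `out-of-service` key is treated as private, which is the concern raised above.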

sftim avatar Mar 04 '22 17:03 sftim

I mentioned the same concern in https://github.com/kubernetes/kubernetes/pull/108486#discussion_r819752216

sftim avatar Mar 04 '22 17:03 sftim

Hi @yastij and @xing-yang

I'm checking in as we approach 1.24 code freeze at 01:00 UTC Wednesday 30th March 2022.

Please ensure the following items are completed:

  • All PRs to the Kubernetes repo that are related to your enhancement are linked in the above issue description (for tracking purposes).
  • All PRs are fully merged by the code freeze deadline.

For this KEP, it looks like just k/k#108486 needs to be merged. Are there any other PRs that you think we should be tracking that would be subject to the 1.24 code freeze?

Let me know if you have any questions.

rhockenbury avatar Mar 22 '22 02:03 rhockenbury

Hey @rhockenbury -- There is a doc PR that I raised for this feature. https://github.com/kubernetes/website/pull/32406

sonasingh46 avatar Mar 22 '22 07:03 sonasingh46

@rhockenbury, implementation PR is merged. We are good for code freeze. Thanks.

xing-yang avatar Mar 29 '22 01:03 xing-yang

/milestone v1.25

xing-yang avatar May 23 '22 21:05 xing-yang

Hello @xing-yang 👋, 1.25 Enhancements team here.

Just checking in as we approach enhancements freeze at 18:00 PST on Thursday June 16, 2022.

Note that this enhancement is targeting stage beta for the 1.25 release.

Here's where this enhancement currently stands:

  • [ ] KEP file using the latest template has been merged into the k/enhancements repo.
  • [X] KEP status is marked as implementable
  • [ ] KEP has an updated detailed test plan section filled out
  • [X] KEP has up to date graduation criteria
  • [ ] KEP has a production readiness review that has been completed and merged into k/enhancements.

It looks like for this one, we would need to update the following:

Open PR https://github.com/kubernetes/enhancements/pull/3320

Note that the status of this enhancement is marked as at risk. Please keep the issue description up-to-date with appropriate stages as well. Thank you!

Atharva-Shinde avatar Jun 06 '22 18:06 Atharva-Shinde