Retriable and non-retriable Pod failures for Jobs

Open alculquicondor opened this issue 2 years ago • 97 comments

Enhancement Description

  • One-line enhancement description (can be used as a release note): An API to influence retries based on exit codes and/or pod deletion reasons.

  • Kubernetes Enhancement Proposal: https://git.k8s.io/enhancements/keps/sig-apps/3329-retriable-and-non-retriable-failures

  • Discussion Link: https://github.com/kubernetes/kubernetes/issues/17244

  • Primary contact (assignee): @alculquicondor

  • Responsible SIGs: apps, api-machinery, scheduling

  • Enhancement target (which target maps to which milestone):

    • Alpha release target (x.y): 1.25
    • Beta release target (x.y): 1.26
    • Stable release target (x.y): 1.31
  • [x] Alpha

    • [x] KEP (k/enhancements) update PR(s):
      • #3374
      • #3438
      • #3447
      • #3452
    • [x] Code (k/k) update PR(s):
      • kubernetes/kubernetes#111070
      • kubernetes/kubernetes#111084
      • kubernetes/kubernetes#111091
      • kubernetes/kubernetes#110959
      • kubernetes/kubernetes#111113
      • kubernetes/kubernetes#111475
    • [x] Docs (k/website) update PR(s): kubernetes/website#35219
  • [x] Beta

    • [x] KEP (k/enhancements) update PR(s):
      • #3463
      • #3646
      • #3769
      • #3757
      • v1.28
        • https://github.com/kubernetes/enhancements/pull/3965
        • https://github.com/kubernetes/enhancements/pull/3940
      • v1.30
        • https://github.com/kubernetes/enhancements/pull/4442
    • [x] Code (k/k) update PR(s):
      • https://github.com/kubernetes/kubernetes/pull/112360
      • https://github.com/kubernetes/kubernetes/pull/113324
      • https://github.com/kubernetes/kubernetes/pull/113304
      • https://github.com/kubernetes/kubernetes/pull/113360
      • https://github.com/kubernetes/kubernetes/pull/113812
      • https://github.com/kubernetes/kubernetes/pull/113580
      • https://github.com/kubernetes/kubernetes/pull/113856
      • https://github.com/kubernetes/kubernetes/pull/113860
      • https://github.com/kubernetes/kubernetes/pull/113927
      • kubernetes/kubernetes#114770
      • kubernetes/kubernetes#114914
      • kubernetes/kubernetes#115056
      • https://github.com/kubernetes/kubernetes/pull/115331
      • https://github.com/kubernetes/kubernetes/pull/116554
      • https://github.com/kubernetes/kubernetes/pull/117586
      • https://github.com/kubernetes/kubernetes/pull/117015
      • https://github.com/kubernetes/kubernetes/pull/121103
    • [x] Docs (k/website) update(s):
      • https://github.com/kubernetes/website/pull/37242
      • https://github.com/kubernetes/website/pull/38040
      • https://github.com/kubernetes/website/pull/38042
      • https://github.com/kubernetes/website/pull/39809
      • https://github.com/kubernetes/website/pull/41745
  • [ ] Stable

    • [x] KEP (k/enhancements) update PR(s): #4661
    • [ ] Code (k/k) update PR(s):
      • https://github.com/kubernetes/kubernetes/pull/125533
      • https://github.com/kubernetes/kubernetes/pull/125442
      • https://github.com/kubernetes/kubernetes/pull/125461
      • https://github.com/kubernetes/kubernetes/pull/125482
    • [ ] Docs (k/website) update(s): https://github.com/kubernetes/website/pull/46807
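
The one-line description above refers to the Job `podFailurePolicy` API. To illustrate its matching semantics, here is a minimal Python sketch. It is a simplified model, not the Job controller's code. Rules are checked in order. The first rule whose `onExitCodes` or `onPodConditions` requirement matches decides the action (`FailJob`, `Ignore` or `Count`), and a failure that matches no rule is counted against the Job's `backoffLimit`. The class names and the `evaluate_policy` helper are hypothetical. The sketch only looks at non-zero exit codes and omits details such as init containers and API validation. `DisruptionTarget` is used as an example of a pod condition that Kubernetes sets on pods that were disrupted (for example, by preemption or eviction).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# Hypothetical, simplified model of a Job pod failure policy. Field names loosely
# follow the Job API (action, onExitCodes, onPodConditions); this illustrates the
# matching rules only and is not the Job controller's implementation.


@dataclass
class ContainerExit:
    name: str
    exit_code: int


@dataclass
class FailedPod:
    container_exits: list[ContainerExit]
    conditions: dict[str, str] = field(default_factory=dict)  # condition type -> status


@dataclass
class OnExitCodes:
    operator: str  # "In" or "NotIn"
    values: set[int]
    container_name: Optional[str] = None  # None means "any container"


@dataclass
class OnPodCondition:
    type: str
    status: str = "True"


@dataclass
class Rule:
    action: str  # "FailJob", "Ignore" or "Count"
    on_exit_codes: Optional[OnExitCodes] = None
    on_pod_conditions: list[OnPodCondition] = field(default_factory=list)


def exit_codes_match(req: OnExitCodes, pod: FailedPod) -> bool:
    """True if at least one selected container exited non-zero with a matching code."""
    for c in pod.container_exits:
        if req.container_name is not None and c.name != req.container_name:
            continue
        if c.exit_code == 0:
            continue  # successful containers are not considered
        in_values = c.exit_code in req.values
        if req.operator == "In" and in_values:
            return True
        if req.operator == "NotIn" and not in_values:
            return True
    return False


def conditions_match(reqs: list[OnPodCondition], pod: FailedPod) -> bool:
    """True if the pod has any listed condition type with the required status."""
    return any(pod.conditions.get(r.type) == r.status for r in reqs)


def evaluate_policy(rules: list[Rule], pod: FailedPod) -> str:
    """Return the action of the first matching rule, or "Count" if none match."""
    for rule in rules:
        # Assumes each rule sets exactly one of the two requirement kinds.
        if rule.on_exit_codes is not None:
            matched = exit_codes_match(rule.on_exit_codes, pod)
        else:
            matched = conditions_match(rule.on_pod_conditions, pod)
        if matched:
            return rule.action
    # Default: the failure counts toward the Job's backoffLimit.
    return "Count"


rules = [
    Rule("FailJob", on_exit_codes=OnExitCodes("In", {42}, container_name="main")),
    Rule("Ignore", on_pod_conditions=[OnPodCondition("DisruptionTarget")]),
]
print(evaluate_policy(rules, FailedPod([ContainerExit("main", 42)])))  # FailJob
print(evaluate_policy(rules, FailedPod([ContainerExit("main", 1)], {"DisruptionTarget": "True"})))  # Ignore
print(evaluate_policy(rules, FailedPod([ContainerExit("main", 1)])))  # Count
```

Evaluation stops at the first match. Rule order therefore matters only when two rules could match the same failure, for example a disrupted pod whose container also exited with a listed code.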

alculquicondor, Jun 01 '22

/sig apps /wg batch

alculquicondor, Jun 01 '22

/assign

alculquicondor, Jun 01 '22

/assign

mimowo, Jun 02 '22

/sig scheduling /sig api-machinery

alculquicondor, Jun 09 '22

Hello @alculquicondor 👋, 1.25 Enhancements team here.

Just checking in as we approach Enhancements Freeze at 18:00 PT on Thursday, June 23, 2022, which is just over 2 days from now.

For note, this enhancement is targeting stage alpha for 1.25 (correct me if otherwise).

Here's where this enhancement currently stands:

  • [ ] KEP file using the latest template has been merged into the k/enhancements repo.
  • [ ] KEP status is marked as implementable
  • [ ] KEP has an updated detailed test plan section filled out
  • [ ] KEP has up to date graduation criteria
  • [ ] KEP has a production readiness review that has been completed and merged into k/enhancements.

The open PR https://github.com/kubernetes/enhancements/pull/3374 is addressing all the listed criteria above. We would just require getting it merged by the Enhancements Freeze.

For note, the status of this enhancement is marked as at risk. Please keep the issue description up-to-date with appropriate stages as well. Thank you!

Priyankasaggu11929, Jun 22 '22

With KEP PR #3374 merged, the enhancement is ready for the 1.25 Enhancements Freeze.

For note, the status is now marked as tracked. Thank you so much!

Priyankasaggu11929, Jun 23 '22

Hello @alculquicondor 👋, 1.25 Release Docs Lead here. This enhancement is marked as ‘Needs Docs’ for 1.25 release.

Please follow the steps detailed in the documentation to open a PR against the dev-1.25 branch in the k/website repo. This PR can be just a placeholder at this time, and must be created by August 4. Also, take a look at Documenting for a release to familiarize yourself with the docs requirements for the release.
Thank you!

kcmartin, Jul 12 '22

Hi @alculquicondor @mimowo, Enhancements team here again 👋

Checking in as we approach Code Freeze at 01:00 UTC on Wednesday, 3rd August 2022.

Please ensure that the following items are completed before the code-freeze:

  • [X] All PRs to the Kubernetes repo that are related to your enhancement are linked in the above issue description (for tracking purposes). https://github.com/kubernetes/kubernetes/pull/111070 https://github.com/kubernetes/kubernetes/pull/111084 https://github.com/kubernetes/kubernetes/pull/111091 https://github.com/kubernetes/kubernetes/pull/110959 https://github.com/kubernetes/kubernetes/pull/111113 https://github.com/kubernetes/kubernetes/pull/111475
  • [ ] All PRs are fully merged by the code freeze deadline. https://github.com/kubernetes/kubernetes/pull/110959 https://github.com/kubernetes/kubernetes/pull/111113 https://github.com/kubernetes/kubernetes/pull/111475

Let me know if there are any additional k/k PRs besides the ones listed above.

Currently, the status of the enhancement is marked as at-risk.

Thanks :)

Atharva-Shinde, Jul 25 '22

@Atharva-Shinde @alculquicondor there is one more PR that should be included before the code freeze: https://github.com/kubernetes/kubernetes/pull/111475

mimowo, Jul 29 '22

thank you @mimowo, I have updated my comment with the PR and have also tagged you for future reference :)

Atharva-Shinde, Jul 29 '22

Hey @alculquicondor @mimowo, reaching out again as we approach Code Freeze at 01:00 UTC this Wednesday, i.e. 3rd August 2022. Try to get all the open code (k/k) PRs mentioned in the issue description merged before the code freeze :) https://github.com/kubernetes/kubernetes/pull/110959 https://github.com/kubernetes/kubernetes/pull/111113 https://github.com/kubernetes/kubernetes/pull/111475

The status of the enhancement is still marked as at-risk.

Atharva-Shinde, Aug 01 '22

Hello 👋, 1.25 Enhancements Lead here.

Following discussion in Slack, the release team is APPROVING this exception request. Your updated deadline to make any changes to your PR is 9:30 AM PST Monday 8th August 2022.

https://groups.google.com/u/0/g/kubernetes-sig-release/c/EBdL_-Jhv_s

Thank you!

Priyankasaggu11929, Aug 03 '22

All the implementation PRs are merged 😃

IIUC, we are still making it into the 1.25 release?

alculquicondor, Aug 04 '22

Yes, the enhancement is marked as tracked for the 1.25 release cycle. Thank you so much! 🙂

Priyankasaggu11929, Aug 05 '22

/sig node

alculquicondor, Sep 20 '22

/milestone v1.26 /stage beta /label lead-opted-in

soltysh, Sep 22 '22

/label tracked/yes /remove-label tracked/no

rhockenbury, Sep 22 '22

Hey @alculquicondor @mimowo 👋, 1.26 Enhancements team here!

Just checking in as we approach Enhancements Freeze at 18:00 PDT on Thursday 6th October 2022.

This enhancement is targeting stage beta for 1.26.

Here's where this enhancement currently stands:

  • [X] KEP file using the latest template has been merged into the k/enhancements repo.
  • [X] KEP status is marked as implementable
  • [X] KEP has an updated detailed test plan section filled out
  • [X] KEP has up to date graduation criteria
  • [ ] KEP has a production readiness review that has been completed and merged into k/enhancements.

For this KEP, we would need to:

  • Update the kep.yaml to reflect the current milestone information
  • Update the production readiness review with latest stage information
  • Include the new updated PR of this KEP in the Issue Description and get it merged before Enhancements Freeze to make this enhancement eligible for 1.26 release.

The status of this enhancement is marked as at risk. Please keep the issue description up-to-date with appropriate stages as well. Thank you :)

Atharva-Shinde, Sep 24 '22

@Atharva-Shinde the enhancement is targeting Beta for 1.26. This is the KEP update which is currently under review: https://github.com/kubernetes/enhancements/pull/3463.

mimowo, Sep 26 '22

Thanks @mimowo, I've updated my comment :)

Atharva-Shinde, Sep 26 '22

/milestone v1.26 /label lead-opted-in

(For sig-node, we see this is not attempting to derive any intelligence from kubelet/runtime initiated conditions)

derekwaynecarr, Oct 03 '22

Hello @alculquicondor @mimowo 👋, just a quick check-in again, as we approach the 1.26 Enhancements freeze.

Please plan to get the action items mentioned in my comment above done before Enhancements Freeze at 18:00 PDT on Thursday 6th October 2022, i.e. tomorrow.

For note, the current status of the enhancement is marked at-risk :)

Atharva-Shinde, Oct 05 '22

@Atharva-Shinde the PRR has been approved at the correct beta level in https://github.com/kubernetes/enhancements/pull/3463/files, so I'm not quite sure what else you expect.

soltysh, Oct 05 '22

Thanks @soltysh for bringing this to my notice (not sure how I missed this, sorry for the error); everything is up to date! I've updated the KEP status to tracked for the 1.26 release cycle :)

Atharva-Shinde, Oct 05 '22

Hi @alculquicondor and @mimowo 👋,

Checking in once more as we approach 1.26 code freeze at 17:00 PDT on Tuesday 8th November 2022.

Please ensure the following items are completed:

  • [ ] All PRs to the Kubernetes repo that are related to your enhancement are linked in the above issue description (for tracking purposes).
  • [ ] All PRs are fully merged by the code freeze deadline.

For this enhancement, please plan to get PRs out for all k/k code so it can be merged by code freeze. If you do have k/k PRs open, please link them to this issue. Let me know if there aren't any further PRs that need to be created or merged for this enhancement, so that I can mark it as tracked for code freeze.

As always, we are here to help should questions come up. Thanks!

parul5sahoo, Nov 01 '22

Hello @alculquicondor and @mimowo 👋 1.26 Release Docs shadow here!

This enhancement is marked as ‘Needs Docs’ for 1.26 release. Please follow the steps detailed in the documentation to open a PR against the dev-1.26 branch in the k/website repo. This PR can be just a placeholder at this time, and must be created by November 9. Also, take a look at Documenting for a release to familiarize yourself with the docs requirements for the release.

Thank you!

cathchu, Nov 02 '22

The placeholder PR is prepared: https://github.com/kubernetes/website/pull/37242. @alculquicondor please reference it in the Issue description.

mimowo, Nov 02 '22

Hey @mimowo and @alculquicondor,

As code freeze is just a day away, I wanted to confirm that there are no open PRs in the k/k repo, or in any other repo, for this enhancement besides the ones outlined in the issue description. Please get the open PRs merged before code freeze so that the enhancement can be marked as tracked.

parul5sahoo, Nov 07 '22

There is one more k/e PR, whose purpose is to align the KEP with the decisions made during the implementation phase. I'm not sure whether it should block Code Freeze. Either way, @alculquicondor, could you please add the KEP update to the list of PRs and review/approve it?

mimowo, Nov 07 '22

We have this marked as tracked for code freeze.

rhockenbury, Nov 09 '22