Backoff Limit Per Index For Indexed Jobs

Open jensentanlo opened this issue 2 years ago • 31 comments

Enhancement Description

  • One-line enhancement description (can be used as a release note): Add New Indexed Job backoff limit mode that is counted per index rather than per job

  • Kubernetes Enhancement Proposal: https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/3850-backoff-limits-per-index-for-indexed-jobs

  • Discussion Link: https://github.com/kubernetes/kubernetes/issues/109712

  • Primary contact (assignee): mimowo

  • Responsible SIGs: apps

  • Enhancement target (which target equals to which milestone):

    • Alpha release target (x.y): 1.28
    • Beta release target (x.y): 1.29
    • Stable release target (x.y):
  • [x] Alpha

    • [x] KEP (k/enhancements) update PR(s): https://github.com/kubernetes/enhancements/pull/3967
    • [x] Code (k/k) update PR(s):
      • https://github.com/kubernetes/kubernetes/pull/118009
      • https://github.com/kubernetes/kubernetes/pull/119294
    • [x] Docs (k/website) update PR(s): https://github.com/kubernetes/website/pull/41921
  • [x] Beta

    • [x] KEP (k/enhancements) update PR(s):
      • https://github.com/kubernetes/enhancements/pull/4228
      • https://github.com/kubernetes/enhancements/pull/4321
    • [x] Code (k/k) update PR(s):
      • https://github.com/kubernetes/kubernetes/pull/121356
      • https://github.com/kubernetes/kubernetes/pull/121292
      • https://github.com/kubernetes/kubernetes/pull/121471
      • https://github.com/kubernetes/kubernetes/pull/121368
      • https://github.com/kubernetes/kubernetes/pull/121633
      • https://github.com/kubernetes/kubernetes/pull/121393
    • [x] Docs (k/website) update(s): https://github.com/kubernetes/website/pull/43388
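To make the per-index versus per-job distinction in the one-line description concrete, here is a minimal sketch in Python. It is not the Job controller's implementation. The function names and the input shape are hypothetical, and it assumes an index counts as failed once its own failure count exceeds the limit (that is, the limit counts retries); `backoff_limit_per_index` stands in for the `spec.backoffLimitPerIndex` field described in the KEP.

```python
from typing import Dict, List


def failed_indexes(failures_per_index: Dict[int, int],
                   backoff_limit_per_index: int) -> List[int]:
    """Per-index counting: each index has its own retry budget.

    Assumption: an index is failed once its failures exceed the limit.
    """
    return sorted(
        index
        for index, failures in failures_per_index.items()
        if failures > backoff_limit_per_index
    )


def job_level_limit_exceeded(failures_per_index: Dict[int, int],
                             backoff_limit: int) -> bool:
    """Per-job counting: all pod failures draw on one shared budget."""
    return sum(failures_per_index.values()) > backoff_limit


# Hypothetical failure counts for a 4-index Job.
observed = {0: 0, 1: 3, 2: 1, 3: 0}
print(failed_indexes(observed, backoff_limit_per_index=2))  # [1]
print(job_level_limit_exceeded(observed, backoff_limit=3))  # True
```

With per-job counting, the failures of index 1 alone would exhaust the shared budget and fail the whole Job; with per-index counting, only index 1 is marked failed and the other indexes keep their own budgets.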

jensentanlo avatar Feb 07 '23 13:02 jensentanlo

/sig apps /wg batch

jensentanlo avatar Feb 07 '23 13:02 jensentanlo

/assign @mimowo

alculquicondor avatar Apr 24 '23 16:04 alculquicondor

In addition to configuring the backoff per index, we should probably have FailIndex as one of the actions for pod failure policies.
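As a rough illustration of that suggestion, the sketch below models a pod failure policy rule carrying a `FailIndex` action and a simplified matcher for it. The rule layout (`rules`, `action`, `onExitCodes`, `operator`, `values`) follows the Job API's `podFailurePolicy` field; the `FailIndex` value is the action proposed in this comment, the exit code 42 is made up, and the matcher is a hypothetical simplification that only understands exit-code rules.

```python
from typing import Optional

# Illustrative rule: exit code 42 is a made-up "do not retry this index" signal.
pod_failure_policy = {
    "rules": [
        {
            "action": "FailIndex",
            "onExitCodes": {"operator": "In", "values": [42]},
        }
    ]
}


def action_for_exit_code(policy: dict, exit_code: int) -> Optional[str]:
    """Return the action of the first rule matching exit_code, else None."""
    for rule in policy["rules"]:
        on_exit = rule.get("onExitCodes")
        if on_exit is None:
            continue  # this sketch ignores other rule types
        matched = exit_code in on_exit["values"]
        if on_exit["operator"] == "NotIn":
            matched = not matched
        if matched:
            return rule["action"]
    return None


print(action_for_exit_code(pod_failure_policy, 42))  # FailIndex
print(action_for_exit_code(pod_failure_policy, 1))   # None
```

The idea is that a matching failure would fail the index immediately instead of consuming its remaining per-index retries.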

alculquicondor avatar Apr 24 '23 17:04 alculquicondor

/milestone v1.28 /stage alpha /label lead-opted-in

soltysh avatar May 30 '23 11:05 soltysh

Hello @mimowo 👋, Enhancements team here.

Just checking in as we approach enhancements freeze on 01:00 UTC Friday, 16th June 2023.

This enhancement is targeting stage alpha for 1.28 (correct me if otherwise).

Here's where this enhancement currently stands:

  • [ ] KEP readme using the latest template has been merged into the k/enhancements repo.
  • [X] KEP status is marked as implementable for latest-milestone: 1.28
  • [x] KEP readme has an updated detailed test plan section filled out
  • [x] KEP readme has up to date graduation criteria
  • [x] KEP has a production readiness review that has been completed and merged into k/enhancements.

The status of this enhancement is marked as at risk. Please keep the issue description up-to-date with appropriate stages as well. Thank you!

aramase avatar Jun 14 '23 05:06 aramase

@aramase I think the first point is addressed as the KEP has been merged: https://github.com/kubernetes/enhancements/pull/3967.

mimowo avatar Jun 14 '23 05:06 mimowo

@aramase is there anything missing to make it tracked?

mimowo avatar Jun 15 '23 05:06 mimowo

Hey @mimowo, with all the KEP requirements in place and merged into k/enhancements, this enhancement is all good for the upcoming enhancements freeze. 🚀

The status of this enhancement is marked as tracked. Please keep the issue description up-to-date with appropriate stages as well. Thank you :)

Atharva-Shinde avatar Jun 15 '23 11:06 Atharva-Shinde

Hello @mimowo :wave:, 1.28 Docs Lead here.

Does this enhancement work planned for 1.28 require any new docs or modification to existing docs?

If so, please follow the steps here to open a PR against the dev-1.28 branch in the k/website repo. This PR can be just a placeholder at this time and must be created before Thursday 20th July 2023.

Also, take a look at Documenting for a release to familiarize yourself with the docs requirements for the release.

Thank you!

Rishit-dagli avatar Jul 01 '23 09:07 Rishit-dagli

Hey again @mimowo :wave:

Just checking in as we approach code freeze at 01:00 UTC Wednesday, 19th July 2023.

Here's the enhancement's state for the upcoming code freeze:

  • [ ] All the PRs that are related to your enhancement are linked in the above issue description (for tracking purposes). This includes code, tests, and documentation related PR/s.
  • [ ] All code related PR/s are merged or are in merge-ready state (i.e., they have approved and lgtm labels applied) by the code freeze deadline. This includes any tests related PR/s too.

I see https://github.com/kubernetes/kubernetes/pull/118009 PR in the issue description. If there are any other k/k related PR(s) that we should be tracking for this KEP please link them in the issue description above.

As always, we are here to help if any questions come up. Thanks!

aramase avatar Jul 17 '23 23:07 aramase

Hey @mimowo 👋 Enhancements Lead here. With https://github.com/kubernetes/kubernetes/pull/118009 and https://github.com/kubernetes/kubernetes/pull/119294 merged as per the issue description, this enhancement is now tracked for v1.28 Code Freeze!

Atharva-Shinde avatar Jul 19 '23 03:07 Atharva-Shinde

/remove-label lead-opted-in

npolshakova avatar Aug 27 '23 22:08 npolshakova

Hello @mimowo, 1.29 Enhancements team here! Is this enhancement targeting 1.29? If it is, can you follow the instructions here to opt in the enhancement and make sure the lead-opted-in label is set so it can get added to the tracking board? Thanks!

npolshakova avatar Sep 26 '23 14:09 npolshakova

/assign @soltysh

mimowo avatar Sep 27 '23 08:09 mimowo

/milestone v1.29 /label lead-opted-in

soltysh avatar Sep 27 '23 11:09 soltysh

Hello @mimowo 👋, 1.29 Enhancements team here!

Just checking in as we approach enhancements freeze on 01:00 UTC, Friday, 6th October, 2023.

This enhancement is targeting stage beta for 1.29 (correct me if otherwise).

Here's where this enhancement currently stands:

  • [ ] KEP readme using the latest template has been merged into the k/enhancements repo.
  • [x] KEP status is marked as implementable for latest-milestone: 1.29. KEPs targeting stable will need to be marked as implemented after code PRs are merged and the feature gates are removed.
  • [x] KEP readme has up-to-date graduation criteria
  • [ ] KEP has a production readiness review that has been completed and merged into k/enhancements. (For more information on the PRR process, check here).

For this KEP, it looks like merging #4228 will address these issues.

The status of this enhancement is marked as at risk for enhancement freeze. Please keep the issue description up-to-date with appropriate stages as well.

Thank you!

sanchita-07 avatar Sep 29 '23 04:09 sanchita-07

Hi @mimowo, with all the requirements for this KEP in place and merged into k/enhancements, this enhancement is all good for the upcoming enhancements freeze. 🚀 The status of this enhancement is marked as tracked for enhancement freeze. Please keep the issue description up-to-date with appropriate stages as well. Thank you!

sanchita-07 avatar Oct 04 '23 09:10 sanchita-07

Hey there @mimowo and @soltysh :wave:, v1.29 Docs Lead here. Does this enhancement work planned for v1.29 require any new docs or modification to existing docs? If so, please follow the steps here to open a PR against the dev-1.29 branch in the k/website repo. This PR can be just a placeholder at this time and must be created before Thursday, 19 October 2023. Also, take a look at Documenting for a release to familiarize yourself with the docs requirements for the release. Thank you!

katcosgrove avatar Oct 09 '23 07:10 katcosgrove

Hi @jensentanlo :wave: from the v1.29 Communications Release Team! We would like to check if you have any plans to publish blogs for this KEP regarding new features, removals, and deprecations for this release. If so, you need to open a placeholder PR in the website repository. The deadline is Tuesday 14th November 2023 (after the Docs deadline for PRs ready for review). Here's the 1.29 Calendar.

James-Quigley avatar Oct 23 '23 14:10 James-Quigley

Hey again @mimowo 👋, 1.29 Enhancements team here.

Just checking in as we approach code freeze at 01:00 UTC Wednesday 1st November 2023:

Here's where this enhancement currently stands:

  • [x] All PRs to the Kubernetes repo that are related to your enhancement are linked in the above issue description (for tracking purposes).
  • [x] All PR/s are ready to be merged (they have approved and lgtm labels applied) by the code freeze deadline. This includes tests.

For this enhancement, it looks like the following PRs were merged before code freeze:

  • https://github.com/kubernetes/kubernetes/pull/118009
  • https://github.com/kubernetes/kubernetes/pull/119294

Also, please let me know if there are other PRs in k/k we should be tracking for this KEP.

With the PRs merged and linked in the issue description, this KEP is tracked for code freeze for v1.29. 🚀 As always, we are here to help if any questions come up ✌. Thanks :)

sanchita-07 avatar Oct 24 '23 15:10 sanchita-07

Also, please let me know if there are other PRs in k/k we should be tracking for this KEP.

Yes, there are two other PRs (both added to the description now):

  • https://github.com/kubernetes/kubernetes/pull/121471
  • https://github.com/kubernetes/kubernetes/pull/121368 (lgtm & approved)

mimowo avatar Oct 25 '23 09:10 mimowo

Thanks @mimowo for mentioning them! Since both PRs have the lgtm and approved labels and are linked in the issue description, we are good to go. :smiley: :rocket:

sanchita-07 avatar Oct 25 '23 11:10 sanchita-07

I performed some manual testing on this feature and saw everything working as expected. A short summary of the details is below, if you're interested.


I ran Indexed Jobs with completions = 1000 on a local kind cluster (1.28) with the alpha feature gate enabled, mainly checking whether:

  1. All pods ran to completion or failure
  2. All failed indices are correctly recorded on the job object
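The two checks above can be sketched in Python as follows. This is a hypothetical helper, not the procedure actually used in this test. It assumes the Job status reports `completedIndexes` and `failedIndexes` in the compressed text format described in the Job API (comma-separated integers, with runs written as `first-last`); the status strings in the example are made up.

```python
from typing import List, Set


def parse_indexes(text: str) -> List[int]:
    """Expand a compressed index list such as "1,3-5,7" into [1, 3, 4, 5, 7]."""
    indexes: List[int] = []
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue  # an empty string means no indexes
        if "-" in part:
            first, last = part.split("-", 1)
            indexes.extend(range(int(first), int(last) + 1))
        else:
            indexes.append(int(part))
    return indexes


def unaccounted_indexes(completions: int, completed: str, failed: str) -> Set[int]:
    """Indexes in [0, completions) that are neither completed nor failed."""
    seen = set(parse_indexes(completed)) | set(parse_indexes(failed))
    return set(range(completions)) - seen


# Made-up status values for a 1000-index Job.
print(parse_indexes("1,3-5,7"))                           # [1, 3, 4, 5, 7]
print(unaccounted_indexes(1000, "0-499,501-999", "500"))  # set()
```

An empty result from `unaccounted_indexes` means every index ended up either completed or recorded as failed.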

Related to Indexed Jobs in general rather than this specific feature, I was also interested in the delete behavior, because I've had trouble with bulk deletions of non-indexed Jobs in the past. Everything was cleaned up correctly and relatively quickly, even though I was churning through a couple of Indexed Jobs (so thousands of pods) on my local machine.

jensentanlo avatar Nov 06 '23 14:11 jensentanlo

/remove-label lead-opted-in

salehsedghpour avatar Jan 06 '24 16:01 salehsedghpour

Hello 👋 1.30 Enhancements Lead here,

I'm closing milestone 1.29 now. If you wish to progress this enhancement in v1.30, please follow the instructions here to opt in the enhancement and make sure the lead-opted-in label is set so it can get added to the tracking board, and finally add /milestone v1.30. Thanks!

/milestone clear

salehsedghpour avatar Jan 16 '24 23:01 salehsedghpour

@mimowo if I am not mistaken, this feature should have a stage of beta.

It turns out I can update the label!

kannon92 avatar Feb 01 '24 18:02 kannon92

/stage beta

kannon92 avatar Feb 01 '24 18:02 kannon92

Hi @soltysh, @mimowo, and @kannon92, Enhancements Team here! Just wondering if you are aiming to have this enhancement in 1.30. If yes, please follow the instructions here to opt in the enhancement and make sure the lead-opted-in label is set so it can get added to the tracking board, and finally add /milestone v1.30. Thanks!

salehsedghpour avatar Feb 01 '24 22:02 salehsedghpour

No plans to graduate in this release.

alculquicondor avatar Feb 01 '24 22:02 alculquicondor

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar May 01 '24 23:05 k8s-triage-robot