Resilient Watchcache Initialization

Open wojtek-t opened this issue 1 year ago • 28 comments

Enhancement Description

  • One-line enhancement description (can be used as a release note): Resilient WatchCache initialization

  • Kubernetes Enhancement Proposal: https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/4568-resilient-watchcache-initialization

  • Discussion Link:

  • Primary contact (assignee): @wojtek-t

  • Responsible SIGs: sig-api-machinery, sig-scalability

  • Enhancement target (which target equals which milestone):

    • Beta release target 1.31:
    • Stable release target (x.y):
  • [ ] Beta

    • [x] KEP (k/enhancements) update PR(s): https://github.com/kubernetes/enhancements/pull/4557/
    • [x] Code (k/k) update PR(s): https://github.com/kubernetes/kubernetes/pull/124642
    • [ ] Docs (k/website) update(s): https://github.com/kubernetes/website/pull/47002

/sig api-machinery /sig scalability /milestone v1.31

wojtek-t avatar Apr 04 '24 11:04 wojtek-t

/label lead-opted-in

wojtek-t avatar May 29 '24 14:05 wojtek-t

Hello @wojtek-t 👋, 1.31 Enhancements team here.

Just checking in as we approach enhancements freeze on 02:00 UTC Friday 14th June 2024 / 19:00 PDT Thursday 13th June 2024.

This enhancement is targeting stage beta for 1.31 (correct me if otherwise). /stage beta

Here's where this enhancement currently stands:

  • [x] KEP readme using the latest template has been merged into the k/enhancements repo.
  • [x] KEP status is marked as implementable for latest-milestone: v1.31. KEPs targeting stable will need to be marked as implemented after code PRs are merged and the feature gates are removed.
  • [x] KEP readme has up-to-date graduation criteria
  • [x] KEP has a production readiness review that has been completed and merged into k/enhancements. (For more information on the PRR process, check here). If your production readiness review is not completed yet, please make sure to fill the production readiness questionnaire in your KEP by the PRR Freeze deadline so that the PRR team has enough time to review your KEP.

With all the KEP requirements in place and merged into k/enhancements, this enhancement is all good for the upcoming enhancements freeze. 🚀

The status of this enhancement is marked as tracked for enhancement freeze. Please keep the issue description up-to-date with appropriate stages as well. Thank you!

dipesh-rawat avatar Jun 05 '24 00:06 dipesh-rawat

Hi @wojtek-t 👋, 1.31 Docs Shadow here.

Does this enhancement work planned for 1.31 require any new docs or modification to existing docs?

If so, please follow the steps here to open a PR against the dev-1.31 branch in the k/website repo. This PR can be just a placeholder at this time and must be created before Thursday June 27, 2024 18:00 PDT.

Also, take a look at Documenting for a release to familiarize yourself with the docs requirements for the release.

Thank you!

Daniel

chanieljdan avatar Jun 11 '24 12:06 chanieljdan

Hi @wojtek-t

:wave: from the v1.31 Communications Team! We'd love for you to opt in to write a feature blog about your enhancement! Some reasons why you might want to write a blog for this feature include (but are not limited to) if this introduces breaking changes, is important to our users, or has been in progress for a long time and is graduating.

To opt in, let us know and open a Feature Blog placeholder PR against the website repository by 3rd July, 2024. For more information about writing a blog see the blog contribution guidelines.

Note: In your placeholder PR, use XX characters for the blog date in the front matter and file name. We will work with you on updating the PR with the publication date once we have a final number of feature blogs for this release.

hailkomputer avatar Jun 19 '24 11:06 hailkomputer

Hi @wojtek-t 👋,

Just a reminder to open a placeholder PR against the dev-1.31 branch in the k/website repo for this (steps available here). The deadline for this is a week away, on Thursday June 27, 2024 at 18:00 PDT.

Thanks,

Daniel

chanieljdan avatar Jun 20 '24 19:06 chanieljdan

Hi @wojtek-t 👋,

Just a reminder to open a placeholder PR against the dev-1.31 branch in the k/website repo for this (steps available here) if it requires docs. The deadline for this is tomorrow, Thursday June 27, 2024 at 18:00 PDT.

Thanks,

Daniel

chanieljdan avatar Jun 26 '24 13:06 chanieljdan

@wojtek-t , friendly reminder about the upcoming blog opt-in and placeholder deadline on July 3rd. Please open a blog placeholder PR if you are interested in contributing a blog.

hailkomputer avatar Jun 28 '24 09:06 hailkomputer

Hey again @wojtek-t 👋, 1.31 Enhancements team here,

Just checking in as we approach code freeze at 02:00 UTC Wednesday 24th July 2024 / 19:00 PDT Tuesday 23rd July 2024.

Here's where this enhancement currently stands:

  • [x] All PRs to the Kubernetes repo that are related to your enhancement are linked in the above issue description (for tracking purposes).
  • [x] All PR/s are ready to be merged (they have approved and lgtm labels applied) by the code freeze deadline. This includes tests.

With all the implementation (code-related) PRs merged as per the issue description:

  • https://github.com/kubernetes/kubernetes/pull/124642

Additionally, please let me know if there are any other PRs in k/k not listed in the description that we should track for this KEP, so that we can maintain accurate status.

This enhancement is now marked as tracked for the 1.31 code freeze!

dipesh-rawat avatar Jul 01 '24 19:07 dipesh-rawat

@dipesh-rawat - I just linked another PR for it - with the second one (already merged too), we're ready for code-freeze.

wojtek-t avatar Jul 02 '24 10:07 wojtek-t

@wojtek-t Thanks for informing about the other PR https://github.com/kubernetes/kubernetes/pull/125483 related to this KEP. Could we also please add this PR in the issue description (here) for tracking purposes?

dipesh-rawat avatar Jul 02 '24 12:07 dipesh-rawat

Docs PR: https://github.com/kubernetes/website/pull/47063

wojtek-t avatar Jul 24 '24 13:07 wojtek-t

Hi, enhancements lead here - I inadvertently added this to the 1.32 tracking board 😀. Please re-add it if you wish to progress this enhancement in 1.32.

/remove-label lead-opted-in

tjons avatar Sep 16 '24 12:09 tjons

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
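
The three rules above amount to a small state machine. The sketch below is only an illustration of the quoted thresholds, not the bot's actual implementation: the function name, the state names, and the choice of an inclusive `>=` comparison are assumptions made for this example.

```python
from datetime import date, timedelta

# Thresholds quoted from the bot message above.
STALE_AFTER = timedelta(days=90)
ROTTEN_AFTER = timedelta(days=30)
CLOSE_AFTER = timedelta(days=30)


def next_lifecycle_action(state, last_activity, today):
    """Return the label or action the rules would apply next, or None.

    state is "active", "stale", or "rotten" (hypothetical names).
    last_activity is the date of the most recent activity; for stale and
    rotten issues that is the date the current label was applied.
    Treating the threshold as inclusive (>=) is an assumption.
    """
    idle = today - last_activity
    if state == "active" and idle >= STALE_AFTER:
        return "lifecycle/stale"
    if state == "stale" and idle >= ROTTEN_AFTER:
        return "lifecycle/rotten"
    if state == "rotten" and idle >= CLOSE_AFTER:
        return "close"
    return None


# Dates taken from this thread: last human comment on Sep 16 2024,
# bot comment on Dec 15 2024, exactly 90 days later.
print(next_lifecycle_action("active", date(2024, 9, 16), date(2024, 12, 15)))
```

Any new comment or a /remove-lifecycle command counts as activity and restarts the clock.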

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Dec 15 '24 12:12 k8s-triage-robot

/lifecycle rotten

k8s-triage-robot avatar Jan 14 '25 13:01 k8s-triage-robot

/remove-lifecycle rotten

wojtek-t avatar Jan 14 '25 13:01 wojtek-t

@wojtek-t @serathius Is there anything preventing this from graduating to GA in 1.33? Anyone willing to own graduating this? If so let me know and I'll add it to the v1.33 milestone.

jpbetz avatar Jan 25 '25 00:01 jpbetz

Is there any plan to progress this to stable in 1.33? Should sig-api-machinery opt-in to this for milestone v1.33? (cc @serathius)

jpbetz avatar Jan 27 '25 18:01 jpbetz

@wojtek-t is out, and I don't have enough context to know how much work is left. I will try to take a look.

serathius avatar Jan 28 '25 09:01 serathius

I think there are two parts of the KEP that need to be considered separately:

  • WatchCacheInitializationPostStartHook is a Beta feature flag that is disabled by default. We might want to enable it by default, but I don't think we ran any additional experiments that would inform that decision.
  • ResilientWatchCacheInitialization has been enabled since 1.31, so it seems mature enough. However, I don't know whether we have settled if we should adjust the requests delegated to etcd. I think the conditional passthrough based on labels and limit adds needless complexity, but I don't know how concrete the risk of delayed initialization is. It would be nice to confirm or reject it.
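
For anyone who wants to try the disabled-by-default gate on a test cluster, feature gates are passed to kube-apiserver through the `--feature-gates` flag as comma-separated `Name=true|false` pairs. The helper below is a hypothetical convenience for rendering that argument, not part of any Kubernetes tooling; only the two gate names and the flag format come from Kubernetes itself.

```python
def feature_gates_flag(gates):
    """Render a dict of gate name -> bool as a --feature-gates argument.

    Hypothetical helper for illustration; gates are sorted by name so the
    output is stable across runs.
    """
    pairs = ",".join(
        f"{name}={str(enabled).lower()}" for name, enabled in sorted(gates.items())
    )
    return f"--feature-gates={pairs}"


# Opt in to the post-start hook, which the comment above describes as
# Beta and disabled by default.
print(feature_gates_flag({"WatchCacheInitializationPostStartHook": True}))
```

The second gate, ResilientWatchCacheInitialization, is described above as already enabled since 1.31, so it should not need to be set explicitly.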

serathius avatar Jan 29 '25 13:01 serathius

Sorry, I was OOO for the last few weeks. I will not get to it for 1.33 upstream, but here is what I would like to happen:

  • WatchCacheInitializationPostStartHook - this is disabled because we originally agreed with David that he would like to see some production data before enabling it by default upstream. We should try to enable it in GKE for 1.33 and, based on that, enable it upstream in 1.34.
  • ResilientWatchCacheInitialization - I think this is ready for GA promotion. We can tune it further (e.g. what Marek wrote above), but this is imho incremental and I wouldn't block GA on it; I would rather change it separately if we believe it would work better. So let's target GA of it for 1.34 too.

wojtek-t avatar Feb 10 '25 09:02 wojtek-t

/lifecycle stale

k8s-triage-robot avatar May 11 '25 10:05 k8s-triage-robot

Proposing to GA the ResilientWatchCacheInitialization flag.

I have started collecting production data for WatchCacheInitializationPostStartHook. It might take some time to roll out and get sufficient soak, so the KEP might not graduate this cycle.

serathius avatar May 27 '25 11:05 serathius

Proposal in https://github.com/kubernetes/enhancements/pull/5350

serathius avatar Jun 03 '25 17:06 serathius

/label lead-opted-in /milestone v1.34

jpbetz avatar Jun 03 '25 17:06 jpbetz

Hi @wojtek-t :wave:, v1.34 Enhancements team here.

This is a reminder of the upcoming PRR Freeze on Thursday 12th June 2025.

By this date, there must be a PR open in k/enhancements with:

  • The KEP's PRR questionnaire filled out.
  • The kep.yaml updated with the stage, latest-milestone, and milestone struct filled out.
  • A PRR approval file with the PRR approver listed for the stage the KEP is targeting.

Having the PRR questionnaire filled out by this deadline will help ensure that the PRR team has enough time to review your KEP before Enhancements Freeze on Friday 20th June 2025. For more information on the PRR process, see here.

stmcginnis avatar Jun 06 '25 11:06 stmcginnis

Hello @nabokihms 👋, 1.34 Enhancements team here again.

Just checking in as we approach Enhancements Freeze on 21:00 UTC Friday 20th June 2025.

This enhancement is targeting stage stable for 1.34 (correct me, if otherwise).

/stage stable

Here's where this enhancement currently stands:

  • [X] KEP readme using the latest template has been merged into the k/enhancements repo.
  • [X] KEP status is marked as implementable for latest-milestone: v1.34. KEPs targeting stable will need to be marked as implemented after code PRs are merged and the feature gates are removed.
  • [X] KEP readme has up-to-date graduation criteria.
  • [X] KEP has a production readiness review that has been completed and merged into k/enhancements.

For this KEP, we would need to update the following:

  • Please confirm the target stage. This issue description says beta, but it looks like this is actually targeting stable (#5350).
  • Update this issue description to reflect current status

With all the KEP requirements in place and merged into k/enhancements, this enhancement is all good for the upcoming enhancements freeze. 🚀

The status of this enhancement is marked as Tracked for enhancements freeze. If you anticipate missing enhancements freeze, you can file an exception request in advance. Thank you!

stmcginnis avatar Jun 11 '25 18:06 stmcginnis

Hi @nabokihms @wojtek-t 👋 -- this is Graziano (@graz-dev) from the 1.34 Communications Team!

For the 1.34 release, we are currently in the process of collecting and curating a list of potential feature blogs, and we'd love for you to consider writing one for your enhancement!

As you may be aware, feature blogs are a great way to communicate to users about features which fall into (but are not limited to) the following categories:

  • This introduces some breaking change(s)
  • This has significant impacts and/or implications to users
  • ...Or this is a long-awaited feature, which would go a long way to cover the journey more in detail 🎉

To opt in to write a feature blog, could you please let us know and open a "Feature Blog placeholder PR" (which can be only a skeleton at first) against the website repository by Friday 11th July? For more information about writing a blog, please find the blog contribution guidelines 📚

Tip: Some dates to keep in mind:

  • 02:00 UTC Friday 11th July 2025: Feature blog PR freeze
  • Friday 8th August 2025: Feature blogs ready for review
  • You can find more in the release document

Note: In your placeholder PR, use XX characters for the blog date in the front matter and file name. We will work with you on updating the PR with the publication date once we have a final number of feature blogs for this release.

graz-dev avatar Jun 21 '25 13:06 graz-dev

Hi @wojtek-t :wave:, v1.34 Docs Shadow here.

Does this enhancement work planned for 1.34 require any new docs or modification to existing docs?

If so, please follow the steps here to open a PR against the dev-1.34 branch in the k/website repo. This PR can be just a placeholder at this time and must be created before the Docs placeholder/draft PR deadline - Thursday 3rd July 2025 18:00 PDT.

Also, take a look at Documenting for a release to familiarize yourself with the docs requirements for the release.

Thank you for your work!

ArvindParekh avatar Jun 26 '25 14:06 ArvindParekh

Graduated ResilientWatchCacheInitialization in https://github.com/kubernetes/kubernetes/pull/131979

serathius avatar Jul 01 '25 10:07 serathius

Checking in @serathius :wave:, 1.34 Docs Lead here.

Just a reminder to open a placeholder PR against the dev-1.34 branch in the k/website repo for this KEP (steps available here) if it requires new docs or modifications to existing docs.

The deadline for this is Thursday July 3 at 18:00 PDT. Thanks! :rocket:

michellengnx avatar Jul 03 '25 15:07 michellengnx