KEP-5237: watch-based route controller reconciliation using informers
- One-line PR description: Introduce a feature gate to enable informer-based reconciliation in the routes' controller of cloud-controller-manager, reducing API calls and improving efficiency.
- Issue link: https://github.com/kubernetes/enhancements/issues/5237
- Other comments:
Welcome @lukasmetzner!
It looks like this is your first PR to kubernetes/enhancements 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.
You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.
You can also check if kubernetes/enhancements has its own contribution guidelines.
You may want to refer to our testing guide if you run into trouble with your tests not passing.
If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!
Thank you, and welcome to Kubernetes. :smiley:
Hi @lukasmetzner. Thanks for your PR.
I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/cc @elmiko @joelspeed
/ok-to-test
@elmiko If possible, I’d love to work through the open questions here in the PR so we can keep things moving, and then have a discussion in the next SIG meeting. Otherwise, we might end up losing a two of weeks of time. What do you think?
Open Questions:
- What should be the default frequency for the periodic full reconciliation?
- Input from @JoelSpeed: We should do it similar to other controllers and choose a random time between 12h and 24h.
- As we use the same shared informers' factory as in other controllers, this should already be implemented.
- Are there other Node fields besides
node.status.addressesandPodCIDRsthat should trigger a route update? - How should we set the interval for the periodic reconcile? Options:
- Adjust
--route-reconcile-periodwhen feature gate enabled - Use
--min-resync-period; currently defaults to 12h - Introduce a new flag
- If we use the 12h-24h option we can probably reuse
--min-resync-period.
- Adjust
i'm fine to continue the discussions here.
Input from @JoelSpeed: We should do it similar to other controllers and choose a random time between 12h and 24h.
i think 12h sounds fine to me.
Are there other Node fields besides node.status.addresses and PodCIDRs that should trigger a route update?
i'll have to think about this a little more, i have a feeling that those are good to start with.
How should we set the interval for the periodic reconcile?
i like the idea of adjusting the default for the --route-reconcile-period, but i don't want users to get confused about this.
my only issue with using --min-resync-period is that it sounds much more general and we are just focusing on the route controller.
i like the idea of adjusting the default for the --route-reconcile-period, but i don't want users to get confused about this. my only issue with using --min-resync-period is that it sounds much more general and we are just focusing on the route controller
To my understanding, both the service and node controllers are already watch-based and should utilize the --min-resync-period flag. Adopting this approach would bring consistency across the CCM components. In this context, the --route-reconcile-period flag could be considered for deprecation.
ack, thank you @lukasmetzner that makes sense to me. it seems we should focus on --min-resync period then.
/auto-cc
Just a friendly reminder, as the 1.34 release window is closing :D
@lukasmetzner do we need another round of reviews here?
@elmiko From my point of view, I am happy with the content, and we could get this merged or move forward.
This is my first contribution, have I missed adding some labels? 😅
This is my first contribution, have I missed adding some labels? 😅
no, i don't think so. just wanted to make sure i didn't miss anything. i will give another review before end of week.
This is my first contribution, have I missed adding some labels? 😅
no, i don't think so. just wanted to make sure i didn't miss anything. i will give another review before end of week.
Sounds good. With the PRR freeze tomorrow, I guess we won't make it into the v1.34 milestone. I will bump the milestone targets then.
Sounds good. With the PRR freeze tomorrow, I guess we won't make it into the v1.34 milestone. I will bump the milestone targets then.
ahh, apologies. i missed that the freeze is today. sorry for the inconvenience and delay.
thanks for the updates @lukasmetzner , i probably won't get a chance to review today but i will review again early next week.
@elmiko I forgot to update the milestones to now target v1.35 for the alpha.
@JoelSpeed Just wondering if you’ve had a chance to review the PR again, or if you might be able to soon?
/lgtm
@elmiko @JoelSpeed As I have got lgtms from both of you with a small discussion remaining, shall I fill out the production readiness questionnaire? The next PRR freeze for v1.35 is on the 9th of October. If possible, I would like to target v1.35.
yeah, that sounds good to me @lukasmetzner
I saw that we already had the questionnaire filled out, so I moved it towards prod-readiness.
IMO we could also add a metric for A/B testing in beta (cc @JoelSpeed)? I would like to avoid missing the PRR freeze because of that :D
I already got lgtms on the KEP in general and have moved the state to implementable.
@elmiko Do you know if the lead-opted-in also needs to be applied to the issue, or only the PR? I used /label lead-opted-in, but I am unsure if this is something I should do, or a sig-cloud-provider lead.
Since I’m quite new here, I hope I’ve followed the process correctly. Apologies in advance if I’ve missed anything.
/assign @elmiko /assign @wojtek-t /label lead-opted-in
@lukasmetzner: Can not set label lead-opted-in: Must be member in one of these teams: [release-team-enhancements release-team-leads sig-api-machinery-leads sig-apps-leads sig-architecture-leads sig-auth-leads sig-autoscaling-leads sig-cli-leads sig-cloud-provider-leads sig-cluster-lifecycle-leads sig-contributor-experience-leads sig-docs-leads sig-instrumentation-leads sig-k8s-infra-leads sig-multicluster-leads sig-network-leads sig-node-leads sig-release-leads sig-scalability-leads sig-scheduling-leads sig-security-leads sig-storage-leads sig-testing-leads sig-windows-leads]
In response to this:
I saw that we already had the questionnaire filled out, so I moved it towards prod-readiness.
IMO we could also add a metric for A/B testing in beta (cc @JoelSpeed)? I would like to avoid missing the PRR freeze because of that :D
I already got
lgtmson the KEP in general and have moved the state to implementable.@elmiko Do you know if the
lead-opted-inalso needs to be applied to the issue, or only the PR?Since I’m quite new here, I hope I’ve followed the process correctly. Apologies in advance if I’ve missed anything.
/assign @elmiko /assign @wojtek-t /label lead-opted-in
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
@elmiko Do you know if the lead-opted-in also needs to be applied to the issue, or only the PR? I used /label lead-opted-in, but I am unsure if this is something I should do, or a sig-cloud-provider lead.
good question, i will investigate early next week.
IMO we could also add a metric for A/B testing in beta (cc @JoelSpeed)? I would like to avoid missing the PRR freeze because of that :D
I'm happy with that as long as we can leverage the metric in the beta stage with the feature both enabled and disabled to make a comparison between the two implementations to determine the performance impact
PRR lgtm. Thank you for addressing metrics early, that always makes things easier for making alpha to beta decisions.
/approve
@elmiko @JoelSpeed
Do you have a sig-cloud-provider owner to look at this?
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: deads2k, elmiko, lukasmetzner
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~keps/prod-readiness/OWNERS~~ [deads2k]
- ~~keps/sig-cloud-provider/OWNERS~~ [elmiko]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
/lgtm
/assign @wojtek-t
Just for posterity - this proposal LGTM too.