
Rely on API Priority and Fairness (APF) instead of client-side rate limiting

Sijoma opened this issue 1 year ago

When relying on controller-runtime defaults, it's difficult to spot whether a controller is being rate-limited on the client side. This can hurt controller performance, because requests queue behind the client-side rate limiter and the reconcile loop slows down.

The recommendation from Slack discussions seems to be to always:

  1. set QPS to -1 everywhere and rely on APF
  2. use the cache wherever possible
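For reference, point 1 boils down to mutating the rest.Config before the client or manager is built. A minimal sketch in Go (manager options are left at their defaults; treat this as an illustration, not an eventual controller-runtime API):

```go
package main

import (
	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	// Load the usual kubeconfig / in-cluster config.
	cfg := ctrl.GetConfigOrDie()

	// client-go documents that a negative QPS disables its client-side
	// rate limiter entirely, leaving throttling to server-side APF.
	// Burst is ignored once the limiter is disabled.
	cfg.QPS = -1

	mgr, err := ctrl.NewManager(cfg, ctrl.Options{})
	if err != nil {
		panic(err)
	}
	_ = mgr // set up reconcilers here, then mgr.Start(...)
}
```

Point 2 is what controller-runtime's default client already does for reads: it serves GETs and LISTs from the shared informer cache, which keeps most read traffic off the apiserver in the first place.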

One of the concerns was that this might not be a safe default for users relying on older Kubernetes versions. I'm not sure how those older Kubernetes versions would work with an updated controller-runtime in any case. Isn't the client too new then?

Discussion links:

lavalamp: even if it works the right way you should still turn it off and rely on APF

Link to sig-apimachinery recommendation: https://kubernetes.slack.com/archives/C0EG7JC6T/p1680796612287719?thread_ts=1680791299.631439&cid=C0EG7JC6T

Controller Runtime discussion - https://kubernetes.slack.com/archives/C02MRBMN00Z/p1724224928349419

Sijoma avatar Aug 26 '24 08:08 Sijoma

I'm not sure how those older Kubernetes versions would work with an updated controller-runtime in any case? Isn't the client too new then?

Controller-runtime is in general compatible with a wide range of Kubernetes versions on the server side. Otherwise folks would have to make sure the controllers they use are built against exactly the same Kubernetes version as the server. An example is Cluster API, which is usually compatible with ~6-7 Kubernetes versions (https://cluster-api.sigs.k8s.io/reference/versions#core-provider-cluster-api-controller).

Of course folks have to be careful about which features of the kube-apiserver and built-in APIs they rely on, but controller-runtime itself tries not to depend on any specific new kube-apiserver feature.

sbueringer avatar Aug 26 '24 11:08 sbueringer

Seems like Crossplane is also running into issues around these settings:

https://github.com/crossplane/crossplane/pull/5742/files

Sijoma avatar Sep 06 '24 13:09 Sijoma

Thanks for bringing up this discussion! We were having problems with backed-up workqueues in some of our controllers, and disabling the client-side rate limiter in favor of APF took our latency from O(hours) to O(seconds).

If it makes sense, we would love to see related options in controller-runtime client/manager rather than having to modify the *rest.Config before creating the client, but the current solution is workable for us :)

nathanperkins avatar Oct 29 '24 21:10 nathanperkins

This is generally a good idea. If it has to be in controller-runtime, which we could support, it needs to be able to determine whether APF is enabled on the server, or fall back gracefully.

vincepri avatar Oct 30 '24 02:10 vincepri
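A graceful fallback along the lines described above could probe server discovery for the flowcontrol.apiserver.k8s.io API group, which is only served when APF is available. A sketch using plain client-go discovery (the helper name is made up; servers that predate APF simply keep the client-side limiter):

```go
package main

import (
	"k8s.io/client-go/discovery"
	"k8s.io/client-go/rest"
)

// disableClientSideLimitingIfAPF is a hypothetical helper: it turns off the
// client-side rate limiter only when the server advertises the APF API group.
func disableClientSideLimitingIfAPF(cfg *rest.Config) error {
	dc, err := discovery.NewDiscoveryClientForConfig(cfg)
	if err != nil {
		return err
	}
	groups, err := dc.ServerGroups()
	if err != nil {
		return err
	}
	for _, g := range groups.Groups {
		if g.Name == "flowcontrol.apiserver.k8s.io" {
			cfg.QPS = -1 // negative QPS disables client-side rate limiting
			return nil
		}
	}
	// No APF on this server: keep whatever QPS/Burst the config already has.
	return nil
}
```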

Not sure if we really should add additional options to Client / Manager that would override the corresponding options of the rest.Config.

sbueringer avatar Oct 30 '24 05:10 sbueringer

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jan 28 '25 06:01 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Feb 27 '25 06:02 k8s-triage-robot

@alvaroaleman Is this done with: https://github.com/kubernetes-sigs/controller-runtime/pull/3119?

Or do we still have some rate limiters that are enabled on some of our clients somehow? (I assume not)

sbueringer avatar Feb 27 '25 13:02 sbueringer

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Mar 29 '25 13:03 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to the triage bot's /close not-planned comment above.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar Mar 29 '25 13:03 k8s-ci-robot