
AWS LBC performance improvement

Open oliviassss opened this issue 10 months ago • 2 comments

Describe the feature you are requesting

Performance improvement.

Motivation

Running LBC in a large-scale cluster and stress-testing its performance.

Describe the proposed solution you'd like

  1. When I tested LBC provisioning 50 LBs with 1k targets each, I observed memory spiking up to 7 GiB. We need further memory profiling and, if possible, optimization. I suspect the pod cache is the main contributor.
  2. The LBC lists all pods during startup, and this initial list call cannot be paginated due to a known limitation on the k8s API server side. A workaround is to limit the watched namespace via the --watch-namespace flag, so the initial list call only returns pods in the specified namespace. However, this flag currently supports only one namespace; since upstream controller-runtime supports multiple watch namespaces, we should extend the flag to accept multiple namespaces as well.
  3. The service reconciler watches and caches all service types (ClusterIP, NodePort, and LoadBalancer), which is unnecessary overhead for large-scale clusters that have thousands of service objects other than the LoadBalancer type. We should investigate restricting the service reconciler to watch only LoadBalancer-type services.

Describe alternatives you've considered
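For item 1, the usual first step is capturing a heap profile while the controller is under load. A hedged, stdlib-only sketch (the port and helper name are illustrative, not part of the LBC):

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* handlers on the default mux
	"runtime"
)

// heapInUseMiB reports the bytes currently allocated on the heap, in MiB.
// Handy as a quick sanity check alongside full pprof heap profiles.
func heapInUseMiB() float64 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return float64(m.HeapAlloc) / (1 << 20)
}

func main() {
	fmt.Printf("heap in use: %.1f MiB\n", heapInUseMiB())

	// Expose pprof endpoints in the background so a profile can be
	// captured during a stress test, e.g.:
	//   go tool pprof http://localhost:6060/debug/pprof/heap
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()
}
```

Comparing heap profiles taken before and after provisioning the 50 LBs should confirm (or rule out) the pod cache as the dominant allocation site.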
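For item 2, recent controller-runtime versions accept multiple watch namespaces via a per-namespace map on the cache options (`cache.Options.DefaultNamespaces`). A stdlib-only sketch of parsing a comma-separated --watch-namespace value into that map shape; the multi-value behavior is the proposed extension, not current LBC behavior, and `struct{}` stands in for controller-runtime's per-namespace `cache.Config`:

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// parseWatchNamespaces turns a comma-separated --watch-namespace value into
// a namespace-keyed map, mirroring the shape controller-runtime's
// cache.Options.DefaultNamespaces expects. An empty value returns nil,
// which conventionally means "watch all namespaces".
func parseWatchNamespaces(raw string) map[string]struct{} {
	if raw == "" {
		return nil
	}
	out := make(map[string]struct{})
	for _, ns := range strings.Split(raw, ",") {
		if ns = strings.TrimSpace(ns); ns != "" {
			out[ns] = struct{}{}
		}
	}
	return out
}

func main() {
	watch := flag.String("watch-namespace", "",
		"comma-separated list of namespaces to watch; empty watches all")
	flag.Parse()
	fmt.Println(parseWatchNamespaces(*watch))
}
```

With this in place, the initial pod list is issued per watched namespace, which bounds the startup list size in clusters where the LBC only manages a handful of namespaces.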
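For item 3, controller-runtime lets a reconciler drop events before they reach the work queue via an event predicate (e.g. `predicate.NewPredicateFuncs` over a `*corev1.Service`). This stdlib-only sketch models the filter with a minimal stand-in type, so the struct and names are illustrative:

```go
package main

import "fmt"

// Service is a minimal stand-in for corev1.Service; the real reconciler
// would inspect svc.Spec.Type on the Kubernetes API type.
type Service struct {
	Name string
	Type string // "ClusterIP", "NodePort", or "LoadBalancer"
}

// isLoadBalancer is the predicate the service reconciler would register so
// that ClusterIP/NodePort services never enter its cache or work queue.
func isLoadBalancer(svc Service) bool {
	return svc.Type == "LoadBalancer"
}

func main() {
	services := []Service{
		{Name: "internal-api", Type: "ClusterIP"},
		{Name: "ingress", Type: "LoadBalancer"},
	}
	for _, svc := range services {
		if isLoadBalancer(svc) {
			fmt.Println("reconcile:", svc.Name)
		}
	}
}
```

Note that a predicate alone only filters reconcile events; shrinking the informer cache itself would additionally require a field or label selector on the watch, which is worth investigating as part of this item.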

Contribution Intention (Optional)

- [ ] Yes, I am willing to contribute a PR to implement this feature
- [ ] No, I cannot work on a PR at this time

Dashboard monitored for reference: [image]

oliviassss — Jun 18 '25 18:06

Big +1 to getting this prioritized. The workaround for clusters with a large number of services is to specify --watch-namespace, which currently can watch only a single namespace, so it is only a partial workaround.

jaymindesai — Sep 16 '25 03:09

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot — Dec 15 '25 03:12