aws-load-balancer-controller
LoadBalancer controllers of different clusters running in the same VPC try to manage the same SG and cause a deadlock
Describe the bug
LoadBalancer controllers of different clusters running in the same VPC try to manage the same SG and cause a deadlock.
Steps to reproduce
Launch two clusters in the same VPC.
Cluster A:
- Install Karpenter
- Install AWS LoadBalancer Controller with IRSA.
- Create a test deployment, service, ingress.
Cluster B:
- Has a cluster SG.
- Nodes use the cluster SG.
- Install AWS LoadBalancer Controller with IRSA.
- The AWS LoadBalancer Controller manages the SG rules of Cluster B's cluster SG.
- Create a test deployment, service, ingress.
Reproduction:
- In Cluster A, create a Karpenter node template (AWSNodeTemplate) that uses the cluster SG of Cluster B (see the sketch after this list).
- Launch nodes using that Karpenter node template in Cluster A.
- Delete/remove the nodes launched by Karpenter.
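A minimal sketch of that misconfiguration, assuming Karpenter's v1alpha1 AWSNodeTemplate API; the template name, discovery tag, and sg-0bbbbbbbbbbbbbbbb (standing in for Cluster B's cluster SG ID) are placeholders:
$ cat <<'EOF' | kubectl apply -f -
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: cross-cluster-sg-example
spec:
  subnetSelector:
    karpenter.sh/discovery: cluster-a
  securityGroupSelector:
    aws-ids: "sg-0bbbbbbbbbbbbbbbb"   # Cluster B's cluster SG -- this selection is what triggers the issue
EOF
Once nodes launched from this template back any targets, Cluster A's controller starts treating Cluster B's cluster SG as a backend node SG.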
Reproduction Outcome
- At this point the AWS LoadBalancer Controller of Cluster A will cache the cluster SG of Cluster B as one of the backend node security groups. Because it is considered a backend node SG, the controller will try to manage and reconcile it continuously.
- This is a problem because every SG rule managed by the AWS LoadBalancer Controller carries the description elbv2.k8s.aws/targetGroupBinding=shared, which gives neither controller a way to tell which cluster owns a rule. Both controllers therefore behave as follows (a sketch of the resulting API churn follows this list):
- Cluster A's AWS LoadBalancer Controller considers the rules added by Cluster B's controller invalid and deletes them with the RevokeSecurityGroupIngress API, while adding any rules it needs with the AuthorizeSecurityGroupIngress API.
- Cluster B's AWS LoadBalancer Controller does the same in reverse: it revokes the rules added by Cluster A's controller and authorizes its own.
- The constant flip-flopping of SG rules causes the ALB to fail health checks intermittently.
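A hedged illustration of that churn, expressed as the equivalent AWS CLI calls each controller effectively issues against the shared SG (the group ID, ports, and CIDR are placeholders, not the exact rules the controller writes):
# Cluster A's controller revokes a rule that Cluster B's controller created...
$ aws ec2 revoke-security-group-ingress \
    --group-id sg-0bbbbbbbbbbbbbbbb \
    --protocol tcp --port 8080 --cidr 10.0.0.0/16
# ...and authorizes the rule it wants instead.
$ aws ec2 authorize-security-group-ingress \
    --group-id sg-0bbbbbbbbbbbbbbbb \
    --protocol tcp --port 9090 --cidr 10.0.0.0/16
# On its next reconcile, Cluster B's controller does the mirror image, and the cycle repeats.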
Current workarounds
Workaround 1:
- Allow all traffic from the VPC CIDR in the SG. This bypasses the deadlock so ALB health checks no longer fail (see the example after the restart commands below).
Workaround 2:
- With current versions, the two clusters in the same VPC should use distinct sets of SGs. Make sure that Cluster A is not using any node SGs from Cluster B as its own node SGs. Once this is confirmed, restart the LoadBalancer Controller to clear its cache:
$ kubectl scale deployment -n kube-system aws-load-balancer-controller --replicas 0
$ kubectl scale deployment -n kube-system aws-load-balancer-controller --replicas 1
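For workaround 1, a minimal sketch of opening the node SG to the whole VPC CIDR (the group ID and CIDR are placeholders; use your own VPC's values):
# Allow all traffic from the VPC CIDR so ALB health checks keep passing even
# while the two controllers keep rewriting the finer-grained rules.
$ aws ec2 authorize-security-group-ingress \
    --group-id sg-0bbbbbbbbbbbbbbbb \
    --ip-permissions 'IpProtocol=-1,IpRanges=[{CidrIp=10.0.0.0/16}]'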
Expected outcome
- The LoadBalancer Controller should not manage rules it did not create.
Feature enhancement request:
- To resolve this, please tag security group rules with a unique identifier (see the sketch below).
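A sketch of the kind of per-rule tagging being requested, assuming a hypothetical tag key (elbv2.k8s.aws/cluster is not an existing controller tag) and a placeholder security-group-rule ID; SG rules are taggable resources with sgr-* IDs:
# Hypothetical: tag an individual SG rule with the owning cluster so each
# controller can skip rules it did not create.
$ aws ec2 create-tags \
    --resources sgr-0123456789abcdef0 \
    --tags Key=elbv2.k8s.aws/cluster,Value=cluster-a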
Yes, this is a known limitation. Currently the controller expects a different set of node security groups for different clusters in the same VPC.
The controller manages the rules of the node security groups it discovers, namely:
- all SGs in the VPC with a "cluster tag" matching the cluster name (a sketch of this lookup follows the list)
- SGs attached to the ENIs that back the pod IPs used as target group backends
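For reference, a hedged example of the first lookup, assuming the conventional kubernetes.io/cluster/<cluster-name> tag key; the cluster name and VPC ID are placeholders:
# List SGs in the VPC carrying the cluster tag for cluster-a; these are the
# node SGs the controller treats as its own and reconciles.
$ aws ec2 describe-security-groups \
    --filters "Name=tag-key,Values=kubernetes.io/cluster/cluster-a" \
              "Name=vpc-id,Values=vpc-0aaaaaaaaaaaaaaaa" \
    --query 'SecurityGroups[].GroupId'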
We currently use the description elbv2.k8s.aws/targetGroupBinding=shared to denote the SG rules managed by this controller, and it does not provide a way to distinguish between clusters. We can make a change to start tagging SG rules with more detailed information (e.g. cluster name/TGB name) to support this use case.
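A hedged way to see that marking on a shared SG today (the group ID is a placeholder): every controller-managed rule shows the same shared description, with nothing identifying the owning cluster.
# Show the rules on the shared SG together with their descriptions.
$ aws ec2 describe-security-group-rules \
    --filters "Name=group-id,Values=sg-0bbbbbbbbbbbbbbbb" \
    --query 'SecurityGroupRules[].{Id:SecurityGroupRuleId,Desc:Description}'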
I commented on an older ticket about this, but I have a patch here https://github.com/dlmather/aws-load-balancer-controller/tree/dmather/patch-aws-lb-controller-sg-conflicts that supports tagging by cluster name to handle this. I think tackling the full problem is a bit more complicated than what my solution handles (there are issues around situations where the same SG rule is added by multiple clusters), but if there is interest I am willing to clean up what I have and raise a PR here.
Added this to our backlog. I think we can solve this via tagging SG rules. PRs are definitely welcome :D
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.