aws-load-balancer-controller
LoadBalancer controllers of different clusters running in the same VPC try to manage the same SG and cause a deadlock
Describe the bug
LoadBalancer controllers of different clusters running in the same VPC try to manage the same SG and cause a deadlock.
Steps to reproduce
Launch two clusters in the same VPC.
Cluster A:
- Install Karpenter
- Install AWS LoadBalancer Controller with IRSA.
- Create a test deployment, service, ingress.
Cluster B:
- Has a cluster SG.
- Nodes use the cluster SG.
- Install AWS LoadBalancer Controller with IRSA.
- The AWS LoadBalancer Controller manages the SG rules of Cluster B's cluster SG.
- Create a test deployment, service, ingress.
Reproduction:
- In Cluster A, create a Karpenter node template (AWSNodeTemplate) that uses the cluster SG of Cluster B (see the sketch after this list).
- Launch nodes using that Karpenter node template in Cluster A.
- Delete/remove the nodes launched by Karpenter.
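A minimal sketch of that misconfiguration, assuming Karpenter's v1alpha1 AWSNodeTemplate API; the template name, discovery tag, and sg-0bbbbbbbbbbbbbbbb (standing in for Cluster B's cluster SG ID) are placeholders:
$ cat <<'EOF' | kubectl apply -f -
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: cross-cluster-sg-example
spec:
  subnetSelector:
    karpenter.sh/discovery: cluster-a
  securityGroupSelector:
    aws-ids: "sg-0bbbbbbbbbbbbbbbb"   # Cluster B's cluster SG -- this selection is what triggers the issue
EOF
Once nodes launched from this template back any targets, Cluster A's controller starts treating Cluster B's cluster SG as a backend node SG.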
Reproduction Outcome
- At this point the AWS LoadBalancer Controller of Cluster A will cache the cluster SG of Cluster B as one of the backend node security groups. Because it is considered a backend node SG, the controller will try to manage and reconcile it continuously.
- This is a problem because every SG rule managed by the AWS LoadBalancer Controller carries the description elbv2.k8s.aws/targetGroupBinding=shared, which gives neither controller a way to tell which cluster owns a rule. Both controllers therefore behave as follows (a sketch of the resulting API churn follows this list):
- Cluster A's AWS LoadBalancer Controller considers the rules added by Cluster B's controller invalid and deletes them with the RevokeSecurityGroupIngress API, while adding any rules it needs with the AuthorizeSecurityGroupIngress API.
- Cluster B's AWS LoadBalancer Controller does the same in reverse: it revokes the rules added by Cluster A's controller and authorizes its own.
- The constant flip-flopping of SG rules causes the ALB to fail health checks intermittently.
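A hedged illustration of that churn, expressed as the equivalent AWS CLI calls each controller effectively issues against the shared SG (the group ID, ports, and CIDR are placeholders, not the exact rules the controller writes):
# Cluster A's controller revokes a rule that Cluster B's controller created...
$ aws ec2 revoke-security-group-ingress \
    --group-id sg-0bbbbbbbbbbbbbbbb \
    --protocol tcp --port 8080 --cidr 10.0.0.0/16
# ...and authorizes the rule it wants instead.
$ aws ec2 authorize-security-group-ingress \
    --group-id sg-0bbbbbbbbbbbbbbbb \
    --protocol tcp --port 9090 --cidr 10.0.0.0/16
# On its next reconcile, Cluster B's controller does the mirror image, and the cycle repeats.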
Current workarounds
Workaround 1:
- Allow all traffic from the VPC CIDR in the SG. This bypasses the deadlock so ALB health checks no longer fail (see the example after the restart commands below).
Workaround 2:
- With current versions, the two clusters in the same VPC should use distinct sets of SGs. Make sure that Cluster A is not using any node SGs from Cluster B as its own node SGs. Once this is confirmed, restart the LoadBalancer Controller to clear its cache:
$ kubectl scale deployment -n kube-system aws-load-balancer-controller --replicas 0
$ kubectl scale deployment -n kube-system aws-load-balancer-controller --replicas 1
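For workaround 1, a minimal sketch of opening the node SG to the whole VPC CIDR (the group ID and CIDR are placeholders; use your own VPC's values):
# Allow all traffic from the VPC CIDR so ALB health checks keep passing even
# while the two controllers keep rewriting the finer-grained rules.
$ aws ec2 authorize-security-group-ingress \
    --group-id sg-0bbbbbbbbbbbbbbbb \
    --ip-permissions 'IpProtocol=-1,IpRanges=[{CidrIp=10.0.0.0/16}]'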
Expected outcome
- The LoadBalancer Controller should not manage rules it did not create.
Feature enhancement request:
- To resolve this, please tag security group rules with a unique identifier (see the sketch below).
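A sketch of the kind of per-rule tagging being requested, assuming a hypothetical tag key (elbv2.k8s.aws/cluster is not an existing controller tag) and a placeholder security-group-rule ID; SG rules are taggable resources with sgr-* IDs:
# Hypothetical: tag an individual SG rule with the owning cluster so each
# controller can skip rules it did not create.
$ aws ec2 create-tags \
    --resources sgr-0123456789abcdef0 \
    --tags Key=elbv2.k8s.aws/cluster,Value=cluster-a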
Yes, this is a known limitation. Currently the controller expects a different set of node security groups for different clusters in the same VPC.
The controller manages the rules of the node security groups it discovers, namely:
- all SGs in the VPC with a "cluster tag" matching the cluster name (a sketch of this lookup follows the list)
- SGs attached to the ENIs that back the pod IPs used as target group backends
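For reference, a hedged example of the first lookup, assuming the conventional kubernetes.io/cluster/<cluster-name> tag key; the cluster name and VPC ID are placeholders:
# List SGs in the VPC carrying the cluster tag for cluster-a; these are the
# node SGs the controller treats as its own and reconciles.
$ aws ec2 describe-security-groups \
    --filters "Name=tag-key,Values=kubernetes.io/cluster/cluster-a" \
              "Name=vpc-id,Values=vpc-0aaaaaaaaaaaaaaaa" \
    --query 'SecurityGroups[].GroupId'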
We currently use the description elbv2.k8s.aws/targetGroupBinding=shared to denote the SG rules managed by this controller, and it does not provide a way to distinguish between clusters. We can make a change to start tagging SG rules with more detailed information (e.g. cluster name/TGB name) to support this use case.
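A hedged way to see that marking on a shared SG today (the group ID is a placeholder): every controller-managed rule shows the same shared description, with nothing identifying the owning cluster.
# Show the rules on the shared SG together with their descriptions.
$ aws ec2 describe-security-group-rules \
    --filters "Name=group-id,Values=sg-0bbbbbbbbbbbbbbbb" \
    --query 'SecurityGroupRules[].{Id:SecurityGroupRuleId,Desc:Description}'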
I commented on an older ticket about this, but I have a patch here https://github.com/dlmather/aws-load-balancer-controller/tree/dmather/patch-aws-lb-controller-sg-conflicts that supports tagging by cluster name to handle this. I think tackling the full problem is a bit more complicated than what my solution handles (there are issues around situations where the same SG rule is added by multiple clusters), but if there is interest I am willing to clean up what I have and raise a PR here.
Added this to our backlog. I think we can solve this via tagging SG rules. PRs are definitely welcome :D
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.