LoadBalancer Controllers of different clusters running in the same VPC try to manage the same SG and cause a deadlock

Open kakarotbyte opened this issue 1 year ago • 5 comments

Describe the bug
LoadBalancer Controllers of different clusters running in the same VPC try to manage the same SG, causing a deadlock.

Steps to reproduce

Launch 2 clusters in the same VPC.

Cluster A:

  • Install Karpenter
  • Install AWS LoadBalancer Controller with IRSA.
  • Create a test deployment, service, and ingress.
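
For example, a minimal test workload that exercises the controller could be created like this (the image, names, and hostname are placeholders; this assumes the alb IngressClass installed with the controller):

$ kubectl create deployment echo --image=public.ecr.aws/docker/library/nginx:latest --port=80
$ kubectl expose deployment echo --port=80 --target-port=80
$ kubectl create ingress echo --class=alb \
    --rule="echo.example.com/*=echo:80" \
    --annotation alb.ingress.kubernetes.io/scheme=internet-facing \
    --annotation alb.ingress.kubernetes.io/target-type=ip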

Cluster B:

  • Has a cluster SG.
  • Nodes use the cluster SG.
  • Install AWS LoadBalancer Controller with IRSA.
  • The AWS LoadBalancer Controller manages the SG rules of Cluster B's cluster SG.
  • Create a test deployment, service, and ingress.

Reproduction:

  • In Cluster A, create a Karpenter node template (AWSNodeTemplate) that uses the cluster SG of Cluster B (a sketch follows this list).
  • Launch nodes in Cluster A using that Karpenter node template.
  • Delete/remove the nodes launched by Karpenter.
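
A minimal sketch of the misconfiguration, assuming Karpenter's v1alpha1 AWSNodeTemplate API; the template name, discovery tag, and SG ID are placeholders (the SG ID stands in for Cluster B's cluster SG):

$ cat <<'EOF' | kubectl apply -f -
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: cross-cluster-sg                # hypothetical template name
spec:
  subnetSelector:
    karpenter.sh/discovery: cluster-a   # Cluster A's subnets
  securityGroupSelector:
    aws-ids: "sg-0bbbbbbbbbbbbbbbb"     # placeholder for Cluster B's cluster SG ID
EOF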

Reproduction Outcome

  • At this point, the AWS LoadBalancer Controller of Cluster A will cache Cluster B's cluster SG as one of the backend node security groups. Because the controller considers it a backend node SG, it will try to manage and reconcile it continuously.
  • This is a problem because every SG rule managed by the AWS LoadBalancer Controller carries the description elbv2.k8s.aws/targetGroupBinding=shared, which does not identify the owning cluster (see the example after this list). As a result, both clusters' controllers will do the following:
    • Cluster A's AWS LoadBalancer Controller: it will consider rules added by Cluster B's controller invalid and will try to delete them using the RevokeSecurityGroupIngress API, while adding any rules it needs using the AuthorizeSecurityGroupIngress API.
    • Cluster B's AWS LoadBalancer Controller: it will consider rules added by Cluster A's controller invalid and will try to delete them using the RevokeSecurityGroupIngress API, while adding any rules it needs using the AuthorizeSecurityGroupIngress API.
  • The resulting churn in the SG rules causes the ALB to fail health checks intermittently.
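
The contention is visible on the shared SG itself: rules managed by either controller carry the shared description. One way to list them (the SG ID is a placeholder):

$ aws ec2 describe-security-group-rules \
    --filters Name=group-id,Values=sg-0bbbbbbbbbbbbbbbb \
    --query "SecurityGroupRules[?Description=='elbv2.k8s.aws/targetGroupBinding=shared'].[SecurityGroupRuleId,IpProtocol,FromPort,ToPort,ReferencedGroupInfo.GroupId]" \
    --output table

Running this repeatedly while both controllers reconcile should show the same rules being revoked and re-authorized.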

Current workarounds

Workaround 1: Allow all traffic from the VPC CIDR in the SG. This bypasses the deadlock so the ALB health checks no longer fail (a sketch of such a rule follows the commands below).

Workaround 2: With current versions, two clusters in the same VPC should use different sets of SGs. Make sure Cluster A is not using any node SG from Cluster B as its own node SG. Once this is confirmed, restart the LoadBalancer Controller to clear its cache:

$ kubectl scale deployment -n kube-system aws-load-balancer-controller --replicas 0
$ kubectl scale deployment -n kube-system aws-load-balancer-controller --replicas 1
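
For Workaround 1, a sketch of the broad allow rule (the SG ID and VPC CIDR are placeholders for your environment):

$ aws ec2 authorize-security-group-ingress \
    --group-id sg-0bbbbbbbbbbbbbbbb \
    --ip-permissions 'IpProtocol=-1,IpRanges=[{CidrIp=10.0.0.0/16,Description=allow-all-from-vpc-cidr}]'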

Expected outcome

  • The LoadBalancer Controller should not manage rules it did not create.

Feature enhancement request:

kakarotbyte avatar Aug 09 '23 16:08 kakarotbyte

Yes, this is a known limitation. Currently the controller expects a different set of node security groups for each cluster in the same VPC.

The controller manages the rules of the following node security groups:

  1. all SGs in the VPC that have a "cluster tag" with a matching cluster name
  2. SGs of the ENIs that back the pod IPs used as target group backends.

We currently use the description elbv2.k8s.aws/targetGroupBinding=shared to denote the SG rules managed by this controller, and it doesn't have a way to distinguish between clusters. We could change the controller to start tagging SG rules with more detailed information (e.g. cluster name/TGB name) to support this use case.
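
For illustration only (this is not current controller behavior): a per-cluster discriminator could live either in the rule description or in a tag on the rule itself, since SG rules have IDs and support tags. The key name, cluster name, and IDs below are made up:

$ # hypothetical: filter rules whose description also names the owning cluster
$ aws ec2 describe-security-group-rules \
    --filters Name=group-id,Values=sg-0bbbbbbbbbbbbbbbb \
    --query "SecurityGroupRules[?contains(Description, 'elbv2.k8s.aws/cluster=cluster-a')].SecurityGroupRuleId"
$ # hypothetical: or tag the rule so each controller can recognize its own rules
$ aws ec2 create-tags \
    --resources sgr-0aaaaaaaaaaaaaaaa \
    --tags Key=elbv2.k8s.aws/cluster,Value=cluster-a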

M00nF1sh avatar Aug 09 '23 17:08 M00nF1sh

I commented on an older ticket about this, but I have a patch here https://github.com/dlmather/aws-load-balancer-controller/tree/dmather/patch-aws-lb-controller-sg-conflicts that supports tagging by cluster name to handle this. I think tackling the full problem is a bit more complicated than what my solution handles (there are issues around situations where the same SG rule is being added by multiple clusters), but if there is interest I am willing to clean up what I have and raise a PR here.

dlmather avatar Aug 11 '23 18:08 dlmather

Added this to our backlog. I think we can solve this via tagging on SG rules. PRs are definitely welcome :D

M00nF1sh avatar Oct 03 '23 19:10 M00nF1sh

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jan 29 '24 15:01 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Feb 28 '24 16:02 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Mar 29 '24 16:03 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Mar 29 '24 16:03 k8s-ci-robot