aws-load-balancer-controller
Support for ALBs in multiple AWS accounts
We have multiple EKS clusters, in multiple VPCs. That being said, we plan to have a centralized VPC just for ingress, following this pattern: https://docs.aws.amazon.com/whitepapers/latest/building-scalable-secure-multi-vpc-network-infrastructure/using-network-firewall-for-centralized-ingress.html
Since less is more, we want to keep the number of hops as low as possible, so we plan to have the following traffic setup:
Browser -> PUBLIC_ALB_IN_THE_INGRESS_ACCOUNT -> IP_OF_THE_EKS_POD_IN_THE_APPLICATION_ACCOUNT
That means we have at least two AWS accounts: INGRESS_ACCOUNT and APPLICATION_ACCOUNT
Our current setup is created with Terraform, including the EKS cluster, the ALB, the DNS entry, the ALB rule, and the ALB target group.
That means the only thing the aws-load-balancer-controller has to do is sync the IPs of the k8s service with the target group.
For that, we use TargetGroupBindings ( https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.7/guide/targetgroupbinding/targetgroupbinding/ )
Just for reference, this is what a TargetGroupBinding looks like:
```yaml
apiVersion: elbv2.k8s.aws/v1beta1
kind: TargetGroupBinding
metadata:
  name: my-tgb
spec:
  serviceRef:
    name: awesome-service # route traffic to the awesome-service
    port: 80
  targetGroupARN: <arn-to-targetGroup>
```
So what we need is for the aws-load-balancer-controller to be able to interact with the target groups in different AWS accounts.
More specifically, we need the following IAM permissions in those other AWS accounts:
"elasticloadbalancing:DescribeTargetGroups",
"elasticloadbalancing:DescribeTargetHealth",
"elasticloadbalancing:ModifyTargetGroup",
"elasticloadbalancing:ModifyTargetGroupAttributes",
"elasticloadbalancing:RegisterTargets",
"elasticloadbalancing:DeregisterTargets"
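As a sketch, the policy attached to a role in the ingress account could look like the following (the `Resource` wildcard is used because several ELB `Describe*` actions do not support resource-level scoping; tightening it where supported is left as an exercise):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowTargetGroupSync",
      "Effect": "Allow",
      "Action": [
        "elasticloadbalancing:DescribeTargetGroups",
        "elasticloadbalancing:DescribeTargetHealth",
        "elasticloadbalancing:ModifyTargetGroup",
        "elasticloadbalancing:ModifyTargetGroupAttributes",
        "elasticloadbalancing:RegisterTargets",
        "elasticloadbalancing:DeregisterTargets"
      ],
      "Resource": "*"
    }
  ]
}
```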
Ideally, we should be able to do that with AWS RAM, but AWS RAM can only share whole VPCs, which is not something my security team is happy about.
So what I suggest we do to solve that problem is patch the aws-load-balancer-controller to allow it to assume arbitrary IAM roles per TargetGroupBinding. This way we could solve not only my specific problem but more complex problems from others as well.
We could start by adding an alb.ingress.kubernetes.io/IamRoleArnToAssume annotation to the TargetGroupBinding. If that annotation is present, the aws-load-balancer-controller would attempt to assume that role before interacting (using the IAM permissions above) with that specific target group.
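For illustration, a TargetGroupBinding using the proposed annotation might look like this (the annotation is the proposal, not an existing feature, and the role ARN, account ID, and target group ARN are placeholders):

```yaml
apiVersion: elbv2.k8s.aws/v1beta1
kind: TargetGroupBinding
metadata:
  name: my-tgb
  annotations:
    # Proposed: role in the ingress account that owns the target group
    alb.ingress.kubernetes.io/IamRoleArnToAssume: arn:aws:iam::111111111111:role/tgb-sync-role
spec:
  serviceRef:
    name: awesome-service
    port: 80
  targetGroupARN: arn:aws:elasticloadbalancing:eu-central-1:111111111111:targetgroup/my-tg/0123456789abcdef
```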
Such a setup would be flexible enough to allow one aws-load-balancer-controller to manage all the target groups in the world, as long as the right IAM roles exist for it to assume.
I am speaking with my employer to see if they would consider me (or somebody from my team) writing a pull request for that.
That being said, we are not in the aws-load-balancer-controller business, so it would be great to know whether such a pull request would eventually be merged, so we don't have to maintain a fork in the long term.
Thank you
@marcosdiez Thank you for reaching out to us and sending this detailed information about your architecture and proposed solution. We will need to discuss this design proposal with our internal security team to figure out the security concerns. We will start that process.
@marcosdiez We had a review for this internally with our security engineer. He is fine with the feature and the proposed solution. He had one concern around the cross-account IAM permissions, which may result in a confused deputy problem. You can prevent it by adding an "ExternalId" condition to the trust policy of the IAM role so that the permissions are scoped down to specific accounts only. Please consider this while implementing.
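To illustrate the ExternalId suggestion, the trust policy of the role in the ingress account could look roughly like this (the account ID, role name, and ExternalId value are placeholders):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::222222222222:role/aws-load-balancer-controller"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "my-unique-external-id"
        }
      }
    }
  ]
}
```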
Please note that we will have a formal appsec review as well once the implementation is complete. As long as it passes the security review, we will merge your PR. Looking forward to it.
Thank you @shraddhabang . I will discuss with my team and hopefully send you some code in a week or two.
Hi @marcosdiez Thank you for the great suggestion.
How is this going for you? We are facing the same multi-account problem, so we are looking forward to your PR.
Thanks.
Hi @imaharu . I actually only got to start it part time yesterday. Ask me again in a few weeks :)
@imaharu I published a container image with this feature working. You can check it out at https://github.com/kubernetes-sigs/aws-load-balancer-controller/pull/3691
@marcosdiez Thank you for creating a great pull request. It looks good, and your code is easy to read. Our team learned a lot.
This motivates me, so I want to try an OSS contribution next time I have a chance!
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
@shraddhabang Hello, is there any progress on this issue? We are eagerly awaiting this feature, as our product faces the same problem described here. If you don't mind, it would be very helpful if you could review and merge the relevant PRs. Thank you!
We are also super keen to use this feature. We heavily utilise AWS accounts: we create an account per customer/purpose plus environment, and we have three environments (prod, stg, and dev).
We then create one cluster account per environment, purely for creating EKS clusters, with one EKS cluster per region where we have a presence.
So taking a boiled down version we would have:
- cluster-account in region eu-central-1 with one eks cluster
- customerA in region eu-central-1
- customerB in region eu-central-1
- etc
When creating the VPC in the cluster account, we use AWS RAM to share the private and public subnets from that VPC with all the other workload accounts in that region and environment. CustomerA and CustomerB in eu-central-1, for example, will have the public and private subnets from the VPC created in the cluster account for eu-central-1.
With each environment, we have one EKS cluster per region. So taking for example eu-central-1 and dev, we would have one EKS cluster covering this combination.
Using Terraform, we then create the certs, DNS records, ALBs, target groups, and listeners for each customer account on the cluster-account VPC. All we then want to do within EKS itself, using the LB controller, is use TargetGroupBindings to bind the service(s) to the respective target groups we have created in each account.