TargetGroupBinding on multi cluster
Describe the bug TargetGroupBinding allows specifying a target group ARN. I am using a TargetGroupBinding in each of two clusters, both pointing at the same target group, to see whether targets (pods) from both clusters show up in the target group.
Unfortunately, they do not. There is a race condition between the two clusters, and as a result only one cluster is able to bind to the target group at any given time.
Expected outcome I expected the target group to show all the targets from both clusters.
Environment
- AWS Load Balancer controller version: 2.1
- Kubernetes version 1.18
- Using EKS (yes/no), if so version? yes, 1.18
Additional Context:
@ttony, the controller currently assumes exclusive ownership for the target groups. Target groups should not be shared across multiple controllers to avoid any race condition.
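A minimal sketch of why exclusive ownership leads to the flapping described above. This is a toy model, not the controller's actual code: `reconcile_exclusive` and the pod IPs are made up for illustration, and the target group is modelled as a plain set.

```python
def reconcile_exclusive(desired, registered):
    """Reconcile as if this controller were the only writer to the target group.

    Anything registered but not desired by *this* cluster is deregistered,
    even if another cluster's controller registered it.
    """
    to_register = desired - registered
    to_deregister = registered - desired
    return (registered | to_register) - to_deregister


# Hypothetical pod IPs for two clusters sharing one target group.
cluster_a = {"10.0.1.5", "10.0.1.6"}
cluster_b = {"10.1.2.7"}

target_group = set()
target_group = reconcile_exclusive(cluster_a, target_group)  # A registers its pods
target_group = reconcile_exclusive(cluster_b, target_group)  # B removes A's pods
after_b = set(target_group)
target_group = reconcile_exclusive(cluster_a, target_group)  # A removes B's pods
after_a = set(target_group)
```

In this model, whichever controller reconciled last ends up with the only targets in the group, which matches the "only 1 cluster at a time" symptom reported above.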
/kind feature Here is the outline
- maintain a configmap for each tgb
- add all targets to be registered to the configmap
- when deleting targets, limit to the entries from the configmap
This will enable multiple controllers to share the target groups.
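The outline above can be sketched roughly as follows. This is only an illustration under stated assumptions: `plan_changes`, `apply_changes`, and the `owned_*` sets (standing in for the per-TGB configmap contents) are hypothetical names, and no real Kubernetes or AWS API is called.

```python
def plan_changes(desired, owned, registered):
    """Plan register/deregister calls for one controller sharing a target group.

    desired:    targets this cluster currently wants registered
    owned:      targets recorded in this TGB's configmap (hypothetical store)
    registered: everything currently in the target group, from any source
    """
    to_register = desired - registered
    # Only deregister targets this controller registered itself.
    to_deregister = (owned - desired) & registered
    # Keep still-desired owned targets and record the newly registered ones.
    new_owned = (owned & desired) | to_register
    return to_register, to_deregister, new_owned


def apply_changes(registered, to_register, to_deregister):
    """Toy stand-in for the register/deregister calls against the target group."""
    return (registered | to_register) - to_deregister


tg, owned_a, owned_b = set(), set(), set()

# Cluster A registers two pods, then cluster B registers one.
reg, dereg, owned_a = plan_changes({"10.0.1.5", "10.0.1.6"}, owned_a, tg)
tg = apply_changes(tg, reg, dereg)
reg, dereg, owned_b = plan_changes({"10.1.2.7"}, owned_b, tg)
tg = apply_changes(tg, reg, dereg)

# Cluster A scales down to one pod; B's target must survive.
reg, dereg, owned_a = plan_changes({"10.0.1.5"}, owned_a, tg)
tg = apply_changes(tg, reg, dereg)
```

One design consequence in this sketch: a target that is already registered but not recorded as owned is never deregistered, which is the conservative choice if the recorded state is ever lost.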
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
/assign
/lifecycle stale
/lifecycle rotten
We also find this feature very useful in some scenarios:
- migrate from one cluster to another
- deploy multi-cluster in different zones, and split traffic at the shared gateway, to reduce cross-zone traffic costs.
We also found similar products in Google Cloud: https://cloud.google.com/kubernetes-engine/docs/how-to/deploying-multi-cluster-gateways
Waiting for further progress on this issue.
/remove-lifecycle rotten
Came looking for this. Migration from one cluster to many is the use case.
/kind feature Here is the outline
- maintain a configmap for each tgb
- add all targets to be registered to the configmap
- when deleting targets, limit to the entries from the configmap
This will enable multiple controllers to share the target groups.
Maybe I’m saying something stupid, but couldn’t this information be recorded in the TargetGroupBinding status?
@kishorj @oliviassss Hey, we are really looking for a feature like this. Could you please provide an update? Any information on progress would be really helpful. Thanks
/lifecycle stale
/lifecycle rotten
/remove-lifecycle rotten
I think this would be a very useful feature. Even a coarse-grained solution (e.g. attaching a specific AZ or subnet/CIDR to a cluster) would cover a lot of user scenarios and be easy and efficient to implement, IMHO.
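A rough sketch of what the coarse-grained variant could look like, assuming IP targets and that each cluster is told which CIDRs it is responsible for. The function names and the CIDR setting are hypothetical; nothing like this exists in the controller as far as this thread shows.

```python
import ipaddress


def in_scope(target_ip, cluster_cidrs):
    """Return True if the target IP falls inside one of this cluster's CIDRs."""
    ip = ipaddress.ip_address(target_ip)
    return any(ip in ipaddress.ip_network(cidr) for cidr in cluster_cidrs)


def targets_to_deregister(desired, registered, cluster_cidrs):
    """Deregister only stale targets that live in this cluster's own CIDRs."""
    return {t for t in registered - desired if in_scope(t, cluster_cidrs)}


# Made-up example: this cluster is responsible for 10.0.0.0/16 only.
registered = {"10.0.1.5", "10.0.1.6", "10.1.2.7"}
desired = {"10.0.1.5"}
stale = targets_to_deregister(desired, registered, ["10.0.0.0/16"])
```

Here `10.1.2.7` is left alone because it is outside the cluster's CIDR, so no per-target state needs to be stored, at the cost of requiring non-overlapping address ranges per cluster.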
According to the documentation, the TargetGroupBinding CRD can be used for the use case "Externally Managed Load Balancer".
In our current scenario, the LoadBalancer, Listener and TargetGroup were created manually. The TargetGroup is of type instance and contains 1 healthy EC2 instance.
When creating the TargetGroupBinding inside our EKS cluster, we experience the exact same behaviour: the AWS Load Balancer Controller registers the worker nodes from the cluster (as expected), but also drains/deregisters the aforementioned EC2 instance.
As far as I've understood the documentation, this should be the exact use-case mentioned but the discussion within this issue gives me a different feeling. 🤔 I would be really thankful for help or feedback.
+1 to this feature request.
This would significantly simplify managing services across multiple k8s clusters. They can register targets in the same target group and allow for a simple way to smoothly route traffic between several clusters.
The challenge for the controller is to know when to de-register the target when it's no longer present in the k8s cluster.
Maybe I’m saying something stupid, but couldn’t this information be recorded in the TargetGroupBinding status? This seems like a straight-forward path.
Another option is to allow storing that state in something like DynamoDB (given that this is the AWS ALB controller) and require that a DynamoDB table is provided if you want multi-cluster support.
There's an exciting note on https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/2757
There is a feature request https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/2173 to support sharing tgb across multiple cluster/controllers. It is in our roadmap for the next minor release.
@kishorj is this still planned? Do you know roughly when?
You can follow this in the meantime: https://aws.amazon.com/blogs/aws/new-application-load-balancer-simplifies-deployment-with-weighted-target-groups/
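As a sketch of that workaround, assuming one target group per cluster (each bound by its own TargetGroupBinding) behind a single ALB listener, the weighted forward action could be built like this. The helper name and the ARNs are placeholders; the dictionary shape follows the ELBv2 forward action with `ForwardConfig`.

```python
def weighted_forward_action(weights):
    """Build an ALB 'forward' action that splits traffic across target groups.

    weights: dict mapping target group ARN -> integer weight (0 to 999).
    """
    for arn, weight in weights.items():
        if not 0 <= weight <= 999:
            raise ValueError(f"weight for {arn} must be between 0 and 999")
    return {
        "Type": "forward",
        "ForwardConfig": {
            "TargetGroups": [
                {"TargetGroupArn": arn, "Weight": weight}
                for arn, weight in weights.items()
            ]
        },
    }


# Placeholder ARNs: one target group per cluster, 90/10 split.
action = weighted_forward_action({
    "arn:aws:elasticloadbalancing:...:targetgroup/cluster-a/...": 90,
    "arn:aws:elasticloadbalancing:...:targetgroup/cluster-b/...": 10,
})
```

The resulting dictionary could then be passed as an entry in `DefaultActions` to boto3's `elbv2` `modify_listener` call, or as a rule action. This only applies to Application Load Balancers.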
@SunitAccelup unless I’m mistaken, weighted target groups aren’t available on network load balancers.
Just wanted to add my voice to this need. Currently, we can only do canary deploys with an external ALB because that is the only thing that supports weighting of groups.
I was hoping that we could set TargetGroupBinding settings in a way that said something like "shared" on the spec, and then the AWS Load Balancer Controller would only track the instance IDs or IPs that it originally registered.
We definitely need this, would love to see this feature get implemented.
/lifecycle stale
/lifecycle rotten
+1: Highly useful
/remove-lifecycle rotten