
TargetGroupBinding on multi cluster

Open ttony opened this issue 4 years ago • 26 comments

Describe the bug TargetGroupBinding allows specifying a target group ARN. I created a TargetGroupBinding for the same target group in two clusters to see whether targets (pods) from both clusters would show up in the target group.

Unfortunately, they do not. There is a race condition between the two clusters, and only one cluster is able to bind to the target group at any given time.

Expected outcome The target group should show all the targets from both clusters.

Environment

  • AWS Load Balancer controller version: 2.1
  • Kubernetes version 1.18
  • Using EKS (yes/no), if so version? yes, 1.18

Additional Context:

ttony avatar Aug 12 '21 02:08 ttony

@ttony, the controller currently assumes exclusive ownership of the target groups it manages. A target group should not be shared across multiple controllers, otherwise they race against each other.

kishorj avatar Aug 13 '21 17:08 kishorj

/kind feature Here is the outline:

  • maintain a ConfigMap for each TGB
  • add all targets to be registered to the ConfigMap
  • when deleting targets, limit to the entries from the ConfigMap

This will enable multiple controllers to share the target groups.
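A minimal sketch of that outline (the controller itself is written in Go; this is just Python pseudologic, and all names are hypothetical): each controller records the targets it registered in its per-TGB ConfigMap, and deregistration is restricted to that recorded set, so targets registered by another cluster's controller are never touched.

```python
def compute_deregistrations(registered_in_tg, owned_by_cluster, desired):
    """Deregister only targets that this controller previously registered
    (i.e. the entries recorded in its per-TGB ConfigMap) and that are no
    longer desired; targets owned by other clusters are left alone."""
    return (registered_in_tg & owned_by_cluster) - desired

# Cluster A's controller owns t1/t2; t9 was registered by cluster B.
# Only t1 is removed: it is owned by A and no longer desired.
print(compute_deregistrations({"t1", "t2", "t9"}, {"t1", "t2"}, {"t2"}))  # {'t1'}
```

With today's exclusive-ownership behaviour the `owned_by_cluster` intersection is missing, which is exactly why two controllers fight over the same target group.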

kishorj avatar Aug 25 '21 22:08 kishorj

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Nov 23 '21 22:11 k8s-triage-robot

/remove-lifecycle stale

maruina avatar Nov 24 '21 15:11 maruina

/assign

oliviassss avatar Jan 06 '22 19:01 oliviassss

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Apr 06 '22 20:04 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar May 06 '22 20:05 k8s-triage-robot

We also find this feature very useful in some scenarios:

  • migrating from one cluster to another
  • deploying multiple clusters in different zones and splitting traffic at a shared gateway, to reduce cross-zone traffic costs

Google Cloud offers a similar product: https://cloud.google.com/kubernetes-engine/docs/how-to/deploying-multi-cluster-gateways

Waiting for further progress on this issue.

ryan4yin avatar May 31 '22 12:05 ryan4yin

/remove-lifecycle rotten

Came looking for this. Migration from one cluster to many is the use case.

iamnoah avatar Jun 24 '22 15:06 iamnoah

/kind feature Here is the outline

  • maintain a configmap for each tgb
  • add all targets to be registered to the configmap
  • when deleting targets, limit to the entries from the configmap
    This will enable multiple controllers to share the target groups.

Maybe I’m saying something stupid, but couldn’t this information be recorded in the TargetGroupBinding status?

yann-soubeyrand avatar Jul 21 '22 20:07 yann-soubeyrand

@kishorj @oliviassss Hey we are really looking for a feature like this. Can you please provide an update on the same or any progress whatsoever would be really helpful? Thanks

shubhamsre avatar Aug 27 '22 05:08 shubhamsre

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Nov 25 '22 06:11 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Dec 25 '22 07:12 k8s-triage-robot

/remove-lifecycle rotten

aaroniscode avatar Jan 04 '23 16:01 aaroniscode

I think this would be a very useful feature. Even a coarse-grained solution (e.g. attaching a specific AZ or subnet/CIDR to each cluster) would cover a lot of user scenarios and be easy to implement efficiently, IMHO.
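The coarse-grained idea above could look something like this (a hedged Python sketch, not the controller's actual implementation; the CIDR assignments are hypothetical): each controller only manages targets whose IP falls inside the CIDRs assigned to its own cluster, so no shared state is needed.

```python
import ipaddress

def owns_target(target_ip, cluster_cidrs):
    """Coarse-grained ownership: a controller only registers or deregisters
    targets whose IP falls inside its cluster's assigned CIDRs."""
    ip = ipaddress.ip_address(target_ip)
    return any(ip in ipaddress.ip_network(cidr) for cidr in cluster_cidrs)

# Cluster A is assigned 10.0.0.0/16, so it ignores cluster B's 10.1.x pods.
print(owns_target("10.0.4.7", ["10.0.0.0/16"]))  # True
print(owns_target("10.1.4.7", ["10.0.0.0/16"]))  # False
```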

chuyee avatar Jan 19 '23 22:01 chuyee

According to the documentation, the TargetGroupBinding CRD can be used for the use case "Externally Managed Load Balancer".

In our current scenario, LoadBalancer, Listener and TargetGroup were created manually. The TargetGroup is of Type instance and contains 1 healthy EC2 instance.

When creating the TargetGroupBinding inside our EKS cluster, we experience the exact same behaviour, the AWS LoadBalancer Controller registers the worker nodes from the cluster (as expected), but also drains/deregisters the aforementioned EC2 instance.

As far as I've understood the documentation, this should be exactly the use case mentioned, but the discussion within this issue gives me a different feeling. 🤔 I would be really thankful for help or feedback.
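The draining you see follows from the exclusive-ownership assumption discussed above. A minimal sketch of that reconcile logic (Python pseudologic, not the controller's actual Go code): the controller diffs the target group's contents against its own desired set and removes everything else, including targets registered out of band.

```python
def reconcile(actual_targets, desired_targets):
    """Exclusive-ownership reconcile, matching the controller's current
    behaviour: anything in the target group that is not in the desired
    set is deregistered, including targets registered out of band."""
    to_register = desired_targets - actual_targets
    to_deregister = actual_targets - desired_targets
    return to_register, to_deregister

# The manually registered EC2 instance lands in the deregister set.
reg, dereg = reconcile({"i-manual", "node-1"}, {"node-1", "node-2"})
print(reg, dereg)  # {'node-2'} {'i-manual'}
```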

tgraupne avatar Mar 07 '23 18:03 tgraupne

+1 to this feature request.

This would significantly simplify managing services across multiple k8s clusters: they could register targets in the same target group, giving a simple way to smoothly route traffic between several clusters.

The challenge for the controller is to know when to de-register the target when it's no longer present in the k8s cluster.

Maybe I’m saying something stupid, but couldn’t this information be recorded in the TargetGroupBinding status? This seems like a straightforward path.

Another option is to store that state in something like DynamoDB (given that this is the AWS ALB controller) and require that a DynamoDB table be provided if you want multi-cluster support.
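The external-store idea could be sketched like this (Python pseudologic with a plain dict standing in for the DynamoDB table; the class and key layout are hypothetical): each controller claims the targets it registers under its cluster name and may only release its own claims.

```python
class SharedTargetLedger:
    """Sketch of an external ownership ledger; a plain dict stands in
    for a DynamoDB table keyed on (target_group_arn, target_id) with
    the owning cluster's name as the value."""

    def __init__(self):
        self._owner = {}

    def claim(self, tg_arn, target_id, cluster):
        # Record which cluster registered this target.
        self._owner[(tg_arn, target_id)] = cluster

    def release(self, tg_arn, target_id, cluster):
        # A controller may only deregister targets it claimed itself.
        if self._owner.get((tg_arn, target_id)) == cluster:
            del self._owner[(tg_arn, target_id)]
            return True
        return False
```

In a real implementation the claim/release would need a conditional write so two controllers cannot race on the same entry.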

There's an exciting note on https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/2757

There is a feature request https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/2173 to support sharing tgb across multiple cluster/controllers. It is in our roadmap for the next minor release.

@kishorj is this still planned? Do you know roughly when?

alexsorokin-at avatar Mar 10 '23 22:03 alexsorokin-at

You can follow this in the meantime: https://aws.amazon.com/blogs/aws/new-application-load-balancer-simplifies-deployment-with-weighted-target-groups/

SunitAccelup avatar Apr 05 '23 01:04 SunitAccelup

@SunitAccelup unless I’m mistaken, weighted target groups aren’t available on network load balancers.

yann-soubeyrand avatar Apr 05 '23 07:04 yann-soubeyrand

Just wanted to add my voice to this need. Currently, we can only do canary deploys with an external ALB because that is the only thing that supports weighted target groups.

I was hoping we could set something like a "shared" flag on the TargetGroupBinding spec, so that the AWS Load Balancer Controller would only track the instance IDs or IPs that it originally registered.
hanseltime avatar Jun 12 '23 04:06 hanseltime

We definitely need this, would love to see this feature get implemented.

micahnoland avatar Aug 15 '23 15:08 micahnoland

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jan 26 '24 11:01 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Feb 25 '24 12:02 k8s-triage-robot

+1: Highly useful

jkdihenkar avatar Mar 14 '24 04:03 jkdihenkar

/remove-lifecycle rotten

sidewinder12s avatar Mar 14 '24 22:03 sidewinder12s