
TargetGroupBinding on multi cluster

Open ttony opened this issue 4 years ago • 26 comments

Describe the bug TargetGroupBinding allows specifying a target group ARN. I created a TargetGroupBinding for the same target group in two clusters to see whether targets (pods) from both clusters would show up in the target group.

Unfortunately, they do not. There is a race condition between the two clusters, and only one cluster is able to bind to the target group at any given time.

Expected outcome The target group should show all the targets from both clusters.

Environment

  • AWS Load Balancer controller version: 2.1
  • Kubernetes version 1.18
  • Using EKS (yes/no), if so version? yes, 1.18

Additional Context:

ttony avatar Aug 12 '21 02:08 ttony

@ttony, the controller currently assumes exclusive ownership of the target groups it manages. A target group should not be shared across multiple controllers, otherwise they race against each other.

kishorj avatar Aug 13 '21 17:08 kishorj

/kind feature Here is the outline:

  • maintain a ConfigMap for each TGB
  • add all targets to be registered to the ConfigMap
  • when deleting targets, limit to the entries from the ConfigMap

This will enable multiple controllers to share the target groups.
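A minimal sketch of that outline (the controller itself is written in Go; this is just Python pseudologic, and all names are hypothetical): each controller records the targets it registered in its per-TGB ConfigMap, and deregistration is restricted to that recorded set, so targets registered by another cluster's controller are never touched.

```python
def compute_deregistrations(registered_in_tg, owned_by_cluster, desired):
    """Deregister only targets that this controller previously registered
    (i.e. the entries recorded in its per-TGB ConfigMap) and that are no
    longer desired; targets owned by other clusters are left alone."""
    return (registered_in_tg & owned_by_cluster) - desired

# Cluster A's controller owns t1/t2; t9 was registered by cluster B.
# Only t1 is removed: it is owned by A and no longer desired.
print(compute_deregistrations({"t1", "t2", "t9"}, {"t1", "t2"}, {"t2"}))  # {'t1'}
```

With today's exclusive-ownership behaviour the `owned_by_cluster` intersection is missing, which is exactly why two controllers fight over the same target group.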

kishorj avatar Aug 25 '21 22:08 kishorj

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Nov 23 '21 22:11 k8s-triage-robot

/remove-lifecycle stale

maruina avatar Nov 24 '21 15:11 maruina

/assign

oliviassss avatar Jan 06 '22 19:01 oliviassss

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Apr 06 '22 20:04 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar May 06 '22 20:05 k8s-triage-robot

We also find this feature very useful in some scenarios:

  • migrating from one cluster to another
  • deploying multiple clusters in different zones and splitting traffic at a shared gateway, to reduce cross-zone traffic costs

Google Cloud offers a similar product: https://cloud.google.com/kubernetes-engine/docs/how-to/deploying-multi-cluster-gateways

Waiting for further progress on this issue.

ryan4yin avatar May 31 '22 12:05 ryan4yin

/remove-lifecycle rotten

Came looking for this. Migration from one cluster to many is the use case.

iamnoah avatar Jun 24 '22 15:06 iamnoah

/kind feature Here is the outline

  • maintain a configmap for each tgb
  • add all targets to be registered to the configmap
  • when deleting targets, limit to the entries from the configmap
    This will enable multiple controllers to share the target groups.

Maybe I’m saying something stupid, but couldn’t this information be recorded in the TargetGroupBinding status?

yann-soubeyrand avatar Jul 21 '22 20:07 yann-soubeyrand

@kishorj @oliviassss Hey we are really looking for a feature like this. Can you please provide an update on the same or any progress whatsoever would be really helpful? Thanks

shubhamsre avatar Aug 27 '22 05:08 shubhamsre

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Nov 25 '22 06:11 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Dec 25 '22 07:12 k8s-triage-robot

/remove-lifecycle rotten

aaroniscode avatar Jan 04 '23 16:01 aaroniscode

I think this would be a very useful feature. Even a coarse-grained solution (e.g. attaching a specific AZ or subnet/CIDR to each cluster) would cover a lot of user scenarios and be easy to implement efficiently, IMHO.
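The coarse-grained idea above could look something like this (a hedged Python sketch, not the controller's actual implementation; the CIDR assignments are hypothetical): each controller only manages targets whose IP falls inside the CIDRs assigned to its own cluster, so no shared state is needed.

```python
import ipaddress

def owns_target(target_ip, cluster_cidrs):
    """Coarse-grained ownership: a controller only registers or deregisters
    targets whose IP falls inside its cluster's assigned CIDRs."""
    ip = ipaddress.ip_address(target_ip)
    return any(ip in ipaddress.ip_network(cidr) for cidr in cluster_cidrs)

# Cluster A is assigned 10.0.0.0/16, so it ignores cluster B's 10.1.x pods.
print(owns_target("10.0.4.7", ["10.0.0.0/16"]))  # True
print(owns_target("10.1.4.7", ["10.0.0.0/16"]))  # False
```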

chuyee avatar Jan 19 '23 22:01 chuyee

According to the documentation, the TargetGroupBinding CRD can be used for the use case "Externally Managed Load Balancer".

In our current scenario, LoadBalancer, Listener and TargetGroup were created manually. The TargetGroup is of Type instance and contains 1 healthy EC2 instance.

When creating the TargetGroupBinding inside our EKS cluster, we experience the exact same behaviour, the AWS LoadBalancer Controller registers the worker nodes from the cluster (as expected), but also drains/deregisters the aforementioned EC2 instance.

As far as I've understood the documentation, this should be exactly the use case mentioned, but the discussion within this issue gives me a different feeling. 🤔 I would be really thankful for help or feedback.
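The draining you see follows from the exclusive-ownership assumption discussed above. A minimal sketch of that reconcile logic (Python pseudologic, not the controller's actual Go code): the controller diffs the target group's contents against its own desired set and removes everything else, including targets registered out of band.

```python
def reconcile(actual_targets, desired_targets):
    """Exclusive-ownership reconcile, matching the controller's current
    behaviour: anything in the target group that is not in the desired
    set is deregistered, including targets registered out of band."""
    to_register = desired_targets - actual_targets
    to_deregister = actual_targets - desired_targets
    return to_register, to_deregister

# The manually registered EC2 instance lands in the deregister set.
reg, dereg = reconcile({"i-manual", "node-1"}, {"node-1", "node-2"})
print(reg, dereg)  # {'node-2'} {'i-manual'}
```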

tgraupne avatar Mar 07 '23 18:03 tgraupne

+1 to this feature request.

This would significantly simplify managing services across multiple k8s clusters: they could register targets in the same target group, giving a simple way to smoothly route traffic between several clusters.

The challenge for the controller is to know when to de-register the target when it's no longer present in the k8s cluster.

Maybe I’m saying something stupid, but couldn’t this information be recorded in the TargetGroupBinding status? This seems like a straightforward path.

Another option is to store that state in something like DynamoDB (given that this is the AWS ALB controller) and require that a DynamoDB table be provided if you want multi-cluster support.
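The external-store idea could be sketched like this (Python pseudologic with a plain dict standing in for the DynamoDB table; the class and key layout are hypothetical): each controller claims the targets it registers under its cluster name and may only release its own claims.

```python
class SharedTargetLedger:
    """Sketch of an external ownership ledger; a plain dict stands in
    for a DynamoDB table keyed on (target_group_arn, target_id) with
    the owning cluster's name as the value."""

    def __init__(self):
        self._owner = {}

    def claim(self, tg_arn, target_id, cluster):
        # Record which cluster registered this target.
        self._owner[(tg_arn, target_id)] = cluster

    def release(self, tg_arn, target_id, cluster):
        # A controller may only deregister targets it claimed itself.
        if self._owner.get((tg_arn, target_id)) == cluster:
            del self._owner[(tg_arn, target_id)]
            return True
        return False
```

In a real implementation the claim/release would need a conditional write so two controllers cannot race on the same entry.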

There's an exciting note on https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/2757

There is a feature request https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/2173 to support sharing tgb across multiple cluster/controllers. It is in our roadmap for the next minor release.

@kishorj is this still planned? Do you know roughly when?

alexsorokin-at avatar Mar 10 '23 22:03 alexsorokin-at

You can follow this in the meantime: https://aws.amazon.com/blogs/aws/new-application-load-balancer-simplifies-deployment-with-weighted-target-groups/

SunitAccelup avatar Apr 05 '23 01:04 SunitAccelup

@SunitAccelup unless I’m mistaken, weighted target groups aren’t available on network load balancers.

yann-soubeyrand avatar Apr 05 '23 07:04 yann-soubeyrand

Just wanted to add my voice to this need. Currently, we can only do canary deploys with an external ALB because that is the only thing that supports weighted target groups.

I was hoping we could set something like a "shared" flag on the TargetGroupBinding spec, so that the AWS Load Balancer Controller would only track the instance IDs or IPs that it originally registered.
hanseltime avatar Jun 12 '23 04:06 hanseltime

We definitely need this, would love to see this feature get implemented.

micahnoland avatar Aug 15 '23 15:08 micahnoland

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jan 26 '24 11:01 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Feb 25 '24 12:02 k8s-triage-robot

+1: Highly useful

jkdihenkar avatar Mar 14 '24 04:03 jkdihenkar

/remove-lifecycle rotten

sidewinder12s avatar Mar 14 '24 22:03 sidewinder12s