
Desired availability zone IDs

Open shaoxt opened this issue 3 years ago • 11 comments

Issue

PR for #2496

Description

The proposal is to add an argument "--desired-availability-zone-ids" to aws-load-balancer-controller, so that the controller only picks up targets within the desired AZs. This makes it possible to shift the entire cluster's traffic away from an AZ as quickly as possible.

Sample:

--desired-availability-zone-ids=usw2-az2,usw2-az3 This argument limits the subnets specified in the Ingress annotation. If the argument is empty, no subnets are filtered, so the default behavior is unchanged.
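As a rough illustration, here is a minimal sketch of how such a flag might be bound, assuming the spf13/pflag package that the controller's option parsing is built on; the ControllerConfig struct and BindFlags method shown here are hypothetical stand-ins, not the controller's actual configuration types:

```go
package config

import (
	"github.com/spf13/pflag"
)

// ControllerConfig is a hypothetical stand-in for the controller's
// configuration struct; only the proposed field is shown.
type ControllerConfig struct {
	// DesiredAvailabilityZoneIDs limits subnet selection to these AZ IDs.
	// An empty list disables filtering, preserving the default behavior.
	DesiredAvailabilityZoneIDs []string
}

// BindFlags registers the proposed flag on the given flag set.
func (cfg *ControllerConfig) BindFlags(fs *pflag.FlagSet) {
	fs.StringSliceVar(&cfg.DesiredAvailabilityZoneIDs, "desired-availability-zone-ids", nil,
		"Restrict subnet selection to these availability zone IDs (e.g. usw2-az2,usw2-az3); empty means no restriction")
}
```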

alb.ingress.kubernetes.io/subnets: IngressSubnetAz1, IngressSubnetAz2, IngressSubnetAz3 Even though the Ingress specifies three subnets, targets NOT in the desired availability zones will not be used in the targetGroup. The feature provides a lever to quickly move traffic out of one AZ.

SubnetResolver consumes the desired availability zone IDs and limits subnet selection to the given list.
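A minimal sketch of that filtering step, assuming aws-sdk-go's ec2.Subnet type (which carries an AvailabilityZoneId field); the function name and wiring are illustrative, not taken from the PR's actual diff:

```go
package networking

import (
	"github.com/aws/aws-sdk-go/aws"
	ec2sdk "github.com/aws/aws-sdk-go/service/ec2"
)

// filterSubnetsByAZID keeps only subnets whose AZ ID is in desiredAZIDs.
// An empty desiredAZIDs returns the input unchanged, so the default
// behavior is unaffected when the flag is not set.
func filterSubnetsByAZID(subnets []*ec2sdk.Subnet, desiredAZIDs []string) []*ec2sdk.Subnet {
	if len(desiredAZIDs) == 0 {
		return subnets
	}
	desired := make(map[string]bool, len(desiredAZIDs))
	for _, id := range desiredAZIDs {
		desired[id] = true
	}
	var filtered []*ec2sdk.Subnet
	for _, subnet := range subnets {
		if desired[aws.StringValue(subnet.AvailabilityZoneId)] {
			filtered = append(filtered, subnet)
		}
	}
	return filtered
}
```

With the flag unset the filter is a no-op, matching the claim above that the default behavior is unchanged.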

Checklist

  • [x] Added tests that cover your change (if possible)
  • [x] Added/modified documentation as required (such as the README.md, or the docs directory)
  • [x] Manually tested
  • [x] Made sure the title of the PR is a good description that can go into the release notes

BONUS POINTS checklist: complete for good vibes and maybe prizes?! :exploding_head:

  • [ ] Backfilled missing tests for code in same general area :tada:
  • [ ] Refactored something and made the world a better place :star2:

shaoxt avatar Feb 08 '22 19:02 shaoxt

CLA Signed

The committers are authorized under a signed CLA.

  • :white_check_mark: Sheldon Shao (b27a593f20f38133f23fd6bf58dd1ba5825c29b7, 5e1e08bcfbc6e7c19660b02b4139df33e012ac11)

Welcome @shaoxt!

It looks like this is your first PR to kubernetes-sigs/aws-load-balancer-controller 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/aws-load-balancer-controller has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. :smiley:

k8s-ci-robot avatar Feb 08 '22 19:02 k8s-ci-robot

Hi @shaoxt. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Feb 08 '22 19:02 k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: shaoxt. To complete the pull request process, please assign m00nf1sh after the PR has been reviewed. You can assign the PR to them by writing /assign @m00nf1sh in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

  • Approvers can indicate their approval by writing /approve in a comment
  • Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot avatar Feb 08 '22 19:02 k8s-ci-robot

Going through the checklist

shaoxt avatar Feb 08 '22 19:02 shaoxt

Thanks for making this PR.

I believe we won't re-invoke subnet discovery for existing ALBs (if we do, it's a bug and we should fix it). So this change will only impact new ALBs, which won't suit your requirement, per my understanding.

I feel a preferred approach is to combine https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/2354 with a controller-level flag that sets a default nodeSelector (so you can change the controller-level default nodeSelector if you don't set the nodeSelector per Ingress).

M00nF1sh avatar Mar 03 '22 00:03 M00nF1sh
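For illustration, a minimal sketch of how a controller-level default nodeSelector could be parsed and matched against node labels, using the standard k8s.io/apimachinery label-selector syntax; the flag value and AZ names here are hypothetical examples, not anything from #2354:

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/labels"
)

func main() {
	// Hypothetical value for a controller-level default nodeSelector flag;
	// nodes in the bad AZ are excluded from every target group.
	defaultNodeSelector := "topology.kubernetes.io/zone notin (us-west-2a)"

	selector, err := labels.Parse(defaultNodeSelector)
	if err != nil {
		panic(err)
	}

	// The controller would match node labels against this selector when
	// choosing backend targets.
	nodeLabels := labels.Set{"topology.kubernetes.io/zone": "us-west-2b"}
	fmt.Println(selector.Matches(nodeLabels)) // true: node is outside the bad AZ
}
```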

@M00nF1sh Thanks for the comments.
A couple of questions about #2354:

  1. How are you going to sync the bad-AZ information from one place to all Ingresses if you have a couple hundred Ingresses in one cluster?
  2. Since "topology.kubernetes.io/zone" points to the bad AZ's name in each AWS account, how are you going to deal with 200 AWS accounts when the physical AZ ID goes bad but it has different AZ names in different accounts?

shaoxt avatar Mar 07 '22 06:03 shaoxt

@shaoxt

How are you going to sync the bad-AZ information from one place to all Ingresses if you have a couple hundred Ingresses in one cluster?

  1. I think the solution is to expose a controller-level flag (it could be IngressClass-level as well) that sets the nodeSelector; that way, once the controller is restarted, all Ingresses will be reconciled.

Since "topology.kubernetes.io/zone" points to the bad AZ's name in each AWS account, how are you going to deal with 200 AWS accounts when the physical AZ ID goes bad but it has different AZ names in different accounts?

  1. I prefer an external source (e.g. a GitOps tool that manages the controller deployment) to do the AZ mapping and translate it into the nodeSelector argument to the controller. (This keeps the controller itself agnostic to the detailed nodeSelector.)
  2. Alternatively, there could be a special flag such as "--blacklisted-az-ids" that the controller reads and translates into the corresponding topology.kubernetes.io/zone nodeSelector per account internally; this, however, couples the controller to the specific topology.kubernetes.io/zone key. A sketch of this alternative follows this comment.

M00nF1sh avatar Mar 09 '22 22:03 M00nF1sh
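For illustration, a minimal sketch of that second alternative, assuming aws-sdk-go's EC2 DescribeAvailabilityZones call (which maps account-agnostic zone IDs to the account-local zone names used by the topology.kubernetes.io/zone label); the function name and the "--blacklisted-az-ids" flag are hypothetical, since this was never implemented:

```go
package networking

import (
	"github.com/aws/aws-sdk-go/aws"
	ec2sdk "github.com/aws/aws-sdk-go/service/ec2"
	"github.com/aws/aws-sdk-go/service/ec2/ec2iface"
)

// zoneNamesForZoneIDs resolves account-agnostic AZ IDs (e.g. usw2-az2) to
// the account-local AZ names (e.g. us-west-2b) used by the
// topology.kubernetes.io/zone node label.
func zoneNamesForZoneIDs(ec2Client ec2iface.EC2API, zoneIDs []string) ([]string, error) {
	resp, err := ec2Client.DescribeAvailabilityZones(&ec2sdk.DescribeAvailabilityZonesInput{
		ZoneIds: aws.StringSlice(zoneIDs),
	})
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(resp.AvailabilityZones))
	for _, az := range resp.AvailabilityZones {
		names = append(names, aws.StringValue(az.ZoneName))
	}
	return names, nil
}
```

The returned names could then be assembled into a NotIn nodeSelector requirement applied to every reconciled Ingress, keeping the blacklist account-agnostic while the translation stays internal to the controller.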

@M00nF1sh I like the idea of "--blacklisted-az-ids".


> @shaoxt
> 
> > How are you going to sync the bad AZ information from one place to all ingresses ? if you have couple hundreds of Ingresses in one cluster.
> 
> 1. i think the solution is to expose a controller-level flag(can be IngressClass level as well), which sets the nodeSelector, thus once the controller is restarted, all Ingresses will be reconciled.
> 
> > Since the "topology.kubernetes.io/zone" points the bad AZ name in each aws account. How are you going to deals with 200 aws accounts when the physical AZ ID went bad, but it has different AZ names in different aws accounts ?
> 
> 1. i prefer a external source (e.g. gitops tool that manages the controller deployment) to do the az mapping, which translates into the nodeSelector argument into the controller. (this way will make the controller itself agnostic to the detailed nodeSelector).
> 2. alternatively, there can be a special flag such as "--backlisted-az-ids", and the controller reads it and translates it into the corresponding `topology.kubernetes.io/zone` nodeSelector per account internally. but this couples the controller with the specific `topology.kubernetes.io/zone` key.

shaoxt avatar Mar 22 '22 21:03 shaoxt

@shaoxt: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar May 11 '22 11:05 k8s-ci-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Aug 09 '22 11:08 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Sep 08 '22 12:09 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-triage-robot avatar Oct 08 '22 13:10 k8s-triage-robot

@k8s-triage-robot: Closed this PR.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Oct 08 '22 13:10 k8s-ci-robot