Desired availability zone IDs
Issue
PR for #2496
Description
The proposal is to add an argument `--desired-availability-zone-ids` to aws-load-balancer-controller, so that the controller only picks up targets within the desired AZs. This makes it possible to shift the entire cluster's traffic out of an AZ as quickly as possible.
Sample:
`--desired-availability-zone-ids=usw2-az2,usw2-az3`
This argument limits the subnets specified on the Ingress annotation. If the argument is not set, it does not limit any subnets, so the default behavior is unchanged.
`alb.ingress.kubernetes.io/subnets: IngressSubnetAz1, IngressSubnetAz2, IngressSubnetAz3`
Even though the Ingress specifies 3 subnets, targets NOT in the desired availability zones will not be used in the targetGroup. The feature provides a lever to quickly move traffic out of one AZ.
SubnetResolver consumes the desired availability zone IDs and limits subnet selection to that list.
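A minimal sketch of the intended filtering step, assuming aws-sdk-go-v2 types; the function name is illustrative and not the controller's actual SubnetResolver code:

```go
package subnet

import (
	awssdk "github.com/aws/aws-sdk-go-v2/aws"
	ec2types "github.com/aws/aws-sdk-go-v2/service/ec2/types"
)

// filterSubnetsByAZID keeps only subnets whose availability-zone ID appears in
// desiredAZIDs. An empty desiredAZIDs slice disables the filter, so the
// controller's default behavior is unchanged when the flag is not set.
func filterSubnetsByAZID(subnets []ec2types.Subnet, desiredAZIDs []string) []ec2types.Subnet {
	if len(desiredAZIDs) == 0 {
		return subnets
	}
	desired := make(map[string]bool, len(desiredAZIDs))
	for _, id := range desiredAZIDs {
		desired[id] = true
	}
	var filtered []ec2types.Subnet
	for _, s := range subnets {
		if desired[awssdk.ToString(s.AvailabilityZoneId)] {
			filtered = append(filtered, s)
		}
	}
	return filtered
}
```

With `--desired-availability-zone-ids=usw2-az2,usw2-az3`, a subnet in `usw2-az1` listed on the Ingress annotation would simply be dropped before the target group is built.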
Checklist
- [x] Added tests that cover your change (if possible)
- [x] Added/modified documentation as required (such as the README.md, or the `docs` directory)
- [x] Manually tested
- [x] Made sure the title of the PR is a good description that can go into the release notes
BONUS POINTS checklist: complete for good vibes and maybe prizes?! :exploding_head:
- [ ] Backfilled missing tests for code in same general area :tada:
- [ ] Refactored something and made the world a better place :star2:
The committers are authorized under a signed CLA.
- :white_check_mark: Sheldon Shao (b27a593f20f38133f23fd6bf58dd1ba5825c29b7, 5e1e08bcfbc6e7c19660b02b4139df33e012ac11)
Welcome @shaoxt!
It looks like this is your first PR to kubernetes-sigs/aws-load-balancer-controller 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.
You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.
You can also check if kubernetes-sigs/aws-load-balancer-controller has its own contribution guidelines.
You may want to refer to our testing guide if you run into trouble with your tests not passing.
If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!
Thank you, and welcome to Kubernetes. :smiley:
Hi @shaoxt. Thanks for your PR.
I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: shaoxt
To complete the pull request process, please assign m00nf1sh after the PR has been reviewed.
You can assign the PR to them by writing /assign @m00nf1sh in a comment when ready.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
Going through the checklist
Thanks for making this PR.
I believe we won't re-invoke the subnet discovery for existing ALBs (if we do, it's a bug and we should fix it). So this change will only impact new ALBs, which won't suit your requirement per my understanding.
I feel a preferred approach is to combine https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/2354 with a controller-level flag that sets a default nodeSelector (so you can change the controller-level default nodeSelector if you don't set the nodeSelector per Ingress).
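A rough sketch of that combination, assuming a hypothetical `--default-target-node-labels` controller flag alongside the per-Ingress node-label annotation discussed in #2354 (the names are illustrative, not the controller's actual API):

```go
package ingress

import "strings"

// parseNodeLabels turns a flag value such as
// "topology.kubernetes.io/zone=us-west-2b,node-role=worker"
// into a label map usable as a nodeSelector.
func parseNodeLabels(flagValue string) map[string]string {
	labels := map[string]string{}
	for _, pair := range strings.Split(flagValue, ",") {
		kv := strings.SplitN(pair, "=", 2)
		if len(kv) == 2 && kv[0] != "" {
			labels[kv[0]] = kv[1]
		}
	}
	return labels
}

// effectiveNodeSelector applies the controller-level default only when the
// Ingress does not set its own node-label annotation, so a single controller
// restart re-targets every Ingress that relies on the default.
func effectiveNodeSelector(perIngress, controllerDefault map[string]string) map[string]string {
	if len(perIngress) > 0 {
		return perIngress
	}
	return controllerDefault
}
```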
@M00nF1sh Thanks for the comments.
A couple of questions about #2354:
- How are you going to sync the bad-AZ information from one place to all Ingresses, if you have a couple hundred Ingresses in one cluster?
- Since `topology.kubernetes.io/zone` points to the bad AZ name in each AWS account, how are you going to deal with 200 AWS accounts when the physical AZ ID goes bad but has different AZ names in different AWS accounts?
@shaoxt
How are you going to sync the bad-AZ information from one place to all Ingresses, if you have a couple hundred Ingresses in one cluster?
- I think the solution is to expose a controller-level flag (it can be IngressClass-level as well) that sets the nodeSelector; once the controller is restarted, all Ingresses will be reconciled.
Since `topology.kubernetes.io/zone` points to the bad AZ name in each AWS account, how are you going to deal with 200 AWS accounts when the physical AZ ID goes bad but has different AZ names in different AWS accounts?
- I prefer an external source (e.g. a GitOps tool that manages the controller deployment) to do the AZ mapping and translate it into the nodeSelector argument for the controller. (This way the controller itself stays agnostic to the detailed nodeSelector.)
- Alternatively, there can be a special flag such as `--blacklisted-az-ids`; the controller reads it and translates it into the corresponding `topology.kubernetes.io/zone` nodeSelector per account internally. But this couples the controller with the specific `topology.kubernetes.io/zone` key.
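A hedged sketch of that second option, using aws-sdk-go-v2's `DescribeAvailabilityZones` to resolve account-specific zone names for the blacklisted AZ IDs (the helper name and wiring are assumptions, not the controller's actual code):

```go
package azmap

import (
	"context"
	"fmt"

	awssdk "github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/ec2"
)

// blacklistedZoneNames maps AZ IDs (e.g. "usw2-az2") to the zone names they
// carry in the current account (e.g. "us-west-2b" here, possibly a different
// name in another account), so a blacklist expressed as AZ IDs can be turned
// into a topology.kubernetes.io/zone based selector per account.
func blacklistedZoneNames(ctx context.Context, client *ec2.Client, blacklistedAZIDs []string) ([]string, error) {
	if len(blacklistedAZIDs) == 0 {
		return nil, nil
	}
	out, err := client.DescribeAvailabilityZones(ctx, &ec2.DescribeAvailabilityZonesInput{
		ZoneIds: blacklistedAZIDs,
	})
	if err != nil {
		return nil, fmt.Errorf("describe availability zones: %w", err)
	}
	var names []string
	for _, az := range out.AvailabilityZones {
		names = append(names, awssdk.ToString(az.ZoneName))
	}
	return names, nil
}
```

Since a plain nodeSelector map can only express equality, the resulting zone names would likely have to be applied as a `NotIn` node-affinity expression on `topology.kubernetes.io/zone` rather than as equality labels.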
@M00nF1sh I like the idea of `--blacklisted-az-ids`.
> @shaoxt
>
> > How are you going to sync the bad-AZ information from one place to all Ingresses, if you have a couple hundred Ingresses in one cluster?
>
> 1. I think the solution is to expose a controller-level flag (it can be IngressClass-level as well) that sets the nodeSelector; once the controller is restarted, all Ingresses will be reconciled.
>
> > Since `topology.kubernetes.io/zone` points to the bad AZ name in each AWS account, how are you going to deal with 200 AWS accounts when the physical AZ ID goes bad but has different AZ names in different AWS accounts?
>
> 1. I prefer an external source (e.g. a GitOps tool that manages the controller deployment) to do the AZ mapping and translate it into the nodeSelector argument for the controller. (This way the controller itself stays agnostic to the detailed nodeSelector.)
> 2. Alternatively, there can be a special flag such as `--blacklisted-az-ids`; the controller reads it and translates it into the corresponding `topology.kubernetes.io/zone` nodeSelector per account internally. But this couples the controller with the specific `topology.kubernetes.io/zone` key.
@shaoxt: PR needs rebase.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Mark this issue or PR as rotten with `/lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with `/remove-lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the PR is closed
You can:
- Reopen this PR with `/reopen`
- Mark this PR as fresh with `/remove-lifecycle rotten`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close
@k8s-triage-robot: Closed this PR.
In response to this:
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the PR is closed
You can:
- Reopen this PR with `/reopen`
- Mark this PR as fresh with `/remove-lifecycle rotten`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.