aws-load-balancer-controller
Skip processing an ingress that does not have a valid certificate for its hosts
Issue
N/A (I don't see any related issues, but feel free to let me know and I'll link them here)
Description
If a single ALB ingress in an ingress group includes a host that doesn't have a corresponding certificate in ACM, it blocks successful deployment of the entire ingress group. Instead of this behavior, we can simply ignore the "bad ingress" and log a message explaining that the ingress doesn't have a host that corresponds to a certificate.
With the recent fix for the syncPeriod parameter, I think it's safe to make this change: the load balancer controller will retry certificate discovery and attachment after at most 10 hours (the default period), so if the "bad ingress" later gains a valid certificate for its hosts, it will eventually be processed successfully. Additionally, users can configure a shorter syncPeriod if needed.
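To make the intended behavior concrete, here is a minimal Go sketch, assuming simplified stand-in types and a hypothetical discoverCertARNs helper rather than the controller's actual model-building code:

```go
package main

import (
	"context"
	"fmt"
	"log"
)

// Ingress is a simplified stand-in for an ingress-group member; the real
// controller works with networking.k8s.io/v1 Ingress objects.
type Ingress struct {
	Namespace, Name string
	Hosts           []string
}

// discoverCertARNs is a hypothetical stand-in for ACM certificate discovery;
// here it always fails so the skip path is exercised.
func discoverCertARNs(ctx context.Context, ing Ingress) ([]string, error) {
	return nil, fmt.Errorf("no ACM certificate matches hosts %v", ing.Hosts)
}

// filterIngressesWithCerts sketches the proposed change: rather than failing
// the whole ingress group when one member has no matching certificate, log a
// message for that member and keep processing the rest.
func filterIngressesWithCerts(ctx context.Context, members []Ingress) []Ingress {
	valid := make([]Ingress, 0, len(members))
	for _, ing := range members {
		certARNs, err := discoverCertARNs(ctx, ing)
		if err != nil || len(certARNs) == 0 {
			log.Printf("skipping ingress %s/%s: %v", ing.Namespace, ing.Name, err)
			continue
		}
		valid = append(valid, ing)
	}
	return valid
}

func main() {
	kept := filterIngressesWithCerts(context.Background(), []Ingress{
		{Namespace: "prod", Name: "no-cert", Hosts: []string{"missing.example.com"}},
	})
	fmt.Println("ingresses kept:", len(kept))
}
```

Under this sketch, a skipped ingress would simply be picked up again on a later periodic sync once a matching certificate exists.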
Checklist
- [ ] Added tests that cover your change (if possible)
- [ ] Added/modified documentation as required (such as the README.md, or the docs directory)
- [x] Manually tested
  - Integration testing
    - Test Case 1: Multiple ingresses in one group
      - ALB Ingress with no valid cert for host
      - ALB ingress with valid cert for host
      - Bad ingress is ignored and good ingress is successfully processed and model is deployed
    - Test Case 2: Single ingress in a group
      - ALB Ingress with no valid cert for host
      - Load balancer is constructed/model deployed without invalid port config
- [x] Made sure the title of the PR is a good description that can go into the release notes
BONUS POINTS checklist: complete for good vibes and maybe prizes?! :exploding_head:
- [ ] Backfilled missing tests for code in same general area :tada:
- [ ] Refactored something and made the world a better place :star2:
The committers listed above are authorized under a signed CLA.
- :white_check_mark: login: seanseth7 / name: Sean Seth (4955e05bdb7c77bd4ff7866783967cc302dd79f5)
Hi @seanseth7. Thanks for your PR.
I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: seanseth7
Once this PR has been reviewed and has the lgtm label, please assign m00nf1sh for approval. For more information see the Kubernetes Code Review Process.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
I'm currently in the process of getting the CLA approved by my org's admin
I don't think it's okay to just skip over the ingress. This could cause the ALB to be torn down if its cert gets deleted.
The right way to address this is through the ingress admission controller. I'm not going to be reviewing ingress admission PRs until #3026 lands, though.
@johngmyers Do you mind clarifying how the ALB would get torn down? I was under the impression that if there are no valid certs for the HTTPS listener, that listener would simply not be configured for the ALB.
Ah I see. Just to give you more context on why I was trying to make this change, my team was running into issues whenever a certificate expires for an ingress in an ingress group. The controller currently just gets into an error state because no cert exists for an ingress' hosts and doesn't successfully process any new ingresses in that group, so I was just trying to find the simplest approach to getting around that issue.
Also, is that ingress admission controller something that still needs to be built? Do you mind linking the code if it already exists? I know that this issue https://github.com/kubernetes-sigs/aws-load-balancer-controller/projects/11#card-88834664 was slated for the 2.6.0 release, which also seems to solve my team's issue. If you still think that's accurate, do you have an idea of an ETA for that release? All good if not.
Thanks!
The admission controller is part of LBC. I cannot predict when things would be released.
@johngmyers Got it. Thanks for clarifying.
@johngmyers I took some more time to look at the PR you mentioned, https://github.com/kubernetes-sigs/aws-load-balancer-controller/pull/3026, and just to be sure we're on the same page: you think it's best to simply update that ingress validator to add another check for whether an ingress has a valid certificate in ACM for its hosts, correct? I'm happy to draft up another PR, based on the changes you already made there, with the new certificate validation logic if you agree.
@seanseth7 yes, that is the shape of it. I haven't explored the edge cases, such as what happens if the cert expires after the LB is created.
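For concreteness, here is a rough Go sketch of the kind of admission-time check being discussed, assuming a hypothetical acmCertExistsForHost lookup; the actual validator in #3026 is shaped differently:

```go
package main

import (
	"context"
	"fmt"
)

// acmCertExistsForHost is a hypothetical lookup that reports whether any ACM
// certificate covers the given host (e.g. via certificate listing plus domain
// matching); it is assumed here for illustration only.
func acmCertExistsForHost(ctx context.Context, host string) (bool, error) {
	return host == "app.example.com", nil
}

// validateIngressTLSHosts is the extra admission-time check being discussed:
// reject the ingress if any of its TLS hosts has no matching ACM certificate,
// so a "bad" ingress never joins the ingress group in the first place.
func validateIngressTLSHosts(ctx context.Context, hosts []string) error {
	for _, host := range hosts {
		ok, err := acmCertExistsForHost(ctx, host)
		if err != nil {
			return fmt.Errorf("certificate discovery failed for host %s: %w", host, err)
		}
		if !ok {
			return fmt.Errorf("no ACM certificate found for host %s", host)
		}
	}
	return nil
}

func main() {
	err := validateIngressTLSHosts(context.Background(), []string{"app.example.com", "missing.example.com"})
	fmt.Println("admission result:", err)
}
```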
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.
This bot triages PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the PR is closed
You can:
- Mark this PR as fresh with /remove-lifecycle stale
- Close this PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.
This bot triages PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the PR is closed
You can:
- Mark this PR as fresh with /remove-lifecycle rotten
- Close this PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
Upvote, as we would need this as well.
However, I think it should be configurable with an annotation, for example alb.ingress.kubernetes.io/tls-autodiscovery-strategy. It would default to match-all, which would require all of the hosts to have a matching certificate, and to enable the behavior proposed here it would also offer a match-any strategy.
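To illustrate, here is a minimal Go sketch of how such an annotation might be resolved, defaulting to match-all; the annotation key and the two strategy values come from the suggestion above, while the function name and error handling are purely illustrative:

```go
package main

import "fmt"

// Hypothetical annotation key proposed above; not implemented in the controller.
const tlsAutodiscoveryStrategyKey = "alb.ingress.kubernetes.io/tls-autodiscovery-strategy"

// resolveStrategy returns "match-all" (every host must resolve to a certificate,
// the current effective behavior) unless the ingress opts into "match-any"
// (hosts without certificates are skipped rather than failing the group).
func resolveStrategy(annotations map[string]string) (string, error) {
	strategy, ok := annotations[tlsAutodiscoveryStrategyKey]
	if !ok || strategy == "" {
		return "match-all", nil
	}
	switch strategy {
	case "match-all", "match-any":
		return strategy, nil
	default:
		return "", fmt.Errorf("unknown %s value: %q", tlsAutodiscoveryStrategyKey, strategy)
	}
}

func main() {
	s, _ := resolveStrategy(map[string]string{tlsAutodiscoveryStrategyKey: "match-any"})
	fmt.Println("strategy:", s)
}
```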
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the PR is closed
You can:
- Reopen this PR with /reopen
- Mark this PR as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close
@k8s-triage-robot: Closed this PR.