aws-load-balancer-controller
Conflicting SSL policies on ingresses in an ingress group should be a warning rather than an error
Describe the bug
I'm unsure whether this is a bug or an intentional design decision (in which case this should probably be a feature request to change the behaviour instead), but I'm raising it as a bug because I have steps to reproduce the issue.
If two ingresses share the same ingress group and set conflicting SSL policies via the `alb.ingress.kubernetes.io/ssl-policy` annotation, the load balancer controller begins throwing errors and is unable to manage the ingress group (no ingresses in the group can be created, updated, or even deleted) until the SSL policies are aligned across all of the ingresses.
Steps to reproduce
1. Set up a cluster with the ALB controller configured and without using the `sslPolicy` ingress class parameter.
2. Apply the following manifests in order to set up a namespace, deployment, and service:
```yaml
apiVersion: v1
kind: Namespace
metadata:
  labels:
    kubernetes.io/metadata.name: load-balancer-controller-test
  name: load-balancer-controller-test
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: load-balancer-controller-test
  namespace: load-balancer-controller-test
  labels:
    app.kubernetes.io/name: load-balancer-controller-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: load-balancer-controller-test
  template:
    metadata:
      labels:
        app.kubernetes.io/name: load-balancer-controller-test
    spec:
      containers:
        - name: http-echo-8080
          image: hashicorp/http-echo
          command: ["/http-echo", "-listen=:8080", "-text='hello on 8080'"]
          ports:
            - containerPort: 8080
              name: http
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: load-balancer-controller-test
  name: load-balancer-controller-test
  namespace: load-balancer-controller-test
spec:
  type: NodePort
  ports:
    - name: http
      port: 8080
      protocol: TCP
      targetPort: http
  selector:
    app.kubernetes.io/name: load-balancer-controller-test
```
3. Apply this ingress manifest to create an ALB with the default SSL policy of `ELBSecurityPolicy-2016-08`:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    alb.ingress.kubernetes.io/group.name: load-balancer-controller-test
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80},{"HTTPS": 443}]'
    alb.ingress.kubernetes.io/scheme: internal
    alb.ingress.kubernetes.io/ssl-redirect: "443"
  labels:
    app.kubernetes.io/name: load-balancer-controller-test-a
  name: load-balancer-controller-test-a
  namespace: load-balancer-controller-test
spec:
  ingressClassName: alb
  rules:
    - host: load-balancer-controller-test-a.example.com # TODO: Replace this with a domain you control
      http:
        paths:
          - backend:
              service:
                name: load-balancer-controller-test
                port:
                  number: 8080
            path: /
            pathType: Prefix
```
4. Apply this ingress manifest to add a target group to the ALB and set the SSL policy to `ELBSecurityPolicy-TLS13-1-2-Res-2021-06`:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    alb.ingress.kubernetes.io/group.name: load-balancer-controller-test
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80},{"HTTPS": 443}]'
    alb.ingress.kubernetes.io/scheme: internal
    alb.ingress.kubernetes.io/ssl-redirect: "443"
    alb.ingress.kubernetes.io/ssl-policy: ELBSecurityPolicy-TLS13-1-2-Res-2021-06
  labels:
    app.kubernetes.io/name: load-balancer-controller-test-b
  name: load-balancer-controller-test-b
  namespace: load-balancer-controller-test
spec:
  ingressClassName: alb
  rules:
    - host: load-balancer-controller-test-b.example.com # TODO: Replace this with a domain you control
      http:
        paths:
          - backend:
              service:
                name: load-balancer-controller-test
                port:
                  number: 8080
            path: /
            pathType: Prefix
```
5. Apply this ingress manifest, which sets a different SSL policy, and observe that the AWS load balancer controller begins to throw errors and gets stuck in a broken state: the ingress is not reconciled and no target group is created for it:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    alb.ingress.kubernetes.io/group.name: load-balancer-controller-test
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80},{"HTTPS": 443}]'
    alb.ingress.kubernetes.io/scheme: internal
    alb.ingress.kubernetes.io/ssl-redirect: "443"
    alb.ingress.kubernetes.io/ssl-policy: ELBSecurityPolicy-FS-1-2-Res-2020-10
  labels:
    app.kubernetes.io/name: load-balancer-controller-test-c
  name: load-balancer-controller-test-c
  namespace: load-balancer-controller-test
spec:
  ingressClassName: alb
  rules:
    - host: load-balancer-controller-test-c.example.com # TODO: Replace this with a domain you control
      http:
        paths:
          - backend:
              service:
                name: load-balancer-controller-test
                port:
                  number: 8080
            path: /
            pathType: Prefix
```
6. Apply this ingress manifest to see that other ingresses whose SSL policy matches the ALB's are also not reconciled:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    alb.ingress.kubernetes.io/group.name: load-balancer-controller-test
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80},{"HTTPS": 443}]'
    alb.ingress.kubernetes.io/scheme: internal
    alb.ingress.kubernetes.io/ssl-redirect: "443"
    alb.ingress.kubernetes.io/ssl-policy: ELBSecurityPolicy-TLS13-1-2-Res-2021-06
  labels:
    app.kubernetes.io/name: load-balancer-controller-test-d
  name: load-balancer-controller-test-d
  namespace: load-balancer-controller-test
spec:
  ingressClassName: alb
  rules:
    - host: load-balancer-controller-test-d.example.com # TODO: Replace this with a domain you control
      http:
        paths:
          - backend:
              service:
                name: load-balancer-controller-test
                port:
                  number: 8080
            path: /
            pathType: Prefix
```
You can also swap step 5 for patching the first ingress with `kubectl patch ingress -n load-balancer-controller-test load-balancer-controller-test-a --patch-file ingress-a-patch.yml` and the following patch file, to observe the same behaviour when an existing ingress in the group is updated to a mismatched SSL policy:

```yaml
metadata:
  annotations:
    alb.ingress.kubernetes.io/ssl-policy: ELBSecurityPolicy-FS-1-2-Res-2020-10
```
The errors logged look like this:

```
{"level":"error","ts":"2023-06-28T13:10:16Z","msg":"Reconciler error","controller":"ingress","object":{"name":"load-balancer-controller-test"},"namespace":"","name":"load-balancer-controller-test","reconcileID":"571718e7-ddbd-4177-a9e8-05cdd4caa694","error":"failed to merge listenPort config for port: 443: conflicting sslPolicy, load-balancer-controller-test/load-balancer-controller-test-b: ELBSecurityPolicy-FS-1-2-Res-2020-10 | load-balancer-controller-test/load-balancer-controller-test-b: ELBSecurityPolicy-TLS13-1-2-Res-2021-06"}
```
Other updates also fail to apply to the ingresses in the ingress group, such as this health check update applied with `kubectl patch ingress -n load-balancer-controller-test load-balancer-controller-test-b --patch-file ingress-b-patch.yml`:

```yaml
metadata:
  annotations:
    alb.ingress.kubernetes.io/healthcheck-path: /foo
```
If you either remove the ingress with the mismatched SSL policy, remove the SSL policy annotation from that ingress, or align the policies, then the AWS load balancer controller resumes managing changes to the ingresses in the group.
Alternatively, if you set the `sslPolicy` ingress class parameter, the AWS load balancer controller ignores the `alb.ingress.kubernetes.io/ssl-policy` annotation and sets all load balancers of that ingress class to use the SSL policy defined there.
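A rough sketch of that parameter object (the resource name `alb-ssl-locked` is a placeholder, not something from the setup above):

```yaml
# Sketch only: pins the SSL policy for every ALB provisioned via this ingress class.
apiVersion: elbv2.k8s.aws/v1beta1
kind: IngressClassParams
metadata:
  name: alb-ssl-locked # placeholder name
spec:
  # With sslPolicy set here, the controller ignores the
  # alb.ingress.kubernetes.io/ssl-policy annotation on member ingresses.
  sslPolicy: ELBSecurityPolicy-TLS13-1-2-Res-2021-06
```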
Expected outcome
I'd expect to see warnings in the logs so you can see that the policies are mismatched and the ALB controller can't merge them (although the current log line shown above doesn't make the issue easy to spot, because the same ingress name is used multiple times; that's covered in https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/3125), but otherwise the AWS load balancer controller should continue to manage the ingress group as normal.
Environment
- AWS Load Balancer controller version v2.5.3 via the 1.5.4 Helm chart
- Kubernetes version 1.24
- Using EKS (yes/no), if so version? 1.24
Additional Context
This isn't the worst thing in the world because we're now changing tack and introducing the SSL policy via the ingress class instead. However, it did cause us to spend a lot of time scratching our heads over why ingresses weren't being created or updated after we changed some of the ingresses in an ingress group, and we raised support ticket 13031568301.
Right now, it's not particularly feasible to use the `alb.ingress.kubernetes.io/ssl-policy` annotation with ingress groups, because a mismatch leaves the ingresses in the group in an unmanaged state until you can roll the change out to all of the ingresses in the group.
@tomelliff, thanks for the details. It's expected behavior that the controller errors out when there are multiple SSL policies on ingresses under the same ingress group, since for one ingress group there will only be one ALB.
Yep, I was expecting at least a warning that it can't handle this, because it's not something the ALB can merge (you only get one SSL policy per ALB), but I didn't expect the load balancer controller to error continually and be unable to manage any of the ingresses in the ingress group until the issue is resolved by removing the conflicting ingress or aligning all of the ingresses' SSL policies.
Is there a reason it has to be a hard error instead of a warning? A warning would explain why the SSL policy hadn't been updated while the ingresses carried on being managed and updated; then, once all of the ingresses in the group were aligned on an SSL policy, it would be applied and the warnings would stop.
@tomelliff, yeah, I agree there's a potential improvement here. For example, if conflicting SSL policies are specified on various ingresses under the same ingress group, the controller could log a warning instead of erroring out, and could select one of the policies for the provisioned ALB (by lexical order, or just by picking the latest policy).
I would say the situation is correctly an error. Selecting an arbitrary policy would be worse.
If you want to avoid such errors, you can lock down the policy in an `IngressClassParams`.
I wouldn't suggest an arbitrary policy, but rather keeping the current policy on the ALB until all ingresses in the ingress group are aligned with each other, and then, if that differs from what the ALB has, changing the ALB to match.
Until they're aligned, the controller could still log a warning saying that it isn't changing the policy because some of the ingresses conflict.
And yep, as I mentioned above, we went with an ingress class parameter instead, but it was very confusing: even looking at the controller logs at the time, I assumed the conflict wouldn't cause the ingresses in the group to go unmanaged until it was fixed, so I spent a bunch of time working out why new ingresses weren't being created in the group and why the health check path for a target group wasn't being set while there was a discrepancy in SSL policies between ingresses in the group.
Right now it feels like a bit of a footgun that can confuse users, because nothing states that a mismatched SSL policy across ingresses in an ingress group causes the controller to stop doing anything with them. You could document that and flag that it's a bad idea to set `alb.ingress.kubernetes.io/ssl-policy` when using ingress groups (I'm not sure whether other unmergeable annotations trigger the same behaviour), but to me a better approach would be to carry on managing the ingresses as normal, applying other changes as they come up, and only change the SSL policy on the ALB once all of them agree on it.
> I wouldn't suggest an arbitrary policy, but rather keeping the current policy on the ALB until all ingresses in the ingress group are aligned with each other, and then, if that differs from what the ALB has, changing the ALB to match.
That would add a nontrivial amount of complexity to the controller. There's the question of what to do if there is no existing ALB. And it will confuse people, especially if the policy on the ALB doesn't match any of the policies on Ingress annotations.
If the `sslPolicy` field of `IngressClassParams` is set, then the annotations on the `Ingresses` in that `IngressClass` are ignored.
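For completeness, a minimal sketch of the `IngressClass` wiring that makes member ingresses pick up such an `IngressClassParams` (names are placeholders; in the repro above the class is simply named `alb`):

```yaml
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: alb-ssl-locked # placeholder; ingresses opt in via spec.ingressClassName
spec:
  controller: ingress.k8s.aws/alb
  parameters:
    apiGroup: elbv2.k8s.aws
    kind: IngressClassParams
    name: alb-ssl-locked # the IngressClassParams whose sslPolicy takes precedence over annotations
```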
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.