aws-load-balancer-controller
Conflicting SSL policies on ingresses in an ingress group should be a warning rather than an error
Describe the bug
I'm unsure whether this is a bug or an intentional design decision (in which case this should probably be a feature request to change the behaviour instead), but I'm raising it as a bug because I have steps to reproduce the issue.
If two ingresses share the same ingress group and set conflicting SSL policies via the `alb.ingress.kubernetes.io/ssl-policy` annotation, the load balancer controller begins throwing errors and is unable to manage the ingress group (no ingresses in the group can be created, updated, or even deleted) until the SSL policies are aligned across all of the ingresses.
Steps to reproduce
1. Set up a cluster with the ALB controller configured and without using the `sslPolicy` ingress class parameter.
2. Apply the following manifests in order to set up a namespace, deployment, and service:
```yaml
apiVersion: v1
kind: Namespace
metadata:
  labels:
    kubernetes.io/metadata.name: load-balancer-controller-test
  name: load-balancer-controller-test
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: load-balancer-controller-test
  namespace: load-balancer-controller-test
  labels:
    app.kubernetes.io/name: load-balancer-controller-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: load-balancer-controller-test
  template:
    metadata:
      labels:
        app.kubernetes.io/name: load-balancer-controller-test
    spec:
      containers:
        - name: http-echo-8080
          image: hashicorp/http-echo
          command: ["/http-echo", "-listen=:8080", "-text='hello on 8080'"]
          ports:
            - containerPort: 8080
              name: http
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: load-balancer-controller-test
  name: load-balancer-controller-test
  namespace: load-balancer-controller-test
spec:
  type: NodePort
  ports:
    - name: http
      port: 8080
      protocol: TCP
      targetPort: http
  selector:
    app.kubernetes.io/name: load-balancer-controller-test
```
3. Apply this ingress manifest to create an ALB with the default SSL policy of `ELBSecurityPolicy-2016-08`:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    alb.ingress.kubernetes.io/group.name: load-balancer-controller-test
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80},{"HTTPS": 443}]'
    alb.ingress.kubernetes.io/scheme: internal
    alb.ingress.kubernetes.io/ssl-redirect: "443"
  labels:
    app.kubernetes.io/name: load-balancer-controller-test-a
  name: load-balancer-controller-test-a
  namespace: load-balancer-controller-test
spec:
  ingressClassName: alb
  rules:
    - host: load-balancer-controller-test-a.example.com # TODO: Replace this with a domain you control
      http:
        paths:
          - backend:
              service:
                name: load-balancer-controller-test
                port:
                  number: 8080
            path: /
            pathType: Prefix
```
4. Apply this ingress manifest to add a target group to the ALB and set the SSL policy to `ELBSecurityPolicy-TLS13-1-2-Res-2021-06`:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    alb.ingress.kubernetes.io/group.name: load-balancer-controller-test
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80},{"HTTPS": 443}]'
    alb.ingress.kubernetes.io/scheme: internal
    alb.ingress.kubernetes.io/ssl-redirect: "443"
    alb.ingress.kubernetes.io/ssl-policy: ELBSecurityPolicy-TLS13-1-2-Res-2021-06
  labels:
    app.kubernetes.io/name: load-balancer-controller-test-b
  name: load-balancer-controller-test-b
  namespace: load-balancer-controller-test
spec:
  ingressClassName: alb
  rules:
    - host: load-balancer-controller-test-b.example.com # TODO: Replace this with a domain you control
      http:
        paths:
          - backend:
              service:
                name: load-balancer-controller-test
                port:
                  number: 8080
            path: /
            pathType: Prefix
```
5. Apply this ingress manifest, which sets a different SSL policy, and observe that the AWS load balancer controller begins to throw errors and gets stuck in a broken state: the ingress is not reconciled and no target group is created for it:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    alb.ingress.kubernetes.io/group.name: load-balancer-controller-test
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80},{"HTTPS": 443}]'
    alb.ingress.kubernetes.io/scheme: internal
    alb.ingress.kubernetes.io/ssl-redirect: "443"
    alb.ingress.kubernetes.io/ssl-policy: ELBSecurityPolicy-FS-1-2-Res-2020-10
  labels:
    app.kubernetes.io/name: load-balancer-controller-test-c
  name: load-balancer-controller-test-c
  namespace: load-balancer-controller-test
spec:
  ingressClassName: alb
  rules:
    - host: load-balancer-controller-test-c.example.com # TODO: Replace this with a domain you control
      http:
        paths:
          - backend:
              service:
                name: load-balancer-controller-test
                port:
                  number: 8080
            path: /
            pathType: Prefix
```
6. Apply this ingress manifest to see that other ingresses whose SSL policy matches the ALB's are also not reconciled:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    alb.ingress.kubernetes.io/group.name: load-balancer-controller-test
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80},{"HTTPS": 443}]'
    alb.ingress.kubernetes.io/scheme: internal
    alb.ingress.kubernetes.io/ssl-redirect: "443"
    alb.ingress.kubernetes.io/ssl-policy: ELBSecurityPolicy-TLS13-1-2-Res-2021-06
  labels:
    app.kubernetes.io/name: load-balancer-controller-test-d
  name: load-balancer-controller-test-d
  namespace: load-balancer-controller-test
spec:
  ingressClassName: alb
  rules:
    - host: load-balancer-controller-test-d.example.com # TODO: Replace this with a domain you control
      http:
        paths:
          - backend:
              service:
                name: load-balancer-controller-test
                port:
                  number: 8080
            path: /
            pathType: Prefix
```
You can also swap step 5 for patching the first ingress with `kubectl patch ingress -n load-balancer-controller-test load-balancer-controller-test-a --patch-file ingress-a-patch.yml` and the following patch file, to observe the same behaviour when an existing ingress in the group is updated to a mismatched SSL policy:

```yaml
metadata:
  annotations:
    alb.ingress.kubernetes.io/ssl-policy: ELBSecurityPolicy-FS-1-2-Res-2020-10
```
The errors logged look like this:

```
{"level":"error","ts":"2023-06-28T13:10:16Z","msg":"Reconciler error","controller":"ingress","object":{"name":"load-balancer-controller-test"},"namespace":"","name":"load-balancer-controller-test","reconcileID":"571718e7-ddbd-4177-a9e8-05cdd4caa694","error":"failed to merge listenPort config for port: 443: conflicting sslPolicy, load-balancer-controller-test/load-balancer-controller-test-b: ELBSecurityPolicy-FS-1-2-Res-2020-10 | load-balancer-controller-test/load-balancer-controller-test-b: ELBSecurityPolicy-TLS13-1-2-Res-2021-06"}
```
Other updates also fail to apply to the ingresses in the ingress group, such as this health check update applied with `kubectl patch ingress -n load-balancer-controller-test load-balancer-controller-test-b --patch-file ingress-b-patch.yml`:

```yaml
metadata:
  annotations:
    alb.ingress.kubernetes.io/healthcheck-path: /foo
```
If you either remove the ingress with the mismatched SSL policy, remove the SSL policy annotation from that ingress, or align the policies, then the AWS load balancer controller resumes managing changes to the ingresses in the group.
Alternatively, if you set the `sslPolicy` ingress class parameter, the AWS load balancer controller ignores the `alb.ingress.kubernetes.io/ssl-policy` annotation and sets all load balancers of that ingress class to use the SSL policy defined there.
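A rough sketch of that parameter object (the resource name `alb-ssl-locked` is a placeholder, not something from the setup above):

```yaml
# Sketch only: pins the SSL policy for every ALB provisioned via this ingress class.
apiVersion: elbv2.k8s.aws/v1beta1
kind: IngressClassParams
metadata:
  name: alb-ssl-locked # placeholder name
spec:
  # With sslPolicy set here, the controller ignores the
  # alb.ingress.kubernetes.io/ssl-policy annotation on member ingresses.
  sslPolicy: ELBSecurityPolicy-TLS13-1-2-Res-2021-06
```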
Expected outcome
I'd expect to see warnings in the logs so you can see that the policies are mismatched and the ALB controller can't merge them (although the current log line shown above doesn't make the issue easy to spot, because the same ingress name is used multiple times; that's covered in https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/3125), but otherwise the AWS load balancer controller should continue to manage the ingress group as normal.
Environment
- AWS Load Balancer controller version v2.5.3 via the 1.5.4 Helm chart
- Kubernetes version 1.24
- Using EKS (yes/no), if so version? 1.24
Additional Context
This isn't the worst thing in the world because we're now changing tack and introducing the SSL policy via the ingress class instead. However, it did cause us to spend a lot of time scratching our heads over why ingresses weren't being created or updated after we changed some of the ingresses in an ingress group, and we raised support ticket 13031568301.
Right now, it's not particularly feasible to use the `alb.ingress.kubernetes.io/ssl-policy` annotation with ingress groups, because a mismatch leaves the ingresses in the group in an unmanaged state until you can roll the change out to all of the ingresses in the group.
@tomelliff, thanks for the details. It's expected behavior that the controller errors out when there are multiple SSL policies on ingresses under the same ingress group, since for one ingress group there will only be one ALB.
Yep, I was expecting at least a warning that it can't handle this, because it's not something the ALB can merge (you only get one SSL policy per ALB), but I didn't expect the load balancer controller to error continually and be unable to manage any of the ingresses in the ingress group until the issue is resolved by removing the conflicting ingress or aligning all of the ingresses' SSL policies.
Is there a reason it has to be a hard error instead of a warning? A warning would explain why the SSL policy hadn't been updated while the ingresses carried on being managed and updated; then, once all of the ingresses in the group were aligned on an SSL policy, it would be applied and the warnings would stop.
@tomelliff, yeah, I agree there's a potential improvement here. For example, if conflicting SSL policies are specified on various ingresses under the same ingress group, the controller could log a warning instead of erroring out, and could select one of the policies for the provisioned ALB (by lexical order, or just by picking the latest policy).
I would say the situation is correctly an error. Selecting an arbitrary policy would be worse.
If you want to avoid such errors, you can lock down the policy in an `IngressClassParams`.
I wouldn't suggest an arbitrary policy, but rather keeping the current policy on the ALB until all ingresses in the ingress group are aligned with each other, and then, if that differs from what the ALB has, changing the ALB to match.
Until they're aligned, the controller could still log a warning saying that it isn't changing the policy because some of the ingresses conflict.
And yep, as I mentioned above, we went with an ingress class parameter instead, but it was very confusing: even looking at the controller logs at the time, I assumed the conflict wouldn't cause the ingresses in the group to go unmanaged until it was fixed, so I spent a bunch of time working out why new ingresses weren't being created in the group and why the health check path for a target group wasn't being set while there was a discrepancy in SSL policies between ingresses in the group.
Right now it feels like a bit of a footgun that can confuse users, because nothing states that a mismatched SSL policy across ingresses in an ingress group causes the controller to stop doing anything with them. You could document that and flag that it's a bad idea to set `alb.ingress.kubernetes.io/ssl-policy` when using ingress groups (I'm not sure whether other unmergeable annotations trigger the same behaviour), but to me a better approach would be to carry on managing the ingresses as normal, applying other changes as they come up, and only change the SSL policy on the ALB once all of them agree on it.
> I wouldn't suggest an arbitrary policy, but rather keeping the current policy on the ALB until all ingresses in the ingress group are aligned with each other, and then, if that differs from what the ALB has, changing the ALB to match.
That would add a nontrivial amount of complexity to the controller. There's the question of what to do if there is no existing ALB. And it will confuse people, especially if the policy on the ALB doesn't match any of the policies on Ingress annotations.
If the `sslPolicy` field of `IngressClassParams` is set, then the annotations on the `Ingresses` in that `IngressClass` are ignored.
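For completeness, a minimal sketch of the `IngressClass` wiring that makes member ingresses pick up such an `IngressClassParams` (names are placeholders; in the repro above the class is simply named `alb`):

```yaml
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: alb-ssl-locked # placeholder; ingresses opt in via spec.ingressClassName
spec:
  controller: ingress.k8s.aws/alb
  parameters:
    apiGroup: elbv2.k8s.aws
    kind: IngressClassParams
    name: alb-ssl-locked # the IngressClassParams whose sslPolicy takes precedence over annotations
```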
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.