cloud-provider-azure
Ingress can suddenly break health check of the LoadBalancer
What happened:
I created a cluster and an ingress following https://github.com/MicrosoftDocs/azure-docs/blob/main/articles/aks/ingress-internal-ip.md. It used to work before 1.22.6.
On 1.22.6 the ingress is not working, because the load balancer health check protocol is HTTP and the probe path is /healthz on port 80.
But the ingress spec contains:

```yaml
- path: /(.*)
  pathType: Prefix
  backend:
    service:
      name: aks-helloworld
```

which breaks /healthz, since aks-helloworld is served on /.
That basically means that any user with enough permissions to add an ingress can break the whole cluster.
Maybe TCP should be used by default instead of HTTP.
What you expected to happen:
The default configuration works reliably.
How to reproduce it (as minimally and precisely as possible):
Follow https://github.com/MicrosoftDocs/azure-docs/blob/main/articles/aks/ingress-internal-ip.md on version 1.22.6.
- @MartinForReal to have a look
@rumatavz Please change appProtocol to TCP in the Service manifest. As documented at https://kubernetes-sigs.github.io/cloud-provider-azure/topics/loadbalancer/#custom-load-balancer-health-probe, the SLB probe uses HTTP because appProtocol in the Service is http. FYI
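To sketch that suggestion (the service and port names below are placeholders, not taken from the original manifest), setting appProtocol on the Service port would make cloud-provider-azure provision a TCP probe instead of an HTTP probe:

```yaml
# Hypothetical Service sketch: metadata names are placeholders.
# With appProtocol: tcp, the SLB health probe falls back to a plain
# TCP connect check rather than an HTTP GET that the ingress could route away.
apiVersion: v1
kind: Service
metadata:
  name: my-ingress-service
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  ports:
    - name: http
      port: 80
      targetPort: 80
      appProtocol: tcp
```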
I used another workaround: I added the azure-load-balancer-health-probe-protocol annotation, and I don't have problems with this anymore.
But what I'm saying is that the default configuration is dangerous, and other users may find their whole cluster not working because of this issue.
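For reference, the annotation workaround mentioned above might look roughly like this (the service name is a placeholder; the annotation is documented in the cloud-provider-azure load balancer docs linked earlier):

```yaml
# Hypothetical sketch: force the SLB health probe to TCP via annotation,
# regardless of the port's appProtocol.
apiVersion: v1
kind: Service
metadata:
  name: my-ingress-service
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-health-probe-protocol: tcp
spec:
  type: LoadBalancer
  ports:
    - port: 80
      targetPort: 80
```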
Maybe I'm wrong, but the docs seem to suggest that there is a huge change in behaviour from 1.23 to 1.24, potentially amplifying this issue:

> For clusters >1.24, spec.ports.appProtocol would be used as probe protocol and / would be used as default probe request path (service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path could be used to change to a different request path).

nginx-ingress, for example, runs into this issue in its default configuration for its plain HTTP port. When admins upgrade from Kubernetes 1.23 to 1.24, their nginx-ingress will suddenly stop working, no?
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Mark this issue or PR as rotten with `/lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with `/remove-lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
> Maybe I'm wrong, but the docs seem to suggest that there is a huge change in behaviour from 1.23 to 1.24, potentially amplifying this issue:
>
> For clusters >1.24, spec.ports.appProtocol would be used as probe protocol and / would be used as default probe request path (service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path could be used to change to a different request path).
>
> nginx-ingress for example runs into this issue in its default configuration for its plain HTTP port. When admins upgrade from Kubernetes 1.23 to 1.24, their nginx-ingress will suddenly stop working, no?
I confirm our nginx-ingress broke this morning after an auto update to 1.24... I had to add an annotation in the Helm values for nginx-ingress to configure the health probe path.
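The fix described above might look like this in Helm values (a sketch, assuming the official ingress-nginx chart layout; the annotation is the one documented in the cloud-provider-azure load balancer docs):

```yaml
# Sketch of ingress-nginx Helm values: point the SLB HTTP probe at the
# controller's own /healthz endpoint instead of the default / path,
# so user-defined ingress rules cannot break the health check.
controller:
  service:
    annotations:
      service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path: /healthz
```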
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
In response to this:

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
> This bot triages issues according to the following rules:
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with Issue Triage
>
> Please send feedback to sig-contributor-experience at kubernetes/community.
> /close not-planned
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.