cloud-provider-azure

Ingress can suddenly break health check of the LoadBalancer

Open rumatavz opened this issue 2 years ago • 4 comments

What happened:

I created a cluster and an ingress as per https://github.com/MicrosoftDocs/azure-docs/blob/main/articles/aks/ingress-internal-ip.md. It used to work before 1.22.6.

On 1.22.6 the ingress does not work, because the load balancer health check uses HTTP and probes /healthz on port 80.

But we have

    path: /(.*)
    pathType: Prefix
    backend:
      service:
        name: aks-helloworld

that breaks /healthz, since we have aks-helloworld on /.

That basically means that any user with enough permissions to add an ingress can break the whole cluster.

Maybe TCP should be used instead of HTTP by default.

What you expected to happen:

The default configuration works reliably.

How to reproduce it (as minimally and precisely as possible):

Follow https://github.com/MicrosoftDocs/azure-docs/blob/main/articles/aks/ingress-internal-ip.md on version 1.22.6.

rumatavz avatar May 25 '22 07:05 rumatavz

  • @MartinForReal to have a look

feiskyer avatar May 26 '22 06:05 feiskyer

@rumatavz Please change appProtocol to TCP in the Service manifest. As documented at https://kubernetes-sigs.github.io/cloud-provider-azure/topics/loadbalancer/#custom-load-balancer-health-probe, the SLB probe uses HTTP because appProtocol in the Service is http. FYI
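
For illustration, a minimal sketch of what that change might look like. The Service name, selector, and port layout are hypothetical, and the exact casing the provider accepts for the appProtocol value is an assumption; check the linked documentation for your version.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller            # hypothetical name
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: ingress-nginx   # hypothetical selector
  ports:
    - name: http
      port: 80
      targetPort: http
      protocol: TCP
      # With appProtocol set to http the SLB probe is an HTTP probe;
      # setting it to tcp is intended to yield a plain TCP probe instead.
      appProtocol: tcp                      # value casing is an assumption
```

A TCP probe only checks that the port accepts connections, so it should not be affected by ingress rules that capture the probe path.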

MartinForReal avatar May 26 '22 08:05 MartinForReal

I employed another workaround: I've added the azure-load-balancer-health-probe-protocol annotation. I don't have problems with this anymore.

But what I'm saying is that the default configuration is dangerous, and other users may find their whole cluster not working because of this issue.
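
A sketch of the annotation workaround mentioned above, under stated assumptions: the comment does not say which value was used, so tcp here is a guess, and the Service name, selector, and ports are hypothetical.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller            # hypothetical name
  annotations:
    # Overrides the probe protocol for the load balancer health probe.
    # The value used in the thread is not stated; tcp is an assumption.
    service.beta.kubernetes.io/azure-load-balancer-health-probe-protocol: "tcp"
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: ingress-nginx   # hypothetical selector
  ports:
    - name: http
      port: 80
      targetPort: http
      protocol: TCP
```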

rumatavz avatar Jun 01 '22 11:06 rumatavz

Maybe I'm wrong, but the docs seem to suggest that there is a huge change in behaviour from 1.23 to 1.24, potentially amplifying this issue:

For clusters >1.24, spec.ports.appProtocol would be used as probe protocol and / would be used as default probe request path (service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path could be used to change to a different request path).

nginx-ingress for example runs into this issue in its default configuration for its plain HTTP port. When admins upgrade from Kubernetes 1.23 to 1.24, their nginx-ingress will suddenly stop working, no?

embik avatar Aug 08 '22 13:08 embik

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Nov 06 '22 13:11 k8s-triage-robot

/lifecycle rotten

k8s-triage-robot avatar Dec 06 '22 14:12 k8s-triage-robot

Maybe I'm wrong, but the docs seem to suggest that there is a huge change in behaviour from 1.23 to 1.24, potentially amplifying this issue:

For clusters >1.24, spec.ports.appProtocol would be used as probe protocol and / would be used as default probe request path (service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path could be used to change to a different request path).

nginx-ingress for example runs into this issue in its default configuration for its plain HTTP port. When admins upgrade from Kubernetes 1.23 to 1.24, their nginx-ingress will suddenly stop working, no?

I confirm our nginx-ingress broke this morning after an automatic update to 1.24. I had to add an annotation to configure the health probe path in the Helm values for nginx-ingress.
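
A sketch of what such Helm values could look like, assuming the ingress-nginx chart (where Service annotations live under controller.service.annotations) and assuming /healthz is the path the controller answers on, as in the original report. The comment does not state the exact values used.

```yaml
# values.yaml fragment for the ingress-nginx Helm chart (assumed chart layout)
controller:
  service:
    annotations:
      # Point the Azure load balancer HTTP probe at the controller's health
      # endpoint instead of the default request path "/".
      service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path: /healthz
```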

Socolin avatar Dec 26 '22 19:12 Socolin

/close not-planned

k8s-triage-robot avatar Jan 25 '23 19:01 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

k8s-ci-robot avatar Jan 25 '23 19:01 k8s-ci-robot