cloud-provider-azure icon indicating copy to clipboard operation
cloud-provider-azure copied to clipboard

[WIP] Add support for specifying probe protocol / probe port via annotation per service port

Open rainest opened this issue 3 years ago • 14 comments

What type of PR is this?

/kind feature

What this PR does / why we need it:

This adds service.beta.kubernetes.io/port_<num>_health-probe_protocol and service.beta.kubernetes.io/port_<num>_health-probe_port annotations for LoadBalancer Services. These allow overriding the health check port or protocol for a given port. This is useful for ports that cannot respond affirmatively to the health checks and expose health information elsewhere.

Which issue(s) this PR fixes:

Fixes #1505

Special notes for your reviewer:

This continues https://github.com/kubernetes-sigs/cloud-provider-azure/pull/1508 and attempts to address the open issue with it. I rebased onto upstream master and added commits to de-duplicate checks and correct an apparent error in the original docs (/ is the default check path, not /healthz).

I'm not quite sure I understand the duplication concern raised in https://github.com/kubernetes-sigs/cloud-provider-azure/pull/1508#pullrequestreview-944084110, and it looks like the line indicated is maybe outdated. As I read it, we pass the rule name (for internal and external LBs) to the check generator, which then just uses that string as-is for the probe name.

That name's generator will return something like SERVICE_UID-??-tcp-443 (the service, subnet, appProtocol, and port--not sure about the second's appearance, but it's optional anyway). That doesn't seem like it can conflict because there will only be one probe per LB rule name, and the the probes are named for the rule they check, not the port they check.

For example, if we have a Service with HTTP appProtocol ports on 8443 and 8009, and want to override the former to use an HTTP check on 8009 instead, the resulting probes would be:

  • ABC-123-http-8443 (an HTTP check on port 8009)
  • ABC-123-http-8009 (an HTTP check on port 8009)

Based on the discussion after, my additional change looks at the generated probes and only keeps those with unique properties, so that the above would only generate one probe. Naming is still a bit iffy, since you'll still get a probe named after only one of the rules, even though it's pulling double-duty. What name scheme would be desirable (changing it requires updating a bunch of tests, so I wanted to confirm before writing it)? I was thinking either ABC-123-http-8009-8443 (it indicates all ports it checks) or ABC-123-http-8009-10-5-/, indicating all of the probe properties. The latter doesn't work great, however, since it needs to account for HTTP paths, which could get quite unwieldy. It could indicate the port and protocol only, but that would result in name conflicts.

Does this PR introduce a user-facing change?

Added: support for new annotations **service.beta.kubernetes.io/port_<num>_health-probe_protocol** and **service.beta.kubernetes.io/port_<num>_health-probe_port** to allow explicitly setting the health probe protocol individually for each service port. Useful for services like Istio which have health check seperate from the main service port.

rainest avatar Oct 06 '22 00:10 rainest

Deploy Preview for kubernetes-sigs-cloud-provide-azure canceled.

Name Link
Latest commit ad38bedc53d2f6b661b05d4df39d8aeb116ea683
Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-cloud-provide-azure/deploys/636a1db8c5ba4f00089a6182

netlify[bot] avatar Oct 06 '22 00:10 netlify[bot]

Welcome @rainest!

It looks like this is your first PR to kubernetes-sigs/cloud-provider-azure 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/cloud-provider-azure has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. :smiley:

k8s-ci-robot avatar Oct 06 '22 00:10 k8s-ci-robot

Hi @rainest. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Oct 06 '22 00:10 k8s-ci-robot

@rainest Thanks for your PR! Could you please merge the commit into one commit?

MartinForReal avatar Oct 08 '22 02:10 MartinForReal

Coverage Status

Coverage increased (+0.03%) to 79.668% when pulling ad38bedc53d2f6b661b05d4df39d8aeb116ea683 on rainest:feature/azure-load-balancer-per-port-protocol into e8e8bec3f690bba0e47062bade279a687a694b6d on kubernetes-sigs:master.

coveralls avatar Oct 08 '22 03:10 coveralls

/retest

MartinForReal avatar Oct 08 '22 03:10 MartinForReal

@MartinForReal I can once it's complete, yes. Did you have comments on the de-duplication/naming question and any further work needed there? Or is the change introduced in https://github.com/kubernetes-sigs/cloud-provider-azure/pull/2452/commits/7f80566de8e76f6295f5a09c92012a731f6708d6 all of what you were looking for?

rainest avatar Oct 10 '22 17:10 rainest

Yes! That's what I'm looking for.

MartinForReal avatar Oct 11 '22 02:10 MartinForReal

Alright, sounds good, squashed and rebased onto the current tip of master here.

rainest avatar Oct 11 '22 17:10 rainest

/retest

MartinForReal avatar Oct 12 '22 01:10 MartinForReal

LGTM. /assign @feiskyer

MartinForReal avatar Oct 12 '22 02:10 MartinForReal

please rebase the branch in order to fix error reported by ci pipeline.

MartinForReal avatar Oct 12 '22 05:10 MartinForReal

FYI I'll be out away from internet access through the 25th, so if this needs further changes I'll address them then.

rainest avatar Oct 14 '22 18:10 rainest

@MartinForReal checking in on this, are there any further changes it would need?

rainest avatar Oct 31 '22 21:10 rainest

I can't think of any, @feiskyer @nilo19 Could you please take a look?

MartinForReal avatar Nov 01 '22 01:11 MartinForReal

/lgtm

nilo19 avatar Nov 01 '22 05:11 nilo19

/test pull-cloud-provider-azure-e2e-ccm-vmss-ip-lb-capz


MartinForReal avatar Nov 01 '22 14:11 MartinForReal

/approve

MartinForReal avatar Nov 07 '22 09:11 MartinForReal

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: MartinForReal, rainest

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot avatar Nov 07 '22 09:11 k8s-ci-robot

/ok-to-test

MartinForReal avatar Nov 07 '22 11:11 MartinForReal

/retest

rainest avatar Nov 07 '22 18:11 rainest

Unsure what would have broken the one test for service.beta.kubernetes.io/azure-load-balancer-resource-group. The Service never receives an IP assignment. Other tests that rely on the same check do get a Service LB IP and succeed, and AFAIK the PR shouldn't do anything special when a resource group is set.

Is there information elsewhere in the test logs that would show this particular Service's state and/or cloud provider logs related to assigning it an IP? I'm unfortunately not too familiar with test suite output, but it looks like it might only be logs from the test client.

rainest avatar Nov 08 '22 00:11 rainest

/retest

MartinForReal avatar Nov 09 '22 01:11 MartinForReal

@rainest: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-cloud-provider-azure-e2e-ccm-capz ad38bedc53d2f6b661b05d4df39d8aeb116ea683 link true /test pull-cloud-provider-azure-e2e-ccm-capz

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

k8s-ci-robot avatar Nov 09 '22 04:11 k8s-ci-robot

/test pull-cloud-provider-azure-e2e-ccm-capz


MartinForReal avatar Nov 09 '22 04:11 MartinForReal

/lgtm

MartinForReal avatar Nov 09 '22 04:11 MartinForReal

/test pull-cloud-provider-azure-e2e-ccm-capz


MartinForReal avatar Nov 11 '22 02:11 MartinForReal

@MartinForReal thanks for the help with the tests. Do you know roughly when changes here propagate out to being available in Azure for users? I didn't find any scheduled update lifecycle in the SIG FAQ, but it looks like new minor provider releases go out roughly every 6mo and are incorporated into Azure shortly after from reviewing the previous releases.

rainest avatar Nov 14 '22 18:11 rainest

/cherrypick release-1.25

MartinForReal avatar Nov 15 '22 01:11 MartinForReal

/cherrypick release-1.24

MartinForReal avatar Nov 15 '22 01:11 MartinForReal

/cherrypick release-1.23

MartinForReal avatar Nov 15 '22 01:11 MartinForReal