cloud-provider-azure
[WIP] Add support for specifying probe protocol / probe port via annotation per service port
What type of PR is this?
/kind feature
What this PR does / why we need it:
This adds `service.beta.kubernetes.io/port_<num>_health-probe_protocol` and `service.beta.kubernetes.io/port_<num>_health-probe_port` annotations for LoadBalancer Services. These allow overriding the health probe protocol or port for an individual service port. This is useful for ports that cannot respond affirmatively to the default health check and instead expose health information elsewhere.
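For example, here is a hedged sketch (in Go, mirroring how the provider's unit tests construct Services; the names, ports, and override values are hypothetical) of a Service whose 8443 port cannot answer the default check and instead serves health over HTTP on 8009:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

func main() {
	// Hypothetical Service: traffic is served on 8443, but health is only
	// reported by an HTTP endpoint on 8009.
	svc := &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{
			Name: "example",
			Annotations: map[string]string{
				// Override the probe for service port 8443 to an HTTP check
				// against port 8009 instead of the port itself.
				"service.beta.kubernetes.io/port_8443_health-probe_protocol": "http",
				"service.beta.kubernetes.io/port_8443_health-probe_port":     "8009",
			},
		},
		Spec: corev1.ServiceSpec{
			Type: corev1.ServiceTypeLoadBalancer,
			Ports: []corev1.ServicePort{
				{Name: "proxy", Port: 8443, TargetPort: intstr.FromInt(8443)},
				{Name: "status", Port: 8009, TargetPort: intstr.FromInt(8009)},
			},
		},
	}
	fmt.Println(svc.Annotations)
}
```

With the override in place, the probe generated for the 8443 rule should check HTTP on 8009 rather than the rule's own port.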
Which issue(s) this PR fixes:
Fixes #1505
Special notes for your reviewer:
This continues https://github.com/kubernetes-sigs/cloud-provider-azure/pull/1508 and attempts to address the open issue with it. I rebased onto upstream master and added commits to de-duplicate checks and correct an apparent error in the original docs (/ is the default check path, not /healthz).
I'm not quite sure I understand the duplication concern raised in https://github.com/kubernetes-sigs/cloud-provider-azure/pull/1508#pullrequestreview-944084110, and the line it points to looks like it may be outdated. As I read it, we pass the rule name (for internal and external LBs) to the check generator, which then just uses that string as-is for the probe name.
That name generator returns something like SERVICE_UID-??-tcp-443 (the service, subnet, appProtocol, and port; I'm not sure when the second (subnet) component appears, but it's optional anyway). That doesn't seem like it can conflict, because there will only be one probe per LB rule name, and the probes are named for the rule they check, not the port they check.
For example, if we have a Service with HTTP appProtocol ports on 8443 and 8009, and want to override the former to use an HTTP check on 8009 instead, the resulting probes would be:
- `ABC-123-http-8443` (an HTTP check on port 8009)
- `ABC-123-http-8009` (an HTTP check on port 8009)
Based on the subsequent discussion, my additional change looks at the generated probes and only keeps those with unique properties, so the example above would generate just one probe.

Naming is still a bit iffy, since you'll still get a probe named after only one of the rules, even though it's pulling double duty. What naming scheme would be desirable? (Changing it requires updating a bunch of tests, so I wanted to confirm before writing it.) I was thinking either `ABC-123-http-8009-8443` (indicating all of the ports it checks) or `ABC-123-http-8009-10-5-/` (indicating all of the probe properties). The latter doesn't work great, however, since it needs to account for HTTP paths, which could get quite unwieldy; it could indicate the port and protocol only, but that would result in name conflicts.
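For concreteness, here's a minimal sketch of the de-duplication idea (`probeSpec` and `dedupeProbes` are illustrative names, not the provider's actual types or functions): probes are keyed by their observable properties, and only the first probe with a given key is kept, so the two identical checks above collapse to one.

```go
package main

import "fmt"

// probeSpec is a hypothetical stand-in for a generated LB health probe; the
// real provider works with the Azure SDK probe types.
type probeSpec struct {
	Name     string
	Protocol string
	Port     int32
	Path     string
}

// dedupeProbes keeps only the first probe for each unique combination of
// protocol, port, and path, so rules that generate identical checks share one probe.
func dedupeProbes(probes []probeSpec) []probeSpec {
	type probeKey struct {
		Protocol string
		Port     int32
		Path     string
	}
	seen := map[probeKey]bool{}
	var out []probeSpec
	for _, p := range probes {
		k := probeKey{p.Protocol, p.Port, p.Path}
		if seen[k] {
			continue
		}
		seen[k] = true
		out = append(out, p)
	}
	return out
}

func main() {
	probes := []probeSpec{
		{Name: "ABC-123-http-8443", Protocol: "Http", Port: 8009, Path: "/"},
		{Name: "ABC-123-http-8009", Protocol: "Http", Port: 8009, Path: "/"},
	}
	// Both probes describe the same check, so only the first name survives,
	// which is exactly the naming question raised above.
	fmt.Println(dedupeProbes(probes))
}
```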
Does this PR introduce a user-facing change?
Added support for the new annotations **service.beta.kubernetes.io/port_<num>_health-probe_protocol** and **service.beta.kubernetes.io/port_<num>_health-probe_port**, which allow explicitly setting the health probe protocol and port individually for each service port. Useful for services like Istio that have health checks separate from the main service port.
Deploy Preview for kubernetes-sigs-cloud-provide-azure canceled.
| Name | Link |
|---|---|
| Latest commit | ad38bedc53d2f6b661b05d4df39d8aeb116ea683 |
| Latest deploy log | https://app.netlify.com/sites/kubernetes-sigs-cloud-provide-azure/deploys/636a1db8c5ba4f00089a6182 |
Welcome @rainest!
It looks like this is your first PR to kubernetes-sigs/cloud-provider-azure 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.
You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.
You can also check if kubernetes-sigs/cloud-provider-azure has its own contribution guidelines.
You may want to refer to our testing guide if you run into trouble with your tests not passing.
If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!
Thank you, and welcome to Kubernetes. :smiley:
Hi @rainest. Thanks for your PR.
I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@rainest Thanks for your PR! Could you please squash the commits into one commit?
Coverage increased (+0.03%) to 79.668% when pulling ad38bedc53d2f6b661b05d4df39d8aeb116ea683 on rainest:feature/azure-load-balancer-per-port-protocol into e8e8bec3f690bba0e47062bade279a687a694b6d on kubernetes-sigs:master.
/retest
@MartinForReal I can once it's complete, yes. Did you have comments on the de-duplication/naming question and any further work needed there? Or is the change introduced in https://github.com/kubernetes-sigs/cloud-provider-azure/pull/2452/commits/7f80566de8e76f6295f5a09c92012a731f6708d6 all of what you were looking for?
Yes! That's what I'm looking for.
Alright, sounds good, squashed and rebased onto the current tip of master here.
/retest
LGTM. /assign @feiskyer
Please rebase the branch to fix the error reported by the CI pipeline.
FYI, I'll be away from internet access through the 25th, so if this needs further changes I'll address them then.
@MartinForReal checking in on this, are there any further changes it would need?
I can't think of any. @feiskyer @nilo19, could you please take a look?
/lgtm
/test pull-cloud-provider-azure-e2e-ccm-vmss-ip-lb-capz
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: MartinForReal, rainest
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~OWNERS~~ [MartinForReal]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
/ok-to-test
/retest
I'm unsure what would have broken the one test for service.beta.kubernetes.io/azure-load-balancer-resource-group. The Service never receives an IP assignment, while other tests that rely on the same check do get a Service LB IP and succeed, and AFAIK the PR shouldn't do anything special when a resource group is set.
Is there information elsewhere in the test logs that would show this particular Service's state and/or cloud provider logs related to assigning it an IP? I'm unfortunately not too familiar with test suite output, but it looks like it might only be logs from the test client.
/retest
@rainest: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| pull-cloud-provider-azure-e2e-ccm-capz | ad38bedc53d2f6b661b05d4df39d8aeb116ea683 | link | true | /test pull-cloud-provider-azure-e2e-ccm-capz |
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
/test pull-cloud-provider-azure-e2e-ccm-capz
/lgtm
/test pull-cloud-provider-azure-e2e-ccm-capz
@MartinForReal thanks for the help with the tests. Do you know roughly when changes here propagate out to being available in Azure for users? I didn't find a scheduled update lifecycle in the SIG FAQ, but from reviewing previous releases it looks like new minor provider releases go out roughly every six months and are incorporated into Azure shortly after.
/cherrypick release-1.25
/cherrypick release-1.24
/cherrypick release-1.23