ingress-nginx
Add new Prometheus metric for orphaned Ingresses
What this PR does / why we need it:
Logging for ingresses with no endpoints or services available is already present:
- The service does not exist at all:
```
nginx-ingress-controller-7f69d46b7d-5p47g nginx-ingress-controller W1120 10:10:17.862371 9 controller.go:753] Error obtaining Endpoints for Service "namespace1/echoserver": no object matching key "namespace1/echoserver" in local store
nginx-ingress-controller-7f69d46b7d-5p47g nginx-ingress-controller W1120 10:10:17.864926 9 controller.go:753] Error obtaining Endpoints for Service "namespace2/service2-service": no object matching key "namespace2/service2-service" in local store
nginx-ingress-controller-7f69d46b7d-5p47g nginx-ingress-controller W1120 10:10:17.865429 9 controller.go:753] Error obtaining Endpoints for Service "namespace2/service3-service": no object matching key "namespace2/service3-service" in local store
```
- Ingresses whose Service does exist, but which has no active endpoints:
```
nginx-ingress-controller-7f69d46b7d-5p47g nginx-ingress-controller W1120 09:58:53.444455 9 controller.go:826] Service "namespace3/service4-service" does not have any active Endpoint.
nginx-ingress-controller-7f69d46b7d-5p47g nginx-ingress-controller W1120 09:58:53.444524 9 controller.go:826] Service "namespace4/service5-service" does not have any active Endpoint.
```
While following the cited logs is possible, they are rather noisy, and only the error case produces a log line, so one cannot tell when things have since recovered (the Ingress was reconfigured or deleted, or endpoints came back up). This metric makes it possible to monitor orphaned Ingresses directly.
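For illustration, a minimal sketch of what such a gauge could look like with client_golang (the metric name, label set, and helper function below are assumptions for this sketch, not necessarily the exact identifiers in this PR):

```go
package collectors

import "github.com/prometheus/client_golang/prometheus"

// orphanIngress is a hypothetical gauge tracking Ingresses whose backing
// Service or Endpoints are missing; 1 = orphaned, 0 = healthy again.
var orphanIngress = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "nginx_ingress_controller_orphan_ingress",
		Help: "Gauge reporting Ingresses with no Service or no active Endpoints; the 'type' label distinguishes the two cases.",
	},
	[]string{"namespace", "ingress", "type"},
)

func init() {
	prometheus.MustRegister(orphanIngress)
}

// SetOrphan flags an Ingress as orphaned (1) or recovered (0).
func SetOrphan(namespace, ingress, orphanType string, orphaned bool) {
	v := 0.0
	if orphaned {
		v = 1.0
	}
	orphanIngress.WithLabelValues(namespace, ingress, orphanType).Set(v)
}
```

A gauge (rather than a counter) is the natural fit here: setting it back to 0 records recovery, which is exactly what the raw logs cannot express.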
Types of changes
- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Documentation only
Which issue/s this PR fixes
fixes #4763
How Has This Been Tested?
Tested with unit tests and manual tests with a `make dev-env` cluster.
Checklist:
- [ ] My change requires a change to the documentation.
- [ ] I have updated the documentation accordingly.
- [x] I've read the CONTRIBUTION guide
- [ ] I have added tests to cover my changes.
- [ ] All new and existing tests passed.
The committers are authorized under a signed CLA.
- :white_check_mark: Makhonin Alexey (3ec448ec15b4299a9cdc06ef22233d7e8be8e42e)
Welcome @alex123012!
It looks like this is your first PR to kubernetes/ingress-nginx 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.
You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.
You can also check if kubernetes/ingress-nginx has its own contribution guidelines.
You may want to refer to our testing guide if you run into trouble with your tests not passing.
If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!
Thank you, and welcome to Kubernetes. :smiley:
Hi @alex123012. Thanks for your PR.
I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test` on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the `ok-to-test` label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/easycla
/assign @rikatz
Hi! This is my vision of this feature, and if, in your opinion, the implementation is incorrect, I will be happy to redo it :)
/ok-to-test
Thanks for the PR, let me take a look :)
/triage accepted
/kind feature
/priority important-longterm
Sorry, I can't understand why the tests are not passing (I can't figure out where to change `TestGetBackendServers` in `controller_test.go`). Can someone help me?
```
--- FAIL: TestGetBackendServers (1.10s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x107fdd4]

goroutine 103 [running]:
testing.tRunner.func1.2({0x1254bc0, 0x234b430})
	/usr/local/go/src/testing/testing.go:1209 +0x258
testing.tRunner.func1(0x40002c41a0)
	/usr/local/go/src/testing/testing.go:1212 +0x278
panic({0x1254bc0, 0x234b430})
	/usr/local/go/src/runtime/panic.go:1038 +0x224
k8s.io/ingress-nginx/internal/ingress/controller.(*NGINXController).createUpstreams(0x40004761e0, {0x4000286228, 0x1, 0x1}, 0x40002f6a80)
	/go/src/k8s.io/ingress-nginx/internal/ingress/controller/controller.go:1000 +0x1b74
k8s.io/ingress-nginx/internal/ingress/controller.(*NGINXController).getBackendServers(0x40004761e0, {0x4000286228, 0x1, 0x1})
	/go/src/k8s.io/ingress-nginx/internal/ingress/controller/controller.go:590 +0x6c
k8s.io/ingress-nginx/internal/ingress/controller.TestGetBackendServers(0x40002c41a0)
	/go/src/k8s.io/ingress-nginx/internal/ingress/controller/controller_test.go:2347 +0x290c
testing.tRunner(0x40002c41a0, 0x151c8a0)
	/usr/local/go/src/testing/testing.go:1259 +0xf8
created by testing.(*T).Run
	/usr/local/go/src/testing/testing.go:1306 +0x350
FAIL	k8s.io/ingress-nginx/internal/ingress/controller	1.485s
```
My bet is that `metricCollector` is nil, but I need to check why.
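For context, a minimal reproduction of that panic pattern (hypothetical names, not the controller's actual types): calling a method that writes through a nil receiver field fails with exactly the SIGSEGV seen in the trace above.

```go
package main

// collector is a stand-in for the controller's metrics collector.
type collector struct{ orphans int }

// Set writes through the receiver, so calling it on a nil *collector
// dereferences a nil pointer.
func (c *collector) Set(v int) { c.orphans = v }

type controller struct {
	metricCollector *collector // left nil by the test constructor
}

func main() {
	ctrl := &controller{} // metricCollector was never set
	// panic: runtime error: invalid memory address or nil pointer dereference
	ctrl.metricCollector.Set(1)
}
```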
@alex123012 could you figure out the reason for the panic?
I'm sort of reviewing some PRs in my queue here and wanted to look into this, but I'm without my home lab :/
Will try to circle back during the week.
Okay, I'll try to do my best :)
Hi!
I found why the metric collector is nil: the `TestGetBackendServers` test calls the `newDynamicNginxController` function, which returns an `NGINXController` without a metric collector defined. The test then calls `getBackendServers`, which in turn calls `createUpstreams`, where the orphan-ingress metric is collected.
I've added a dummy collector to the `newDynamicNginxController` function and now it works fine.
Is this a good solution?
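If it helps review, here is a minimal sketch of the no-op-collector idea (the interface and type names below are simplified assumptions, not the exact ones from the repo's metric package):

```go
package main

import "fmt"

// Collector is a simplified stand-in for the controller's metrics interface.
type Collector interface {
	IncOrphanIngress(namespace, ingress, orphanType string)
}

// DummyCollector satisfies Collector but records nothing, so test
// constructors that never wire up real metrics stay safe.
type DummyCollector struct{}

func (DummyCollector) IncOrphanIngress(string, string, string) {}

type controller struct{ metricCollector Collector }

// newTestController mimics injecting the dummy collector into the
// controller built by the test helper.
func newTestController() *controller {
	return &controller{metricCollector: DummyCollector{}}
}

func main() {
	c := newTestController()
	c.metricCollector.IncOrphanIngress("ns", "ing", "no-service")
	fmt.Println("no panic: the dummy collector swallowed the update")
}
```

Injecting a do-nothing implementation keeps the test constructor simple while making every metric call safe.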
I don't think the new metric collection is the cause of the failing CI tests.
@rikatz @iamNoah1 Could you take a look please? ^
Hey, I will take a look at this next weekend.
Adding this to my review spreadsheet :)
/retest
/approve
/hold
We should merge this before the freeze, or fix it.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: alex123012, rikatz
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~OWNERS~~ [rikatz]
Approvers can indicate their approval by writing `/approve` in a comment. Approvers can cancel approval by writing `/approve cancel` in a comment.
@alex123012 I'm having a bad time re-running things here. Can you rebase onto our latest tests? We fixed a bunch of stuff recently.
@rikatz I've rebased my branch on main, but the tests continue to fail. IMHO the errors aren't connected with this feature.
/retest
/remove hold
/remove-hold
/ok-to-test
@rikatz @iamNoah1 Hello! Any updates here? I've rebased my branch on main :upside_down_face:
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Mark this issue or PR as rotten with `/lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
@alex123012 can you rebase this and see if the tests pass, please? So I can approve and merge it :)
/retest
@rikatz Hi! All tests passed :tada: :)
@alex123012 thanks, one last request: can you add the new metrics to https://github.com/kubernetes/ingress-nginx/blob/main/docs/user-guide/monitoring.md in a follow-up PR? I will merge this one, please just don't forget it :)
Thanks!
/lgtm