ingress-nginx Add new prometheus metric for orphaned ingress

What this PR does / why we need it:

Logging for ingresses with no endpoints or services available is already present:

The service does not exist at all:

nginx-ingress-controller-7f69d46b7d-5p47g nginx-ingress-controller W1120 10:10:17.862371 9 controller.go:753] Error obtaining Endpoints for Service "namespace1/echoserver": no object matching key "namespace1/echoserver" in local store
nginx-ingress-controller-7f69d46b7d-5p47g nginx-ingress-controller W1120 10:10:17.864926 9 controller.go:753] Error obtaining Endpoints for Service "namespace2/service2-service": no object matching key "namespace2/service2-service" in local store
nginx-ingress-controller-7f69d46b7d-5p47g nginx-ingress-controller W1120 10:10:17.865429 9 controller.go:753] Error obtaining Endpoints for Service "namespace2/service3-service": no object matching key "namespace2/service3-service" in local store

Ingresses whose service exists but does not have any active endpoints:

nginx-ingress-controller-7f69d46b7d-5p47g nginx-ingress-controller W1120 09:58:53.444455       9 controller.go:826] Service "namespace3/service4-service" does not have any active Endpoint.
nginx-ingress-controller-7f69d46b7d-5p47g nginx-ingress-controller W1120 09:58:53.444524       9 controller.go:826] Service "namespace4/service5-service" does not have any active Endpoint.

Following the cited logs is possible, but they are rather "noisy", and a line is only emitted in the "error case". So one cannot tell when things have since recovered (the ingress was reconfigured or deleted, or the endpoints are back up). This metric will help keep track of orphaned ingresses.
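The intended behaviour can be sketched in Go without pulling in the Prometheus client. Each ingress and orphan type gets its own gauge series, set to 1 while the ingress is orphaned and back to 0 once it recovers. This is a minimal sketch, not the PR's implementation: the label names, the "no-service"/"no-endpoint" type values, and the classify/observe helpers are assumptions for illustration, and a plain map stands in for a real Prometheus gauge vector.

```go
package main

import (
	"fmt"
	"sort"
)

// orphanKey identifies one series of a hypothetical orphan-ingress gauge.
// The labels (namespace, ingress, type) are assumptions for illustration.
type orphanKey struct {
	Namespace, Ingress, Type string
}

// orphanGauge is an in-memory stand-in for a Prometheus gauge vector:
// 1 means "currently orphaned", 0 means "not orphaned (or recovered)".
type orphanGauge map[orphanKey]float64

// orphanTypes are hypothetical label values for the two logged cases.
var orphanTypes = []string{"no-service", "no-endpoint"}

// classify returns the orphan type for an ingress backend, or "" if healthy.
func classify(serviceExists bool, activeEndpoints int) string {
	switch {
	case !serviceExists:
		return "no-service"
	case activeEndpoints == 0:
		return "no-endpoint"
	default:
		return ""
	}
}

// observe is meant to run on every sync: it sets the matching series to 1 and
// every other type to 0, so a recovery shows up as a drop back to 0.
func (g orphanGauge) observe(ns, ing string, serviceExists bool, activeEndpoints int) {
	kind := classify(serviceExists, activeEndpoints)
	for _, t := range orphanTypes {
		v := 0.0
		if t == kind {
			v = 1
		}
		g[orphanKey{ns, ing, t}] = v
	}
}

func main() {
	g := orphanGauge{}
	g.observe("namespace1", "echoserver", false, 0) // Service missing entirely
	g.observe("namespace3", "service4", true, 0)    // Service exists, no active endpoints
	g.observe("namespace3", "service4", true, 2)    // endpoints are back: series drop to 0

	// Print the series in a stable order, loosely mimicking exposition format.
	keys := make([]orphanKey, 0, len(g))
	for k := range g {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool {
		a, b := keys[i], keys[j]
		if a.Namespace != b.Namespace {
			return a.Namespace < b.Namespace
		}
		if a.Ingress != b.Ingress {
			return a.Ingress < b.Ingress
		}
		return a.Type < b.Type
	})
	for _, k := range keys {
		fmt.Printf("orphan_ingress{namespace=%q,ingress=%q,type=%q} %g\n", k.Namespace, k.Ingress, k.Type, g[k])
	}
}
```

Because every sync rewrites all series for an ingress, a dashboard or alert can key on the value returning to 0, which the log lines alone cannot show.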

Types of changes

[ ] Bug fix (non-breaking change which fixes an issue)
[x] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to change)
[ ] Documentation only

Which issue/s this PR fixes

fixes #4763

How Has This Been Tested?

Tested with unit tests and manually on a make dev-env cluster.

Checklist:

[ ] My change requires a change to the documentation.
[ ] I have updated the documentation accordingly.
[x] I've read the CONTRIBUTION guide
[ ] I have added tests to cover my changes.
[ ] All new and existing tests passed.

Feb 08 '22 23:02 alex123012

The committers are authorized under a signed CLA.

✅ Makhonin Alexey (3ec448ec15b4299a9cdc06ef22233d7e8be8e42e)

Feb 08 '22 23:02 linux-foundation-easycla[bot]

Welcome @alex123012!

It looks like this is your first PR to kubernetes/ingress-nginx 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/ingress-nginx has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

Feb 08 '22 23:02 k8s-ci-robot

Hi @alex123012. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Feb 08 '22 23:02 k8s-ci-robot

/easycla

Feb 08 '22 23:02 alex123012

/assign @rikatz

Feb 08 '22 23:02 alex123012

Hi! This is my take on this feature; if you think the implementation is incorrect, I will be happy to redo it :)

Feb 08 '22 23:02 alex123012

/ok-to-test

Thanks for the PR, let me take a look :)

Feb 13 '22 18:02 rikatz

/triage accepted
/kind feature
/priority important-longterm

Feb 14 '22 15:02 strongjz

Sorry, I can't figure out why the tests are not passing (I can't work out what to change in TestGetBackendServers in controller_test.go). Can someone help me?

--- FAIL: TestGetBackendServers (1.10s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x107fdd4]

goroutine 103 [running]:
testing.tRunner.func1.2({0x1254bc0, 0x234b430})
	/usr/local/go/src/testing/testing.go:1209 +0x258
testing.tRunner.func1(0x40002c41a0)
	/usr/local/go/src/testing/testing.go:1212 +0x278
panic({0x1254bc0, 0x234b430})
	/usr/local/go/src/runtime/panic.go:1038 +0x224
k8s.io/ingress-nginx/internal/ingress/controller.(*NGINXController).createUpstreams(0x40004761e0, {0x4000286228, 0x1, 0x1}, 0x40002f6a80)
	/go/src/k8s.io/ingress-nginx/internal/ingress/controller/controller.go:1000 +0x1b74
k8s.io/ingress-nginx/internal/ingress/controller.(*NGINXController).getBackendServers(0x40004761e0, {0x4000286228, 0x1, 0x1})
	/go/src/k8s.io/ingress-nginx/internal/ingress/controller/controller.go:590 +0x6c
k8s.io/ingress-nginx/internal/ingress/controller.TestGetBackendServers(0x40002c41a0)
	/go/src/k8s.io/ingress-nginx/internal/ingress/controller/controller_test.go:2347 +0x290c
testing.tRunner(0x40002c41a0, 0x151c8a0)
	/usr/local/go/src/testing/testing.go:1259 +0xf8
created by testing.(*T).Run
	/usr/local/go/src/testing/testing.go:1306 +0x350
FAIL	k8s.io/ingress-nginx/internal/ingress/controller	1.485s

Feb 16 '22 12:02 alex123012

My bet is that metricCollector is nil, but I need to check why.

Feb 16 '22 15:02 rikatz

@alex123012 could you figure out the reason of the panic?

I'm sort of reviewing some PRs on my queue here, wanted to look into this but I'm without my home lab :/

Will try to cycle back during the week.

Feb 20 '22 21:02 rikatz

Okay, I'll do my best :)

Feb 20 '22 21:02 alex123012

Hi! I found why the metric collector is nil. The TestGetBackendServers test calls the newDynamicNginxController function, which returns an NGINXController without a metric collector defined. The test then calls getBackendServers, which in turn calls createUpstreams, where the orphanity metric is collected. I've added a dummy collector to newDynamicNginxController and now it works fine. Is this a good solution?
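The panic in the stack trace is what Go does when a method is called on a nil interface value. The sketch below reproduces it with a hypothetical one-method Collector interface and a no-op dummyCollector, mirroring the fix described above; the type and method names are assumptions and only loosely model the controller's real collector.

```go
package main

import "fmt"

// Collector is a hypothetical, trimmed-down stand-in for the controller's
// metric collector interface; only the method used below is modelled.
type Collector interface {
	SetOrphanIngress(namespace, ingress, orphanType string)
}

// dummyCollector satisfies Collector and discards everything, which is the
// kind of no-op collector a test fixture can plug in.
type dummyCollector struct{}

func (dummyCollector) SetOrphanIngress(string, string, string) {}

// controller mimics a struct whose collector field a test helper may leave unset.
type controller struct {
	metricCollector Collector
}

func (c *controller) createUpstreams() {
	// Calling a method on a nil interface value panics with
	// "invalid memory address or nil pointer dereference".
	c.metricCollector.SetOrphanIngress("ns", "ing", "no-endpoint")
}

// safeCall runs f and reports whether it panicked.
func safeCall(f func()) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	broken := &controller{}                                 // fixture without a collector
	fixed := &controller{metricCollector: dummyCollector{}} // fixture with a no-op collector
	fmt.Println("without collector, panicked:", safeCall(broken.createUpstreams))
	fmt.Println("with dummy collector, panicked:", safeCall(fixed.createUpstreams))
}
```

Supplying a no-op implementation keeps production code free of nil checks while letting test fixtures skip real metrics.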

I don't think the new metric collection is what is causing the CI tests to fail now.

Mar 06 '22 23:03 alex123012

@rikatz @iamNoah1 Could you take a look please? ^

Apr 07 '22 15:04 alex123012

hey, I will take a look into this next weekend

Apr 12 '22 21:04 rikatz

Adding this to my review spreadsheet :)

May 08 '22 00:05 rikatz

/retest
/approve
/hold

We should merge this before the freeze, or fix it.

Jun 26 '22 21:06 rikatz

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alex123012, rikatz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [rikatz]

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

Jun 26 '22 21:06 k8s-ci-robot

@alex123012 I'm having a hard time re-running the tests here. Can you rebase to pick up our latest test fixes? We fixed a bunch of stuff recently.

Jun 26 '22 21:06 rikatz

@rikatz I've rebased my branch on main, but the tests continue to fail. IMHO the errors aren't related to this feature.

Jun 26 '22 22:06 alex123012

/retest

Jul 01 '22 13:07 alex123012

/remove hold

Jul 04 '22 11:07 iamNoah1

/remove-hold

Jul 04 '22 11:07 iamNoah1

/ok-to-test

Jul 04 '22 11:07 iamNoah1

@rikatz @iamNoah1 Hello! Any updates here? I've rebased my branch on main 🙃

Aug 10 '22 18:08 alex123012

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
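Assuming the bot's own label changes do not count as activity, the rules above are cumulative thresholds on idle time: stale at 90 days, rotten at 120, closed at 150. This matches the thread, where the last comment before the stale label is dated Aug 10 '22 and the label arrived on Nov 09 '22, 91 days later. A small Go sketch with a hypothetical lifecycleState helper:

```go
package main

import (
	"fmt"
	"time"
)

const day = 24 * time.Hour

// lifecycleState maps idle time to a lifecycle state under the rules above,
// assuming no human activity after each label is applied (thresholds add up).
func lifecycleState(idle time.Duration) string {
	switch {
	case idle >= 150*day: // 90d + 30d + 30d
		return "closed"
	case idle >= 120*day: // 90d + 30d
		return "rotten"
	case idle >= 90*day:
		return "stale"
	default:
		return "active"
	}
}

func main() {
	for _, d := range []int{30, 91, 120, 150} {
		fmt.Printf("%3dd idle -> %s\n", d, lifecycleState(time.Duration(d)*day))
	}
}
```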

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Nov 09 '22 09:11 k8s-triage-robot

/remove-lifecycle stale

Nov 09 '22 12:11 frittentheke

@alex123012 can you rebase this and see if the tests passes please? So I can approve and merge it :)

Jan 08 '23 22:01 rikatz

/retest

Jan 09 '23 05:01 alex123012

@rikatz Hi! All tests passed 🎉 :)

Jan 09 '23 06:01 alex123012

@alex123012 thanks, one last request: can you add the new metrics to https://github.com/kubernetes/ingress-nginx/blob/main/docs/user-guide/monitoring.md in a follow-up PR? I will merge this one, please just don't forget it :)

Thanks!

/lgtm

Jan 16 '23 12:01 rikatz
