
Prometheus support

Open runningman84 opened this issue 5 years ago • 30 comments

It would be great to have metrics per ALB group, so we could see how many targets are registered and whether there are any rule failures...

Right now an IAM error like this can cause the whole system to fail:

hop-zed"],"leavingMembers":[]}
{"level":"error","ts":1585751311.2310958,"logger":"kubebuilder.controller","msg":"Reconciler error","controller":"alb-ingress-controller","request":"/spryker","error":"AccessDeniedException: 

With a corresponding metric we could monitor such problems... An external monitor (URL check) would not help because some old targets might still be up and running.

runningman84 avatar Apr 01 '20 14:04 runningman84

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot avatar Jul 09 '20 16:07 fejta-bot

/remove-lifecycle stale.

runningman84 avatar Jul 09 '20 17:07 runningman84

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten

fejta-bot avatar Aug 08 '20 18:08 fejta-bot

@runningman84 Just to make sure I'm fully understanding the situation: there is zero support for Prometheus today?

As someone considering using the ALB Ingress controller, am I correct in understanding that there is no supported way to "monitor" or alert on controller reconciliation failures?

I could potentially alert based on logs in Splunk or something, but that's not very elegant.

clayvan avatar Aug 11 '20 13:08 clayvan

That's unfortunately correct; right now there is zero support for Prometheus...

You could run a custom container which parses the logs and publishes them as metrics...
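
For illustration, such a sidecar might look roughly like this. This is only a sketch, assuming the JSON log format shown above and prometheus/client_golang; the metric name, label, and port are made up:

```go
// Hypothetical log-scraping sidecar: reads the controller's JSON logs on
// stdin, counts "Reconciler error" lines per request, and exposes the
// count on a Prometheus /metrics endpoint.
package main

import (
	"bufio"
	"encoding/json"
	"log"
	"net/http"
	"os"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var reconcileErrors = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "alb_log_reconcile_errors_total", // made-up metric name
		Help: "Reconciler error log lines, by request.",
	},
	[]string{"request"},
)

func main() {
	prometheus.MustRegister(reconcileErrors)

	// Serve /metrics for Prometheus to scrape.
	go func() {
		http.Handle("/metrics", promhttp.Handler())
		log.Fatal(http.ListenAndServe(":9090", nil))
	}()

	// Parse each log line and count reconciler errors per request.
	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		var entry struct {
			Msg     string `json:"msg"`
			Request string `json:"request"`
		}
		if err := json.Unmarshal(scanner.Bytes(), &entry); err != nil {
			continue // not a JSON log line
		}
		if entry.Msg == "Reconciler error" {
			reconcileErrors.WithLabelValues(entry.Request).Inc()
		}
	}
}
```

The sidecar would need the controller's log stream piped into it (for example via a shared log file or a log tailer) and would then be scraped like any other Prometheus target.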

runningman84 avatar Aug 11 '20 14:08 runningman84

/remove-lifecycle rotten

runningman84 avatar Aug 11 '20 14:08 runningman84

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot avatar Nov 09 '20 15:11 fejta-bot

/kind feature

kishorj avatar Nov 18 '20 22:11 kishorj

/remove-lifecycle stale

techdragon avatar Dec 09 '20 02:12 techdragon

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

fejta-bot avatar Mar 09 '21 03:03 fejta-bot

It looks like there are Prometheus metrics, but they do not expose any data at a per-ingress or per-service level. Furthermore, there is no data in the ingress itself indicating that there may be an issue.

In the previous alb-ingress-controller, there was a metric called aws_alb_ingress_controller_errors{ingress="<namespace>/<name>"}, and we could use this to help notify teams that their ingress was misconfigured.

With the current metrics, we can only alert when something is failing to reconcile, and we are required to parse logs to understand the specific issue.

This is a pretty major usability issue.
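
For context, a counter shaped like that old metric could be registered roughly like this. This is a sketch assuming prometheus/client_golang, not the actual controller code, and the helper function is hypothetical:

```go
// Sketch of a per-ingress error counter in the shape of the old
// aws_alb_ingress_controller_errors metric.
package metrics

import "github.com/prometheus/client_golang/prometheus"

var ingressErrors = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "aws_alb_ingress_controller_errors",
		Help: "Reconcile errors per ingress (namespace/name).",
	},
	[]string{"ingress"},
)

func init() {
	prometheus.MustRegister(ingressErrors)
}

// RecordReconcileError is a hypothetical helper. Because the counter is
// labeled with the namespaced ingress name, alerts based on it can be
// routed to the team owning that ingress, which a controller-level
// error counter cannot do.
func RecordReconcileError(namespace, name string) {
	ingressErrors.WithLabelValues(namespace + "/" + name).Inc()
}
```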

jutley avatar Jul 15 '21 21:07 jutley

/assign m00nf1sh Check if the new controller runtime provides some useful metrics.

kishorj avatar Jul 21 '21 22:07 kishorj

Add new metrics for ingress group usage, such as the number of ingress groups and provisioned ALBs in the cluster, and a count of errors encountered per group.
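
A minimal sketch of what those group-level metrics could look like, assuming prometheus/client_golang; all metric names here are placeholders, not an agreed-upon naming scheme:

```go
// Sketch of possible group-level metrics: ingress group count,
// provisioned ALBs per group, and errors per group.
package metrics

import "github.com/prometheus/client_golang/prometheus"

var (
	// Number of ingress groups currently managed by the controller.
	ingressGroups = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "awslbc_ingress_groups", // placeholder name
		Help: "Ingress groups managed by the controller.",
	})

	// ALBs provisioned in the cluster, per ingress group.
	provisionedALBs = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "awslbc_provisioned_albs", // placeholder name
		Help: "ALBs provisioned per ingress group.",
	}, []string{"group"})

	// Errors encountered while reconciling an ingress group.
	groupErrors = prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "awslbc_ingress_group_errors_total", // placeholder name
		Help: "Errors encountered per ingress group.",
	}, []string{"group"})
)

func init() {
	prometheus.MustRegister(ingressGroups, provisionedALBs, groupErrors)
}
```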

kishorj avatar Jul 28 '21 22:07 kishorj

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Oct 26 '21 22:10 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Nov 25 '21 23:11 k8s-triage-robot

/remove-lifecycle rotten

runningman84 avatar Nov 26 '21 22:11 runningman84

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Feb 24 '22 23:02 k8s-triage-robot

/remove-lifecycle stale

pie-r avatar Mar 10 '22 21:03 pie-r

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jun 08 '22 21:06 k8s-triage-robot

/remove-lifecycle stale

runningman84 avatar Jun 09 '22 05:06 runningman84

I can see Prometheus metrics like:

aws_api_call_retries_bucket{operation="CreateTargetGroup",service="Elastic Load Balancing v2",le="0"} 11

but not at the target group level.

tooptoop4 avatar Jul 08 '22 05:07 tooptoop4

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Oct 06 '22 06:10 k8s-triage-robot

/remove-lifecycle stale

tooptoop4 avatar Oct 06 '22 08:10 tooptoop4

+1

RamazanKara avatar Nov 15 '22 14:11 RamazanKara

+1

sanjeevpandey19 avatar Feb 06 '23 11:02 sanjeevpandey19

It looks like there are Prometheus metrics, but they do not expose any data at a per-ingress or per-service level. Furthermore, there is no data in the ingress itself indicating that there may be an issue.

In the previous alb-ingress-controller, there was a metric called aws_alb_ingress_controller_errors{ingress="<namespace>/<name>"}, and we could use this to help notify teams that their ingress was misconfigured.

With the current metrics, we can only alert when something is failing to reconcile, and we are required to parse logs to understand the specific issue.

This is a pretty major usability issue.

@kishorj @M00nF1sh any plans on fixing this regression? The current metric controller_runtime_reconcile_errors_total does not provide the same information that the previous aws_alb_ingress_controller_errors metric did.

dudicoco avatar May 06 '23 10:05 dudicoco

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jan 19 '24 19:01 k8s-triage-robot

/remove-lifecycle stale

runningman84 avatar Jan 19 '24 21:01 runningman84

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Apr 18 '24 21:04 k8s-triage-robot

/remove-lifecycle stale

runningman84 avatar Apr 19 '24 04:04 runningman84