
No metrics for probeVersion failures (bug?)

Open ngc104 opened this issue 3 years ago • 8 comments

I sometimes get 503 errors in the logs:

karma-5666566999-n28pv karma level=error msg="Request failed" error="request to https://***redacted***alertmanager:9093/metrics failed with 503 Service Unavailable" alertmanager=***redacted*** uri=https://***redacted***alertmanager:9093/

It seems that the error message comes from /internal/alertmanager/models.go#L92.

And probeVersion is called at /internal/alertmanager/models.go#L366.

Question 1: could you confirm this?


In this code, I also notice that when an error occurs, probeVersion returns "" after logging it, but:

  • fetching the status (line 379) is not blocked. How can that work if a 503 error was returned when trying to retrieve the Alertmanager version?
  • there is no metric to show that probing the version failed.

Question 2 (bug?): when probing the version fails, Karma still goes on retrieving silences and alerts. Is this a bug?

Question 3 (feature request): could you create a metric that shows when probing the Alertmanager version failed? Shouldn't execution stop at line 370?

For this feature request, maybe you could create a metric named karma_alertmanager_probed_version, with the version as a label and the value set to 1, or to 0 if something failed?
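A minimal Go sketch of what one sample of the proposed gauge could look like in the Prometheus text exposition format. karma_alertmanager_probed_version is only the name suggested here, the extra alertmanager label (mirroring the alertmanager= field in the log line) is an assumption, and the helper probedVersionSample is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// labelEscaper applies Prometheus text-format escaping to label values:
// backslash, double quote and newline.
var labelEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)

// probedVersionSample renders one sample of the *proposed* gauge
// karma_alertmanager_probed_version (hypothetical, suggested in this issue).
// The value is 1 when a version was probed and 0 when probing failed,
// in which case the version label is left empty.
func probedVersionSample(alertmanager, version string) string {
	value := 0
	if version != "" {
		value = 1
	}
	return fmt.Sprintf(`karma_alertmanager_probed_version{alertmanager="%s",version="%s"} %d`,
		labelEscaper.Replace(alertmanager), labelEscaper.Replace(version), value)
}

func main() {
	fmt.Println("# TYPE karma_alertmanager_probed_version gauge")
	fmt.Println(probedVersionSample("am1", "0.23.0")) // probe succeeded -> 1
	fmt.Println(probedVersionSample("am2", ""))       // probe failed -> 0
}
```

Putting the version in a label and using the value as a 0/1 flag means a failed probe shows up as a series with value 0 rather than as a missing series.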

ngc104, Mar 11 '22 10:03

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot], May 10 '22 19:05

Hello,

Any ideas about my questions or the feature request?

ngc104, May 11 '22 09:05

It's not a bug: if karma cannot detect the Alertmanager version, it assumes the latest compatible version. What do you need a metric for? It sounds like your Alertmanager (or whatever it sits behind) is failing with 503.
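The fallback described here amounts to something like the Go sketch below. effectiveVersion is a hypothetical helper, and the version string passed as the fallback is an illustrative placeholder, not karma's real default.

```go
package main

import "fmt"

// effectiveVersion (hypothetical) picks the version to use for talking to
// Alertmanager: an empty probe result falls back to the newest version the
// client supports instead of aborting the pull.
func effectiveVersion(probed, latestCompatible string) (version string, assumed bool) {
	if probed == "" {
		return latestCompatible, true
	}
	return probed, false
}

func main() {
	// "0.24.0" is a placeholder for illustration only.
	v, assumed := effectiveVersion("", "0.24.0")
	fmt.Println(v, assumed) // 0.24.0 true
}
```

Because a failed probe only triggers a fallback rather than an abort, alerts and silences keep loading, which matches what was observed in this issue.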

prymitive, May 18 '22 11:05

I have not had 503 errors for a while.

When I wrote this issue, maybe there was a problem with Alertmanager that I could not reproduce at the time, and it blocked Karma from retrieving the version but not the alerts and silences.

As you say, Karma assumes the latest compatible version when it cannot retrieve the Alertmanager version. This is why Karma kept working and I noticed nothing except the log line with the 503 error.

There has been no problem for a while: should we close this issue?

About the feature request: the metric could be a counter that increments every time Karma fails to connect to Alertmanager. A native metric would be easier to use than a custom metric built with Promtail matching on 503. But I have had no problems for a while, so do I still need it? I don't know...

ngc104, May 18 '22 12:05

The karma_alertmanager_errors_total and karma_alertmanager_up metrics are already exported.

prymitive, May 18 '22 12:05

Thanks, I'll give these metrics a try.

ngc104, May 18 '22 12:05

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot], Jul 17 '22 19:07

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot], Sep 16 '22 19:09