
Monitoring of prometheus-adapter metrics

Open matthewjstanford opened this issue 1 year ago • 3 comments

What happened?:

I've got a set of custom metrics defined in prometheus-adapter. I was refactoring the source metrics in Prometheus (modifying labels) and inadvertently broke one of the custom metrics in prometheus-adapter.

This specific custom metric was used by an HPA, along with CPU and memory. When the custom metric stopped responding (returning a 404), the HPA went into the weeds and scaled the deployment way up. I believe this is mostly a bug in how the HPA handles missing metrics, but it raises the question: how can I monitor the health of custom metrics provided by prometheus-adapter?

What did you expect to happen?:

I expected prometheus-adapter to emit Prometheus metrics about itself, something along these lines:

example metrics
# TYPE prometheus_adapter_custom_request_status_total gauge
prometheus_adapter_custom_request_status_total{metric="my_custom_metric", status="200"} 1
prometheus_adapter_custom_request_status_total{metric="my_invalid_custom_metric", status="404"} 2

# TYPE prometheus_adapter_external_request_status_total gauge
prometheus_adapter_external_request_status_total{metric="my_external_metric", status="200"} 5
prometheus_adapter_external_request_status_total{metric="my_invalid_external_metric", status="404"} 6

But I don't believe prometheus-adapter emits any metrics (hopefully I'm wrong!).

Having info like this would make it possible to actively monitor the availability of critical custom metrics, such as the ones discussed above.
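
As a rough illustration of the proposal above, here is a minimal Python sketch that counts responses per metric name and HTTP status and renders them in the Prometheus text exposition format. This is not prometheus-adapter code: the `RequestStatusCounter` class and its methods are hypothetical, the metric name is the one suggested in the example metrics, and it is typed as a counter here on the assumption that a `_total` series only ever increases.

```python
from collections import Counter


class RequestStatusCounter:
    """Hypothetical per-metric, per-status request counter (not part of prometheus-adapter)."""

    def __init__(self, name):
        self.name = name
        self._counts = Counter()

    def observe(self, metric, status):
        # Count one response for the given metric name and HTTP status code.
        self._counts[(metric, str(status))] += 1

    def render(self):
        # Render all series in Prometheus text exposition format, sorted for stable output.
        lines = [f"# TYPE {self.name} counter"]
        for (metric, status), value in sorted(self._counts.items()):
            lines.append(f'{self.name}{{metric="{metric}", status="{status}"}} {value}')
        return "\n".join(lines) + "\n"


# Illustrative data only: one successful lookup and two failed ones.
custom = RequestStatusCounter("prometheus_adapter_custom_request_status_total")
custom.observe("my_custom_metric", 200)
custom.observe("my_invalid_custom_metric", 404)
custom.observe("my_invalid_custom_metric", 404)
rendered = custom.render()
print(rendered)
```

With series like these, an alert could fire whenever the 404 count for a critical metric increases.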

matthewjstanford, Jan 18 '24 16:01

It looks like I can monitor the availability of the prometheus-adapter metrics via a Horizontal Pod Autoscaler metric, kube_horizontalpodautoscaler_status_target_metric.

This is a bit backwards, IMO, but it at least provides a mechanism to monitor the metrics.
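
A minimal sketch of that check, in Python: compare the metrics each HPA is expected to report against the label sets of the `kube_horizontalpodautoscaler_status_target_metric` series that currently exist. It assumes the series carry `namespace`, `horizontalpodautoscaler`, and `metric_name` labels, and that a metric the HPA cannot resolve simply has no series; both are assumptions to verify against your own setup. The function name and the sample data are made up, and fetching the series from Prometheus is left out.

```python
def missing_hpa_metrics(expected, observed_series):
    """Return (namespace, hpa, metric_name) tuples that are expected but have no series."""
    seen = {
        (s.get("namespace"), s.get("horizontalpodautoscaler"), s.get("metric_name"))
        for s in observed_series
    }
    return sorted(set(expected) - seen)


# Hypothetical inventory of metrics each HPA should be reporting.
expected = [
    ("default", "web", "cpu"),
    ("default", "web", "my_custom_metric"),
]

# Hypothetical label sets of the series currently present in Prometheus.
observed = [
    {"namespace": "default", "horizontalpodautoscaler": "web", "metric_name": "cpu"},
]

missing = missing_hpa_metrics(expected, observed)
print(missing)
```

Any tuple in `missing` points at an HPA metric that has stopped reporting and could be turned into an alert.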

matthewjstanford, Jan 23 '24 15:01

/triage accepted
/assign

dgrisonnet, Jan 25 '24 17:01

Same for us: we broke an external metric and only found out about it several days later. It would be great to have some way to monitor prometheus-adapter itself.

pznamensky, Jun 11 '24 13:06