consul_exporter

Output field from Consul

Open pvyaka01 opened this issue 4 years ago • 5 comments

Consul exposes a field called Output in its API, which is especially useful for health checks whose script prints a message, for example "All processes are up" when status="passing" or "Process <process_name> is dead" when status="critical".

Here's an example, taken from the output of curl http://localhost:8500/v1/health/state/any:

,"CheckID":"serfHealth","Name":"Serf Health Status","Status":"passing","Notes":"","Output":"Agent alive and reachable","ServiceID":"","ServiceName":"","ServiceTags":[],"Definition"

Can that be exposed through consul_exporter? It is helpful when we send alerts for failing checks. Thanks!

pvyaka01, Jul 15 '19 20:07
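For readers who want to look at the Output field outside of consul_exporter, here is a minimal sketch in Python. It assumes the endpoint quoted above returns a JSON array of check objects; the sample payload is abridged to the fields quoted in the issue, and the helper names `fetch_checks` and `check_outputs` are hypothetical, not part of any Consul client.

```python
import json
from urllib.request import urlopen

# Abridged sample response; only fields quoted in the issue are included.
SAMPLE = (
    '[{"CheckID": "serfHealth", "Name": "Serf Health Status", '
    '"Status": "passing", "Notes": "", '
    '"Output": "Agent alive and reachable", '
    '"ServiceID": "", "ServiceName": "", "ServiceTags": []}]'
)


def fetch_checks(base_url="http://localhost:8500"):
    """Fetch all health checks from a Consul agent (URL as in the issue)."""
    with urlopen(f"{base_url}/v1/health/state/any") as resp:
        return json.load(resp)


def check_outputs(checks):
    """Map each CheckID to its (Status, Output) pair."""
    return {c["CheckID"]: (c["Status"], c.get("Output", "")) for c in checks}


# Parse the sample instead of calling a live agent, so the sketch runs anywhere.
outputs = check_outputs(json.loads(SAMPLE))
print(outputs)
```

Against a running agent, `check_outputs(fetch_checks())` would produce the same mapping for whatever checks are registered there.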

In general we avoid labels with unbounded values because they can increase label cardinality dramatically, and because instrumentation practices recommend that all label values be exposed (series that come and go are difficult to deal with).

simonpasquier, Jul 19 '19 13:07

Ok, understood. Any ideas how I can scrape this field? Thanks for the help!

pvyaka01, Jul 21 '19 00:07

I don't think exposing service stats would be in our usual category of unbounded metrics. This seems on the surface like it would be similar to kube state metrics, or systemd service state metrics.

SuperQ, Jul 21 '19 13:07

IIUC the Output field could vary a lot with check scripts. E.g., if the check runs ping -c1 foo.example.com, the output will be different (almost) every time. Of course you can still write "good" checks that generate predictable outputs, but the exporter can't know for sure.

simonpasquier, Jul 22 '19 12:07
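The cardinality concern can be illustrated with a small sketch. The ping output strings below are invented for illustration, and the label names are hypothetical rather than what consul_exporter emits; the point is only that each distinct label set counts as a separate time series.

```python
# Hypothetical outputs from a check running `ping -c1 foo.example.com`;
# the timing differs on (almost) every run.
ping_outputs = [
    f"1 packets transmitted, 1 received, time={ms} ms"
    for ms in (11.2, 11.9, 12.4, 11.2)
]


def series_count(outputs, include_output_label):
    """Count distinct label sets, i.e. distinct time series."""
    series = set()
    for out in outputs:
        labels = {"check": "ping", "status": "passing"}
        if include_output_label:
            labels["output"] = out
        series.add(frozenset(labels.items()))
    return len(series)


without_label = series_count(ping_outputs, include_output_label=False)
with_label = series_count(ping_outputs, include_output_label=True)
print(without_label, with_label)
```

Without the output label, the four check runs collapse into one series; with it, every new timing value creates another series, and old ones stop receiving samples.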

Still, this is a much-needed feature. While the string description is rarely that useful, it is very different when the check outputs a numeric value that it measures. Having that numeric value exposed to Prometheus would benefit us greatly, letting us monitor trends and predict failures before an error actually happens.

It seems that all that needs to be done to make it sensible is to add some limits on what outputs can be exposed and discard the rest.

I'm pretty sure that if we allowed numeric values (and possibly strings up to 128 characters), that would cover the vast majority of needs without risking pushing junk into Prometheus (by virtue of ignoring non-compliant checks).

tgolebiowski-tbscg, Oct 04 '21 18:10
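The filtering proposed in the last comment could look roughly like the sketch below. It is not taken from consul_exporter; the function name `classify_output` and the exact rules (finite numbers accepted, non-empty strings up to 128 characters accepted, everything else discarded) are assumptions drawn from the comment's wording.

```python
import math

MAX_LEN = 128  # string length limit suggested in the comment


def classify_output(output, max_len=MAX_LEN):
    """Return ("numeric", float), ("string", str), or None to discard."""
    text = output.strip()
    try:
        value = float(text)
    except ValueError:
        value = None
    # Accept only finite numbers as metric values.
    if value is not None and math.isfinite(value):
        return ("numeric", value)
    # Accept short, non-empty strings; anything else is non-compliant.
    if 0 < len(text) <= max_len:
        return ("string", text)
    return None


print(classify_output("42.5\n"))
print(classify_output("All processes are up"))
print(classify_output("x" * 129))
```

A numeric result could be exposed as a metric value rather than a label, which avoids the cardinality problem entirely. A length limit only bounds the size of a string, not the number of distinct strings, so the earlier concern about unbounded label values would still apply to the string case.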