consul_exporter

Consul maintenance mode for service

Open pvyaka01 opened this issue 4 years ago • 4 comments

I marked a couple of services in Consul as being in maintenance mode. However, for those services status="maintenance" shows 0, while status="critical" shows 1. The Consul UI shows the services in maintenance mode. I was expecting status="maintenance" to be 1 for those services. Am I misunderstanding this?

Thanks!

pvyaka01 avatar Dec 20 '19 17:12 pvyaka01

Indeed, it seems the information isn't exposed. The closest metric would probably be consul_catalog_service_node_healthy (only exposed with the --consul.health-summary flag, since it needs an additional call to the Consul API per service). But its value is either 1 (all checks are passing) or 0 (anything other than passing, including maintenance).

simonpasquier avatar Jan 02 '20 14:01 simonpasquier

Hello, I'm using consul_exporter v0.6.0 with Consul v1.4.5.

I tried the --consul.health-summary option, but with maintenance enabled on a passing service, I get this result:

consul_catalog_service_node_healthy{node="DUMMY-NODE",service_id="dummy",service_name="dummy"} 1
consul_health_node_status{check="serfHealth",node="DUMMY-NODE",status="critical"} 0
consul_health_node_status{check="serfHealth",node="DUMMY-NODE",status="maintenance"} 0
consul_health_node_status{check="serfHealth",node="DUMMY-NODE",status="passing"} 1
consul_health_node_status{check="serfHealth",node="DUMMY-NODE",status="warning"} 0
consul_health_service_status{check="chk_dummy",node="DUMMY-NODE",service_id="dummy",service_name="dummy",status="critical"} 0
consul_health_service_status{check="chk_dummy",node="DUMMY-NODE",service_id="dummy",service_name="dummy",status="maintenance"} 0
consul_health_service_status{check="chk_dummy",node="DUMMY-NODE",service_id="dummy",service_name="dummy",status="passing"} 1
consul_health_service_status{check="chk_dummy",node="DUMMY-NODE",service_id="dummy",service_name="dummy",status="warning"} 0

And for a critical service:

consul_health_service_status{check="_service_maintenance:dummy",node="DUMMY-NODE",service_id="dummy",service_name="dummy",status="critical"} 1
consul_health_service_status{check="_service_maintenance:dummy",node="DUMMY-NODE",service_id="dummy",service_name="dummy",status="maintenance"} 0
consul_health_service_status{check="_service_maintenance:dummy",node="DUMMY-NODE",service_id="dummy",service_name="dummy",status="passing"} 0
consul_health_service_status{check="_service_maintenance:dummy",node="DUMMY-NODE",service_id="dummy",service_name="dummy",status="warning"} 0
consul_health_service_status{check="chk_dummy",node="DUMMY-NODE",service_id="dummy",service_name="dummy",status="critical"} 1
consul_health_service_status{check="chk_dummy",node="DUMMY-NODE",service_id="dummy",service_name="dummy",status="maintenance"} 0
consul_health_service_status{check="chk_dummy",node="DUMMY-NODE",service_id="dummy",service_name="dummy",status="passing"} 0
consul_health_service_status{check="chk_dummy",node="DUMMY-NODE",service_id="dummy",service_name="dummy",status="warning"} 0

My goal is to detect critical service nodes, but not while they are in maintenance.

Is this the expected result? Thank you.

gmaurice avatar Jan 09 '20 10:01 gmaurice

I just checked whether this depends on the version, and it doesn't. I tested with Consul v1.6.2 and see the same behavior.

gmaurice avatar Jan 15 '20 17:01 gmaurice

Right now there's no way for the exporter to report a service or node as being in maintenance, because a health check from the API can only be passing, warning or critical. maintenance is a value that comes directly from the API client: it is deduced from all the checks associated with a service or node, but the exporter doesn't use it (yet).

https://github.com/hashicorp/consul/blob/ed6102326d1c8c8477efb8fb7490b0bf4572f0da/api/health.go#L169-L202
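To illustrate the idea, here is a minimal, self-contained sketch of that kind of aggregation. It is not the exporter's code and not a copy of the linked function: the Check type and the aggregatedStatus helper are hypothetical names made up for this example. The "_service_maintenance:" prefix matches the check label in the output above; the "_node_maintenance" ID is assumed from the Consul API client. The point is that maintenance is inferred from the check ID, while the status field of the maintenance check itself is just critical.

```go
package main

import (
	"fmt"
	"strings"
)

// Check is a simplified, hypothetical stand-in for a Consul health check.
// Only the two fields needed for the aggregation are kept.
type Check struct {
	CheckID string
	Status  string // "passing", "warning" or "critical"
}

const (
	// Assumed ID of the node-level maintenance check.
	nodeMaint = "_node_maintenance"
	// Prefix seen in the check label of the metrics above.
	serviceMaintPrefix = "_service_maintenance:"
)

// aggregatedStatus collapses all checks of a service or node into one
// status. Precedence: maintenance > critical > warning > passing.
func aggregatedStatus(checks []Check) string {
	var warning, critical, maintenance bool
	for _, c := range checks {
		// A maintenance check is recognized by its ID, not by its status.
		if c.CheckID == nodeMaint || strings.HasPrefix(c.CheckID, serviceMaintPrefix) {
			maintenance = true
			continue
		}
		switch c.Status {
		case "warning":
			warning = true
		case "critical":
			critical = true
		}
	}
	switch {
	case maintenance:
		return "maintenance"
	case critical:
		return "critical"
	case warning:
		return "warning"
	default:
		return "passing"
	}
}

func main() {
	// Same checks as in the second metrics dump above.
	checks := []Check{
		{CheckID: "_service_maintenance:dummy", Status: "critical"},
		{CheckID: "chk_dummy", Status: "critical"},
	}
	fmt.Println(aggregatedStatus(checks)) // prints "maintenance"
}
```

With an aggregation like this, a service whose only critical entries come with a maintenance check would be reported as maintenance rather than critical, which is the distinction asked for above.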

simonpasquier avatar Jan 23 '20 16:01 simonpasquier