Feature request: Improve observability in api-server metrics implementation
This is a follow-up to #498. Currently, the api-server /metrics implementation does not make it possible to properly monitor whether targets are up or down.
api-server:
  address: :7890
  enable-metrics: true
Once a target becomes unreachable over gNMI, the existing metric gnmic_subscribe_number_of_received_subscribe_response_messages_total disappears for that target.
There should be a deterministic way in /metrics to tell whether a target is up or down.
This should be added in v0.39.0; please give it a try.
I'm not the initial requester, but we've been waiting for it as well :) Tested, and it works well for us. Thank you for implementing it! Quick dashboard.
oh, wow, Wikimedia uses gnmic then, @XioNoX ?
Yep, we're only at the beginning, but it has been working very well for us so far. Thanks for the tool! You can find more documentation at https://wikitech.wikimedia.org/wiki/Network_telemetry
Nice, thank you!
This is a long-awaited and very useful addition!
We recently upgraded to 0.40.0 and may have run into a bit of misbehaviour. It correctly returns the metric gnmic_target_up{name="OBFUSCATED"} 0 when targets are down, but the metric appears to persist even after a device has been removed from the list of targets.
Below is an example of the output when querying the /api/v1/targets and /metrics endpoints:
$ export TARGET='OBFUSCATED'
$ curl -s http://localhost:7890/api/v1/targets | grep ${TARGET} | wc -l
0
$ curl -s http://localhost:7890/metrics | grep "gnmic_target_up{name=\"${TARGET}\"} 0" | wc -l
1
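For anyone who wants to check for this on their own setup, here is a rough Python sketch of the same comparison. It is not part of gnmic: the helper names parse_target_up and stale_targets are made up for illustration, the regular expression assumes that name is the only label on gnmic_target_up (as in the output quoted above), and the list of configured target names is passed in by hand rather than read from /api/v1/targets, since the shape of that response is not shown here.

```python
import re

# Matches one sample line such as: gnmic_target_up{name="router1"} 0
# Assumption: `name` is the only label; an optional trailing timestamp is tolerated.
_SAMPLE = re.compile(r'^gnmic_target_up\{name="([^"]*)"\}\s+(\S+)(?:\s+-?\d+)?\s*$')


def parse_target_up(metrics_text):
    """Return {target name: value} for every gnmic_target_up sample in the text."""
    result = {}
    for line in metrics_text.splitlines():
        match = _SAMPLE.match(line.strip())
        if match:
            result[match.group(1)] = float(match.group(2))
    return result


def stale_targets(metrics_text, configured_names):
    """Return sorted names that still have a series but are no longer configured."""
    return sorted(set(parse_target_up(metrics_text)) - set(configured_names))


# Hypothetical scrape: "leaf1" is still configured, "OBFUSCATED" was removed.
sample = '''# TYPE gnmic_target_up gauge
gnmic_target_up{name="leaf1"} 1
gnmic_target_up{name="OBFUSCATED"} 0
'''
print(stale_targets(sample, ["leaf1"]))  # ['OBFUSCATED']
```

In practice metrics_text would be the body returned by the /metrics endpoint, and configured_names would come from whatever source of truth holds the current target list.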