Feature request: Improve observability in api-server metrics implementation

Open aned opened this issue 1 year ago • 6 comments

This is a follow-up to #498. Currently, the api-server /metrics implementation doesn't allow properly monitoring whether targets are up or down.

api-server:
  address: :7890
  enable-metrics: true

Once a target becomes unreachable over gNMI, the current metric gnmic_subscribe_number_of_received_subscribe_response_messages_total disappears.

There should be a deterministic way in /metrics to tell whether a target is up or down.

aned avatar Aug 21 '24 00:08 aned

Should be added in v0.39.0, please give it a try.

karimra avatar Nov 07 '24 19:11 karimra

I'm not the original requester, but we've been waiting for it as well :) Tested and works well for us. Thank you for implementing it! Quick dashboard.

XioNoX avatar Nov 08 '24 09:11 XioNoX

Oh, wow, so Wikimedia uses gnmic, @XioNoX?

hellt avatar Nov 08 '24 09:11 hellt

Yep, we're only at the beginning, but it has been working very well for us so far. Thanks for the tool! You can find more documentation at https://wikitech.wikimedia.org/wiki/Network_telemetry

XioNoX avatar Nov 08 '24 09:11 XioNoX

Nice, thank you!

aned avatar Nov 08 '24 22:11 aned

This is a long-awaited and very useful addition!

We recently upgraded to 0.40.0 and may have run into a misbehaviour. The metric gnmic_target_up{name="OBFUSCATED"} 0 is correctly returned when a target is down, but the metric appears to persist even after the device has been removed from the list of targets.

Below is an example of the output when querying the /api/v1/targets and /metrics endpoints:

$ export TARGET='OBFUSCATED'
$ curl -s http://localhost:7890/api/v1/targets | grep ${TARGET} | wc -l
0
$ curl -s http://localhost:7890/metrics | grep "gnmic_target_up{name=\"${TARGET}\"} 0" | wc -l
1
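
For checking more than one target at a time, the same comparison could be scripted. The following is only a rough sketch, not anything shipped with gnmic: the helper names `parse_target_up` and `stale_targets` are made up for illustration, and it assumes each sample line has exactly the form shown above (a single `name` label, no timestamp). The list of configured target names is passed in by the caller, since the exact shape of the /api/v1/targets response is not shown here.

```python
import re

# Matches sample lines such as: gnmic_target_up{name="router1"} 0
# Assumption: a single "name" label and no trailing timestamp.
_TARGET_UP = re.compile(r'^gnmic_target_up\{name="([^"]*)"\}\s+(\S+)\s*$')


def parse_target_up(metrics_text):
    """Return {target_name: value} for every gnmic_target_up sample found."""
    samples = {}
    for line in metrics_text.splitlines():
        match = _TARGET_UP.match(line.strip())
        if match:
            samples[match.group(1)] = float(match.group(2))
    return samples


def stale_targets(metrics_text, configured):
    """Return names that still have a gnmic_target_up series
    but are absent from the configured target names."""
    return sorted(set(parse_target_up(metrics_text)) - set(configured))
```

Feeding it the body of the /metrics response together with the names currently configured would list every target whose series is still exported after removal.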

ivayloskostadinov avatar Feb 10 '25 15:02 ivayloskostadinov