
Locking / Caching

Open · idl0r opened this issue on Jul 20 '20 · 3 comments

Hi,

a curl against /metrics, for example, triggers a scrape every time. A scrape should be executed only once at a time, and the old data should be cached and served until new data is available.

idl0r · Jul 20 '20 11:07

This would contradict https://prometheus.io/docs/instrumenting/writing_exporters/#scheduling

"Metrics should only be pulled from the application when Prometheus scrapes them, exporters should not perform scrapes based on their own timers. That is, all scrapes should be synchronous."

But it would be useful to do async scrapes. If we run n replicas of Prometheus, we get n scrapes that can arrive at the same time, resulting in high load on the equipment and possibly failed scrapes due to scrape_timeout in Prometheus.
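
A middle ground that stays within the letter of the guideline is to keep scrapes synchronous but collapse concurrent ones. A minimal sketch using golang.org/x/sync/singleflight (scrapeDevice is a hypothetical stand-in for the exporter's real collection logic, not its actual API): n simultaneous /metrics requests trigger a single device scrape whose result they all share.

```go
package main

import (
	"net/http"

	"golang.org/x/sync/singleflight"
)

var group singleflight.Group

// scrapeDevice is a stand-in for the real collection logic:
// talk to the device, render the metrics payload.
func scrapeDevice() ([]byte, error) {
	return []byte("# metrics\n"), nil
}

func metricsHandler(w http.ResponseWriter, r *http.Request) {
	// Requests arriving while a scrape is in flight share its result
	// instead of each triggering their own scrape of the device.
	v, err, _ := group.Do("scrape", func() (interface{}, error) {
		return scrapeDevice()
	})
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	w.Write(v.([]byte))
}

func main() {
	http.HandleFunc("/metrics", metricsHandler)
	http.ListenAndServe(":9100", nil)
}
```

With this, each Prometheus replica still gets a fresh, synchronous response, but the device only sees one scrape per burst.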

AKYD · Nov 09 '20 14:11

I get the sync-only pulls, but sometimes that is not possible, as the scrape takes too long even for a single pull. Network devices are often the case: high-rate pulls are not something engineers had in mind when designing them.

We are using node_exporter on some of the whitebox switches and it works great, but node_exporter is heavily optimized to use the /proc and /sys filesystems for quick, low-overhead access to metrics, which can't be said for exporters like this one. And as you said, sometimes we run multiple Prometheus instances (for HA, or prod/testing instances), and that can kill the device if we are not careful.

I don't know; IMHO it's better to have cached metrics and a stable device than an unstable device and missing metrics on some pulls due to overload. If the pull interval is configurable, one can set it to match the scrape interval of the most "detailed" Prometheus server and still have relatively good data. Not ideal, but better than the alternative. Something like the sketch below.
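
A rough sketch of what that could look like inside the exporter (cacheTTL and scrapeDevice are assumed names for illustration, not the exporter's actual config or API): serve the cached payload until it is older than the configured interval, then refresh once.

```go
package main

import (
	"net/http"
	"sync"
	"time"
)

var (
	mu       sync.Mutex
	cached   []byte
	fetched  time.Time
	cacheTTL = 30 * time.Second // set to match the shortest Prometheus scrape interval
)

// scrapeDevice stands in for the exporter's real collection logic.
func scrapeDevice() ([]byte, error) {
	return []byte("# metrics\n"), nil
}

func metricsHandler(w http.ResponseWriter, r *http.Request) {
	mu.Lock()
	defer mu.Unlock()
	// Refresh only once the cached payload is older than the TTL;
	// holding the lock during the scrape also serializes overlapping
	// requests, so the device sees at most one scrape per interval.
	if time.Since(fetched) > cacheTTL {
		data, err := scrapeDevice()
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		cached, fetched = data, time.Now()
	}
	w.Write(cached)
}

func main() {
	http.HandleFunc("/metrics", metricsHandler)
	http.ListenAndServe(":9100", nil)
}
```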

In the future, vendors might support Prom/OpenMetrics native format and this will be a non-issue.

matejzero · Apr 25 '21 15:04

I totally agree with what you said.

I have the same issue with snmp_exporter (multiple scrapes coming at the same time) and since it's a common scenario, I implemented a "cacher" in front of the exporter that essentially returns the same set of metrics to any number of Prometheus servers without doing any new scrape on the device.

Unfortunately my Python-fu is worse than my Go-fu, so after a few days I got a lot of half-open connections piling up. I plan on revisiting the idea at some point.
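
In Go that cacher tends to be quite small. A sketch under assumed values (the backend address, port, and TTL are made-up placeholders): it proxies /metrics, reuses the last response for any number of Prometheus servers, and sets a client timeout and closes response bodies, the kind of detail that usually causes the half-open-connection pileup.

```go
package main

import (
	"io"
	"net/http"
	"sync"
	"time"
)

const backend = "http://127.0.0.1:9100/metrics" // the real exporter (placeholder)

// The client timeout guards against scrapes that hang and would
// otherwise leave half-open connections behind.
var client = &http.Client{Timeout: 50 * time.Second}

var (
	mu      sync.Mutex
	body    []byte
	fetched time.Time
	ttl     = 30 * time.Second // placeholder refresh interval
)

func handler(w http.ResponseWriter, r *http.Request) {
	mu.Lock()
	defer mu.Unlock()
	if time.Since(fetched) > ttl {
		resp, err := client.Get(backend)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadGateway)
			return
		}
		data, err := io.ReadAll(resp.Body)
		resp.Body.Close() // always close, even when the read fails
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadGateway)
			return
		}
		body, fetched = data, time.Now()
	}
	w.Write(body)
}

func main() {
	http.HandleFunc("/metrics", handler)
	http.ListenAndServe(":9101", nil)
}
```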

AKYD · Jul 15 '21 12:07