cfrpki

rrdp_errors metric is typed as gauge, appears to be a counter

Open ties opened this issue 4 years ago • 0 comments

After a single RRDP error the rrdp_errors metric increased to 1. However, it did not return to 0 on later, successful runs (as I would expect a gauge to do). This caused my alert to keep firing.

I see

# HELP rrdp_errors RRDP error count.
# TYPE rrdp_errors gauge
...
rrdp_errors{address="https://rrdp.ripe.net/notification.xml"} 1

While the log ends with

2021-07-16T09:40:37.483239000Z time="2021-07-16T09:40:37Z" level=info msg="RRDP sync https://rrdp.ripe.net/notification.xml"
2021-07-16T09:40:37.483515000Z time="2021-07-16T09:40:37Z" level=info msg="RRDP: Downloading root notification https://rrdp.ripe.net/notification.xml"
2021-07-16T09:40:37.511614000Z time="2021-07-16T09:40:37Z" level=info msg="RRDP: https://rrdp.ripe.net/notification.xml has 0 deltas to parse (cur: 4190, last: 4190)"

Where I do not see an RRDP error.

To be fair, this is a nit. But if this is really a counter I would prefer that type - I would have written an increase(...) alert instead.
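
To illustrate why the type matters for alerting, here is a minimal sketch in Go. It is not cfrpki code and not Prometheus's actual implementation: `increase` is a simplified, hypothetical stand-in for PromQL's increase() (no extrapolation), and the sample values are made up to mirror the situation above (one failure, then successful runs).

```go
package main

import "fmt"

// increase is a simplified, hypothetical stand-in for PromQL's increase():
// it sums the growth between consecutive samples in a window and treats a
// drop in value as a counter reset. Real Prometheus also extrapolates.
func increase(samples []float64) float64 {
	total := 0.0
	for i := 1; i < len(samples); i++ {
		delta := samples[i] - samples[i-1]
		if delta < 0 {
			// Counter reset: count the new value as growth since the reset.
			delta = samples[i]
		}
		total += delta
	}
	return total
}

// thresholdAlert models an alert of the form "rrdp_errors > 0", which only
// makes sense if the metric drops back to 0 after a successful run.
func thresholdAlert(samples []float64) bool {
	return len(samples) > 0 && samples[len(samples)-1] > 0
}

// increaseAlert models an alert of the form "increase(rrdp_errors[...]) > 0",
// which fires only while new errors appear inside the window.
func increaseAlert(samples []float64) bool {
	return increase(samples) > 0
}

func main() {
	// Assumed scrape values: one RRDP error, then only successful syncs.
	series := []float64{0, 1, 1, 1, 1, 1}
	const window = 3
	for end := window; end <= len(series); end++ {
		w := series[end-window : end]
		fmt.Printf("window %v: threshold=%v increase=%v\n",
			w, thresholdAlert(w), increaseAlert(w))
	}
}
```

With a value that never goes back down, the threshold-style alert stays on in every window, while the increase-style alert is on only for the window that contains the error.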

ties, Jul 16 '21 09:07