cfrpki
rrdp_errors metric is typed as gauge, appears to be a counter
After a single RRDP error, the rrdp_errors metric increased to 1. However, it did not return to 0 on later, successful runs (as I would expect a gauge to do). This caused my alert to keep firing.
I see:
# HELP rrdp_errors RRDP error count.
# TYPE rrdp_errors gauge
...
rrdp_errors{address="https://rrdp.ripe.net/notification.xml"} 1
Meanwhile, the log ends with:
2021-07-16T09:40:37.483239000Z time="2021-07-16T09:40:37Z" level=info msg="RRDP sync https://rrdp.ripe.net/notification.xml"
2021-07-16T09:40:37.483515000Z time="2021-07-16T09:40:37Z" level=info msg="RRDP: Downloading root notification https://rrdp.ripe.net/notification.xml"
2021-07-16T09:40:37.511614000Z time="2021-07-16T09:40:37Z" level=info msg="RRDP: https://rrdp.ripe.net/notification.xml has 0 deltas to parse (cur: 4190, last: 4190)"
I do not see an RRDP error in these lines.
To be fair, this is a nit. But if this is really a counter, I would prefer it to be typed as one - I would then have written an increase(...) alert instead.
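To illustrate the difference, here is a minimal Python sketch. It is not cfrpki's code: `ErrorMetric`, `record_run`, and `window_increase` are hypothetical names, and `window_increase` is only a rough stand-in for what PromQL's increase() computes (it ignores counter resets and extrapolation). It assumes one failed sync followed by three successful ones, as in the situation described above.

```python
from dataclasses import dataclass, field


@dataclass
class ErrorMetric:
    """Hypothetical stand-in for rrdp_errors (not cfrpki's implementation)."""
    reset_on_success: bool
    value: int = 0
    samples: list = field(default_factory=list)

    def record_run(self, failed: bool) -> None:
        # Counter-like behaviour: only ever goes up.
        # Gauge-like behaviour (as expected in this issue): drops to 0 on success.
        if failed:
            self.value += 1
        elif self.reset_on_success:
            self.value = 0
        # Each run is treated as one scrape of the metric.
        self.samples.append(self.value)


def window_increase(samples, window):
    """Simplified increase(): last minus first sample in the window."""
    recent = samples[-window:]
    return recent[-1] - recent[0]


# Assumed scenario: one failed sync, then three successful ones.
runs = [True, False, False, False]

observed = ErrorMetric(reset_on_success=False)  # behaviour reported here
expected = ErrorMetric(reset_on_success=True)   # behaviour expected of a gauge

for failed in runs:
    observed.record_run(failed)
    expected.record_run(failed)

print(observed.samples)  # [1, 1, 1, 1]
print(expected.samples)  # [1, 0, 0, 0]
print(window_increase(observed.samples, window=3))  # 0
```

With the observed behaviour, a threshold alert on the raw value stays above 0 after the single failure, while an alert on the increase over the last three samples clears once the syncs succeed again.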