Prometheus metrics for error counts

Open · mac-chaffee opened this issue 4 years ago · 0 comments

Describe the solution you'd like

The current metrics in https://github.com/NetApp/trident/blob/master/core/metrics.go do not appear to include any way to track error counts.

Counting errors is fairly common in other projects, such as external-dns and kube-apiserver, and those counts can then be used for alerting.

For example, I once failed to prepare Trident properly for a volume move, so Trident did not recognize the new aggregate and every new volume was left stuck in "Pending". I would like to set up alerting for that kind of situation.
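
The following Go sketch shows how an error counter could catch the "stuck in Pending" case. It computes how much the counter grew over a recent window and fires if it grew at all. This is roughly what a PromQL rule such as `increase(trident_operation_errors_total[15m]) > 0` does, although that metric name is hypothetical. Like PromQL's `increase()`, the sketch treats a drop in value as a counter reset, for example after a restart. Unlike `increase()`, it does not extrapolate to the window boundaries. The `sample` type, `shouldAlert`, the window and the threshold are all illustrative assumptions.

```go
package main

import "fmt"

// sample is one scraped value of an error counter at a Unix timestamp (seconds).
type sample struct {
	ts    int64
	value float64
}

// increase sums the counter's growth across consecutive samples. A drop in
// value is treated as a reset, so the new value counts as fresh growth.
func increase(samples []sample) float64 {
	var total float64
	for i := 1; i < len(samples); i++ {
		delta := samples[i].value - samples[i-1].value
		if delta < 0 {
			delta = samples[i].value // counter reset
		}
		total += delta
	}
	return total
}

// shouldAlert reports whether errors grew by more than threshold in the
// window (now-window, now]. Both parameters are hypothetical tuning knobs.
func shouldAlert(samples []sample, now, window int64, threshold float64) bool {
	var inWindow []sample
	for _, s := range samples {
		if s.ts > now-window && s.ts <= now {
			inWindow = append(inWindow, s)
		}
	}
	return increase(inWindow) > threshold
}

func main() {
	// Errors keep occurring, with a reset between t=120 and t=180.
	samples := []sample{{0, 3}, {60, 3}, {120, 5}, {180, 1}, {240, 4}}
	fmt.Println(increase(samples))
	fmt.Println(shouldAlert(samples, 240, 300, 0))
}
```

For these samples the growth is 0 + 2 + 1 (after the reset) + 3 = 6, so the alert fires.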

Describe alternatives you've considered

The error rate could be approximated from logs, but logs are not a stable interface.

Additional context

N/A

mac-chaffee · Nov 12 '21 15:11