Prometheus metrics for error counts
Describe the solution you'd like
It looks like the current metrics are missing a way to track error counts: https://github.com/NetApp/trident/blob/master/core/metrics.go
Counting errors seems relatively common, for example in external-dns and kube-apiserver, and the resulting counters can then be used for alerting.
For example, I had a situation where I didn't properly prepare Trident for a volume move, which meant Trident was failing to recognize the new aggregate, leaving all new volumes stuck in "Pending". I would like to set up alerting for that kind of situation.
Describe alternatives you've considered
I could approximate the error rate from logs, but logs are not a stable interface.
Additional context
N/A