
PreBackupPod Metrics for Monitoring

Open tobru opened this issue 3 years ago • 0 comments

Summary

As a user of K8up, I want to be able to alert on failed PreBackupPods, So that I can act on them and fix the issue.

Context

We need a Prometheus counter that increases whenever a PreBackupPod fails, similar to the counters already in place for successful and failed backups.

Currently, PreBackupPods can fail silently, and such failures can only be found via log messages generated by the operator.

The counter should be increased in the following cases:

  • the PreBackupPod can't be created (permission denied, etc.)
  • the PreBackupPod failed (e.g. exit code != 0)
  • the PreBackupPod command failed (e.g. mysql broken pipe, etc.)

With that counter we can create alerting rules that trigger if it has increased within a certain time window (e.g. 24h).
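To make the idea more concrete, here is a minimal, stdlib-only sketch of such a counter, partitioned by failure reason and rendered in the Prometheus text exposition format. The metric name `k8up_prebackuppod_failures_total`, the `reason` label, and its three values are hypothetical and not taken from the K8up code base. A real implementation would most likely use a counter vector from the Prometheus Go client library instead of this hand-rolled type.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// Hypothetical metric name, chosen for illustration only.
const metricName = "k8up_prebackuppod_failures_total"

// Hypothetical reason labels, one per failure case listed in the issue.
const (
	reasonCreateFailed  = "create_failed"  // pod could not be created (permission denied, etc.)
	reasonPodFailed     = "pod_failed"     // pod exited with a non-zero exit code
	reasonCommandFailed = "command_failed" // command inside the pod failed (e.g. broken pipe)
)

// failureCounter is a monotonically increasing counter keyed by failure reason.
type failureCounter struct {
	mu     sync.Mutex
	counts map[string]uint64
}

func newFailureCounter() *failureCounter {
	return &failureCounter{counts: make(map[string]uint64)}
}

// Inc increases the counter for the given reason by one.
func (c *failureCounter) Inc(reason string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[reason]++
}

// Value returns the current count for the given reason.
func (c *failureCounter) Value(reason string) uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[reason]
}

// Render returns the counter in the Prometheus text exposition format,
// with one sample line per reason, sorted by reason for stable output.
func (c *failureCounter) Render() string {
	c.mu.Lock()
	defer c.mu.Unlock()

	reasons := make([]string, 0, len(c.counts))
	for r := range c.counts {
		reasons = append(reasons, r)
	}
	sort.Strings(reasons)

	var b strings.Builder
	fmt.Fprintf(&b, "# HELP %s Number of failed PreBackupPods.\n", metricName)
	fmt.Fprintf(&b, "# TYPE %s counter\n", metricName)
	for _, r := range reasons {
		fmt.Fprintf(&b, "%s{reason=%q} %d\n", metricName, r, c.counts[r])
	}
	return b.String()
}

func main() {
	c := newFailureCounter()
	c.Inc(reasonCreateFailed)
	c.Inc(reasonPodFailed)
	c.Inc(reasonPodFailed)
	fmt.Print(c.Render())
}
```

Splitting the counter by reason is a design choice, not a requirement of this issue: a single unlabeled counter would also satisfy the acceptance criteria, but a label lets an alert distinguish permission problems from failing backup commands.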

Out of Scope

Further links

Acceptance criteria

Given a PreBackupPod, When something fails while handling PreBackupPods, Then a counter is increased which is exported as a metric.

Implementation Ideas

tobru, Jan 22 '21 08:01