prometheus-pve-exporter

metrics suggestion: backup jobs, replication jobs

Open steveej opened this issue 3 years ago • 7 comments

hey @znerol, thank you for creating this helpful exporter :raised_hands:

i'd like to track and set up alerts for failed or absent backups, replications, and on high IO delay (the one that's displayed in the webui for each node).

cheers :wave:

steveej avatar May 03 '22 12:05 steveej

This exporter is using the PVE REST API. Looking through the API docs I have found the following interesting routes possibly covering your requirements (at least partly):

- absent backups: /cluster/backup-info/not-backed-up lists all guests (qemu and lxc) which are not covered by any backup plan.
- failed backups: Maybe this is extractable from /cluster/backup.
- failed replications: Maybe this is extractable from /cluster/replication.
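To illustrate the first route, here is a minimal sketch of how its response could be turned into Prometheus text-format lines. It is not the exporter's actual implementation. The metric name `pve_guest_not_backed_up`, the `id` label format, and the assumed response shape (a `data` array of objects carrying `vmid` and `type`) are assumptions made for this example; the payload is a hand-written sample, not real API output.

```python
import json


def not_backed_up_metrics(payload: dict) -> list[str]:
    """Render guests without a backup job as Prometheus text-format lines.

    `payload` is assumed to look like the parsed JSON body of
    /cluster/backup-info/not-backed-up: {"data": [{"vmid": ..., "type": ...}]}.
    """
    guests = payload.get("data", [])
    # Hypothetical metric name, chosen only for this sketch.
    lines = ["# TYPE pve_guest_not_backed_up gauge"]
    # Sort by numeric vmid so the output is stable between scrapes.
    for guest in sorted(guests, key=lambda g: int(g["vmid"])):
        guest_type = guest.get("type", "unknown")
        guest_id = f'{guest_type}/{guest["vmid"]}'
        lines.append(f'pve_guest_not_backed_up{{id="{guest_id}"}} 1')
    return lines


# Hand-written sample payload (assumed shape, not captured from a real cluster).
sample = json.loads(
    '{"data": ['
    '{"vmid": 101, "name": "db", "type": "lxc"},'
    '{"vmid": 100, "name": "web", "type": "qemu"}'
    "]}"
)
print("\n".join(not_backed_up_metrics(sample)))
```

An alert on absent backups could then fire whenever any such series exists. Fetching the payload (authentication, TLS, API client) is deliberately left out of the sketch.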

Regarding high IO delay, I recommend taking a look at node_exporter. For node-level metrics, this is usually the better option.

znerol avatar May 03 '22 19:05 znerol

thanks @znerol

> /cluster/backup-info/not-backed-up lists all guests (qemu and lxc) which are not covered by any backup plan.

while i originally meant backup jobs that for some reason didn't execute, i also like the idea of alerting when a VM doesn't have a backup job at all.

for the rest i'll also have a look at the API to see which items would be useful to add.

> Regarding high IO delay, I recommend taking a look at node_exporter. For node-level metrics, this is usually the better option.

indeed, thanks! i thought PVE was doing something special but according to the frontend code it evaluates the system's wait load, which can be gathered otherwise.
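As a rough illustration of what "wait load" means here, the sketch below computes the iowait share of CPU time between two samples of the aggregate `cpu` line from `/proc/stat`. It assumes the standard Linux field order (user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice). It is not taken from the PVE frontend or backend code, and the two sample lines are made up.

```python
def parse_cpu_line(line: str) -> list[int]:
    """Parse the aggregate 'cpu' line of /proc/stat into integer counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    return [int(value) for value in fields[1:]]


def iowait_ratio(before: str, after: str) -> float:
    """Return the fraction of CPU time spent in iowait between two samples."""
    old, new = parse_cpu_line(before), parse_cpu_line(after)
    deltas = [n - o for o, n in zip(old, new)]
    # Sum only the first eight counters: guest and guest_nice are already
    # included in user and nice, so adding them would double count.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    # Index 4 is the iowait counter.
    return deltas[4] / total


# Made-up samples taken some seconds apart.
first = "cpu 100 0 50 800 50 0 0 0 0 0"
second = "cpu 150 0 70 1150 130 0 0 0 0 0"
print(f"iowait: {iowait_ratio(first, second):.2%}")
```

node_exporter exposes the same underlying counters as `node_cpu_seconds_total` with a `mode="iowait"` label, so a comparable ratio can be derived in Prometheus without any PVE-specific metric.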

steveej avatar May 04 '22 11:05 steveej

Hello everyone, is there any progress? I'm facing a similar problem: I need to know which machines were left without a backup, or whose backup ended with an error.

xziy avatar Oct 01 '23 17:10 xziy

IO wait would be a very useful metric to have, IMO, if possible -- especially for those using ZFS for backing storage.

StarkZarn avatar Feb 19 '24 19:02 StarkZarn

> IO wait would be a very useful metric to have, IMO, if possible -- especially for those using ZFS for backing storage.

Please use node_exporter for the iowait metric. Take a look at this blog post for a start.

znerol avatar Feb 20 '24 07:02 znerol

> IO wait would be a very useful metric to have, IMO, if possible -- especially for those using ZFS for backing storage.

> Please use node_exporter for the iowait metric. Take a look at this blog post for a start.

Thank you!

StarkZarn avatar Feb 21 '24 01:02 StarkZarn

Thanks to @svengerber and @themoriarti, replication metrics are available as of release v3.3.0.

znerol avatar Apr 27 '24 10:04 znerol