resticprofile
Prometheus metric resticprofile_backup_status is 2 even when backups fail
To test alerting on resticprofile_backup_status, I changed my AWS access key to an invalid one and triggered a backup. Although the job errored out, I see a fresh resticprofile_backup_status metric with a status of 2.
Luckily, the Last Backup timestamp isn't changed, so I can probably alert on that. However, I expected the status to be 0.
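For anyone wanting the same workaround, here is a minimal sketch of a staleness check against the file written by prometheus-save-to-file. The metric name resticprofile_backup_time_seconds, the 26-hour threshold, and the helper names are assumptions for illustration; check the name in your own .prom file. The parser is deliberately simple and does not handle label values that contain a closing brace.

```python
import re
import time


def last_backup_age(prom_text, metric="resticprofile_backup_time_seconds", now=None):
    """Return seconds elapsed since the Unix timestamp stored in `metric`.

    Returns None when the metric is not present in the text.
    The default metric name is an assumption, not confirmed in this thread.
    """
    now = time.time() if now is None else now
    # Match "name value" or "name{labels} value" in the Prometheus text format.
    pattern = re.compile(r"^" + re.escape(metric) + r"(?:\{[^}]*\})?\s+(\S+)\s*$")
    for line in prom_text.splitlines():
        match = pattern.match(line)
        if match:
            return now - float(match.group(1))
    return None


def is_stale(prom_text, max_age_seconds=26 * 3600, now=None):
    """Treat a missing metric or an old timestamp as a stale backup."""
    age = last_backup_age(prom_text, now=now)
    return age is None or age > max_age_seconds
```

With a daily schedule, a threshold slightly above 24 hours leaves room for a long-running backup before the check fires.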
You're right, I wouldn't expect the status to be 2 🤔
Can you please post your profile configuration (with any repository information redacted) so I can get a better idea of what is happening?
Sure, here it is. I have several other backup sets but they all have the same config.
version: "1"

global:
  scheduler: crond
  priority: low

base:
  initialize: true
  password-file: key
  prometheus-push: "http://metrics-docker.lan:9091/"
  prometheus-save-to-file: "{{ .Profile.Name }}.prom"
  prometheus-labels:
    - host: {{ .Hostname }}
  backup:
    exclude-caches: true
    one-file-system: true
    check-before: true
    extended-status: true
  retention:
    after-backup: true
    keep-daily: 30
    keep-weekly: 4
    keep-monthly: 13
    prune: true

photos:
  inherit: base
  lock: /tmp/photos.lock
  force-inactive-lock: true
  rustic-stale-lock-age: 5m
  repository: REDACTED-S3-ENDPOINT-ON-B2
  env:
    AWS_ACCESS_KEY_ID: REDACTED_ACCESS_KEY
    AWS_SECRET_ACCESS_KEY: REDACTED_SECRET_KEY
  backup:
    source:
      - '/source/photos'
    schedule: "04:00"
    schedule-permission: system
Right, I see what's happening:
- the check command fails immediately since the repository is not available
- resticprofile stops after the check failed, without trying to run a backup
But only the backup command generates Prometheus metrics. So at that point it's keeping the existing metrics, including the status of 2 from the last successful backup, and not generating new ones.
I think to fix this issue we would need to generate a status line for each part (check, forget, etc.)
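As a rough illustration of that idea (not resticprofile's actual code, which is written in Go), the sketch below renders one status line per command in the Prometheus text format. The metric name resticprofile_command_status, the label names, and the function name are all hypothetical; the values follow this thread, with 0 for failure and 2 for success.

```python
def format_status_lines(profile, results):
    """Render one gauge line per command in the Prometheus text format.

    `results` maps a command name (e.g. "check", "backup", "forget")
    to a status code: 0 for failure, 2 for success.
    The metric and label names are hypothetical.
    """
    lines = [
        "# HELP resticprofile_command_status Hypothetical per-command status (0=fail, 2=success).",
        "# TYPE resticprofile_command_status gauge",
    ]
    for command, status in results.items():
        lines.append(
            f'resticprofile_command_status{{profile="{profile}",command="{command}"}} {status}'
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The scenario from this issue: check fails, so backup never runs.
    print(format_status_lines("photos", {"check": 0}), end="")
```

With a line like this for check, an alert could fire on the failed check even though no new backup metrics were produced.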