Prometheus metric resticprofile_backup_status is 2 even when backups fail

Open deviantintegral opened this issue 1 year ago • 7 comments

To test alerting on the resticprofile_backup_status metric, I changed my AWS access key to an invalid value and triggered a backup. Although the job errored out, I see a fresh resticprofile_backup_status metric with a status of 2.

Luckily, the Last Backup timestamp isn't changed, so I can probably alert on that. However, I expected the status to be 0.
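A minimal sketch of that timestamp-based workaround, written against the file produced by prometheus-save-to-file. It assumes the last-backup timestamp is exposed as resticprofile_backup_time_seconds (a Unix time) and that a status of 2 means success; the label names, the sample values, the 26-hour threshold, and both helper functions are hypothetical and only serve the illustration.

```python
def parse_prom_text(text):
    """Parse Prometheus text exposition lines into {metric_name: value}.

    Labels are dropped, and lines are assumed to carry no trailing
    timestamp (i.e. the last space-separated token is the value).
    """
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name_part, _, value = line.rpartition(" ")
        name = name_part.split("{", 1)[0]  # strip the {label="..."} part
        metrics[name] = float(value)
    return metrics


def backup_needs_alert(metrics, now, max_age_seconds=26 * 3600):
    """Return True if the backup looks failed, missing, or stale.

    Assumed metric names: resticprofile_backup_status and
    resticprofile_backup_time_seconds.
    """
    status = metrics.get("resticprofile_backup_status")
    last_run = metrics.get("resticprofile_backup_time_seconds")
    if status is None or last_run is None:
        return True  # no metrics at all is also worth an alert
    if status != 2:
        return True  # assumed: anything other than 2 is not a clean success
    # The status can stay at 2 after a failed run, so also check the age.
    return now - last_run > max_age_seconds


# Hypothetical file content; label names and values are made up.
SAMPLE = """\
# TYPE resticprofile_backup_status gauge
resticprofile_backup_status{host="nas",profile="photos"} 2
resticprofile_backup_time_seconds{host="nas",profile="photos"} 1709000000
"""
```

In practice the same condition would more likely live in a Prometheus alerting rule that compares the timestamp metric with the current time; the Python version only spells out the logic.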

deviantintegral • Feb 27 '24 20:02

You're right, I wouldn't expect the status to be 2 🤔 Can you please post your profile configuration (with any repository information redacted) so I can get a better idea of what is happening?

creativeprojects • Mar 04 '24 20:03

Sure, here it is. I have several other backup sets but they all have the same config.

version: "1"

global:
  scheduler: crond
  priority: low

base:
  initialize: true
  password-file: key
  prometheus-push: "http://metrics-docker.lan:9091/"
  prometheus-save-to-file: "{{ .Profile.Name }}.prom"
  prometheus-labels:
    - host: {{ .Hostname }}
  backup:
    exclude-caches: true
    one-file-system: true
    check-before: true
    extended-status: true
  retention:
    after-backup: true
    keep-daily: 30
    keep-weekly: 4
    keep-monthly: 13
    prune: true

photos:
  inherit: base
  lock: /tmp/photos.lock
  force-inactive-lock: true
  rustic-stale-lock-age: 5m
  repository: REDACTED-S3-ENDPOINT-ON-B2
  env:
    AWS_ACCESS_KEY_ID: REDACTED_ACCESS_KEY
    AWS_SECRET_ACCESS_KEY: REDACTED_SECRET_KEY
  backup:
    source:
      - '/source/photos'
    schedule: "04:00"
    schedule-permission: system

deviantintegral • Mar 06 '24 22:03

Right, I see what's happening:

  • the check command (run first because of check-before: true) fails immediately since the repository is not available
  • resticprofile stops after the check failed, without trying to run a backup

But only the backup command generates Prometheus metrics, so at that point resticprofile keeps the existing metrics and does not generate new ones.

To fix this, I think we would need to generate a status line for each command (check, forget, etc.)
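The behaviour described above, and the proposed fix, can be sketched with a toy model. This is not resticprofile's actual code: run_profile, the metric keys, and the per_command_status switch are all hypothetical, and the status values 2 (success) and 0 (failure) follow the ones mentioned in this thread.

```python
def run_profile(commands, metrics, per_command_status=False):
    """Run commands in order, stopping at the first failure.

    commands: list of (name, succeeds) tuples in execution order.
    metrics:  dict carried over from the previous run, updated in place.
    Returns True if every command succeeded.
    """
    for name, succeeds in commands:
        status = 2 if succeeds else 0
        if per_command_status:
            # Proposed fix: every command records its own status.
            metrics[f"{name}_status"] = status
        elif name == "backup":
            # Described behaviour: only backup writes metrics.
            metrics["backup_status"] = status
        if not succeeds:
            return False  # later commands (including backup) never run
    return True


# Metrics left over from an earlier successful backup.
previous = {"backup_status": 2}

# check-before fails because the repository is unreachable.
failing_run = [("check", False), ("backup", True)]

current = dict(previous)
run_profile(failing_run, current)
print(current)  # backup_status is untouched, so the old value survives

fixed = dict(previous)
run_profile(failing_run, fixed, per_command_status=True)
print(fixed)  # a check_status entry now records the failure
```

Even with per-command status, the backup entry itself would still hold the value from the previous run, so an alert would have to look at the check status (or at the backup timestamp) to notice the failure.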

creativeprojects • Mar 09 '24 18:03