backup-restore-operator
BUG: Rancher pod killed during backup restore when pruning is enabled
Issue
While restoring a backup in Rancher with exactly the same configuration as the previous one, the Rancher pod gets killed and the Rancher UI disappears, as can be seen in the video:
https://github.com/user-attachments/assets/3a6530e0-172d-48b2-963d-62dcde8352a1
Steps to reproduce (long way)
- Deploy Rancher 2.8.5
- Deploy a git repo
- Create a Backup using the Backup app on Rancher (preferably on S3)
- Create a new Rancher 2.8.5
- Deploy default Backup app
- Restore the previously created backup WITH prune enabled (checked and recommended by default)
Short version to reproduce
- Deploy Rancher 2.8.5
- Create the following secret:

```shell
kubectl create secret generic aws-secret \
  --from-literal=accessKey=xxx \
  --from-literal=secretKey=yyy
```
- Deploy the default Backup app with this configuration:

```yaml
bucketName: epinio-ci
credentialSecretName: aws-secret
credentialSecretNamespace: default
enabled: true
folder: mmt-rancher-backup
endpoint: s3.eu-central-1.amazonaws.com
endpointCA:
insecureTLSSkipVerify: false
region: eu-central-1
```
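For reference, the chart values above roughly correspond to a `Backup` custom resource like the following. This is only a sketch, assuming the backup-restore-operator's `resources.cattle.io/v1` CRDs: the resource name is hypothetical, while the S3 details are taken from this report.

```yaml
# Hypothetical one-off Backup CR equivalent to the chart values above.
apiVersion: resources.cattle.io/v1
kind: Backup
metadata:
  name: s3-backup        # assumed name, not from the report
spec:
  resourceSetName: rancher-resource-set   # default resource set shipped with the app
  storageLocation:
    s3:
      credentialSecretName: aws-secret
      credentialSecretNamespace: default
      bucketName: epinio-ci
      folder: mmt-rancher-backup
      region: eu-central-1
      endpoint: s3.eu-central-1.amazonaws.com
      insecureTLSSkipVerify: false
```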
- Go to Restore Backup and ensure the "Prune" checkbox is checked
- Using the previously provided information, target this backup file: 285-bu-07241230-da49091a-ddeb-4196-881d-d032bea9ea6e-2024-07-24T10-36-02Z.tar.gz. It basically contains this gitrepo on the local cluster:
URL: https://github.com/rancher/fleet-examples
Branch: master
Path: simple
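The restore described in the last two steps can be sketched as a `Restore` custom resource, assuming the backup-restore-operator's `resources.cattle.io/v1` CRDs. The resource name is hypothetical; the backup filename, prune flag, and S3 details come from this report.

```yaml
# Hypothetical Restore CR matching what the "Restore Backup" UI creates here.
apiVersion: resources.cattle.io/v1
kind: Restore
metadata:
  name: restore-with-prune   # assumed name, not from the report
spec:
  backupFilename: 285-bu-07241230-da49091a-ddeb-4196-881d-d032bea9ea6e-2024-07-24T10-36-02Z.tar.gz
  prune: true                # the checkbox that triggers the bug
  storageLocation:
    s3:
      credentialSecretName: aws-secret
      credentialSecretNamespace: default
      bucketName: epinio-ci
      folder: mmt-rancher-backup
      region: eu-central-1
      endpoint: s3.eu-central-1.amazonaws.com
```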
Observed Behavior
The Rancher pod gets deleted and, although it tries to recreate, it is never able to do so:
https://github.com/user-attachments/assets/bb3f3900-3245-4851-acf5-0eece0402760
Expected behavior
The pod should be able to recover cleanly. The gitjob is correctly deployed and active.
Additional info
- This affects Rancher 2.7-head, 2.8-head and 2.9-head; however, it seemed not to affect Rancher 2.7.6 (at least when tried for another issue)
- If the restore is done within the same cluster that the backup was taken from, the UI survives
- If the restore is done WITHOUT pruning, the UI survives, although occasionally the gitjob is in a waiting status
Testing environment
- Single-cluster k3s with k8s version v1.27.10+k3s1