Physical backup restore stuck on version 1.20.1
Report
Restoring from a physical backup with point-in-time recovery results in a stuck restore. The cluster has sharding enabled, and the collection being restored is sharded.
➜ k describe PerconaServerMongoDBRestore
Name: restore1
Namespace: demo-mongodb
Labels: <none>
Annotations: <none>
API Version: psmdb.percona.com/v1
Kind: PerconaServerMongoDBRestore
Metadata:
  Creation Timestamp:  2025-07-08T10:02:57Z
  Generation:          1
  Resource Version:    9536705
  UID:                 5dd0c8c8-f3b6-4481-8f68-54104afc552c
Spec:
  Backup Name:   backup1
  Cluster Name:  demo-psmdb-db
  Pitr:
    Type:  latest
Status:
  Pbm Name:     2025-07-08T10:10:12.768452917Z
  Pitr Target:  2025-07-08T08:54:01
  State:        requested
Events:  <none>
More about the problem
The operator starts the restore procedure and gets stuck after this log line:
2025-07-08T10:10:12.789Z INFO Restore state changed {"controller": "psmdbrestore-controller", "controllerGroup": "psmdb.percona.com", "controllerKind": "PerconaServerMongoDBRestore", "PerconaServerMongoDBRestore": {"name":"restore1","namespace":"demo-mongodb"}, "namespace": "demo-mongodb", "name": "restore1", "reconcileID": "a38c71a6-7b43-4d90-aa94-6e31f2136a55", "previous": "waiting", "current": "requested"}
The DB is never restored and the cluster stays in the initializing state. Restarting the operator deployment does not help; it does not try to continue the restore process.
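For reference, a sketch of how the stuck state can be inspected from outside the operator; the pod name below is an assumption derived from the cluster name demo-psmdb-db, and the backup-agent sidecar may not be running while a physical restore is in progress:
# The restore object stays in the "requested" state
kubectl -n demo-mongodb get psmdb-restore restore1
# PBM's own view of the restore, from the backup-agent sidecar
# (pod name is an assumption based on the cluster name)
kubectl -n demo-mongodb exec demo-psmdb-db-rs0-0 -c backup-agent -- pbm status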
Steps to reproduce
- Create the DB using the configuration below (a deployment sketch follows the values)
replsets:
  rs0:
    size: 3
    serviceAccountName: psmdb-operator
    resources:
      limits:
        cpu: 300m
        memory: 1024Mi
      requests:
        cpu: 150m
        memory: 512Mi
    volumeSpec:
      pvc:
        storageClassName: gp3
        resources:
          requests:
            storage: 4Gi
    arbiter:
      enabled: false
      size: 1
  rs1:
    size: 3
    serviceAccountName: psmdb-operator
    resources:
      limits:
        cpu: 300m
        memory: 1024Mi
      requests:
        cpu: 150m
        memory: 512Mi
    volumeSpec:
      pvc:
        storageClassName: gp3
        resources:
          requests:
            storage: 4Gi
    arbiter:
      enabled: false
      size: 1
sharding:
  configrs:
    size: 3
    serviceAccountName: psmdb-operator
    volumeSpec:
      pvc:
        storageClassName: gp3
        resources:
          requests:
            storage: 4Gi
  mongos:
    size: 3
    resources:
      limits:
        cpu: 1000m
        memory: 1024M
      requests:
        cpu: 300m
        memory: 500M
    serviceAccountName: psmdb-operator
backup:
  enabled: true
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789:role/psmdb-operator
  storages:
    s3-eu-north-1:
      main: true
      type: s3
      s3:
        bucket: psmdb-operator
        retryer:
          numMaxRetries: 3
          minRetryDelay: 30ms
          maxRetryDelay: 5m
        region: eu-north-1
  pitr:
    enabled: true
    compressionType: gzip
    compressionLevel: 6
  tasks:
    - name: daily-s3-eu-north-1-physical
      enabled: true
      schedule: "0 0 * * *"
      keep: 30
      type: physical
      storageName: s3-eu-north-1
      compressionType: gzip
      compressionLevel: 6
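The values above follow the layout of the psmdb-db Helm chart, so (as an assumption, since the report does not state how the cluster was deployed) the DB could be created with something like the following; the release name demo and the file name values.yaml are hypothetical, with demo chosen because the chart's default naming would then produce the cluster name demo-psmdb-db:
helm repo add percona https://percona.github.io/percona-helm-charts/
helm repo update
# values.yaml holds the configuration shown above (hypothetical file name)
helm install demo percona/psmdb-db -n demo-mongodb --create-namespace -f values.yaml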
- Log in with the databaseAdmin user using the mongosh CLI and create data
use demo
db.demo.insertOne({ msg: "This is the first document" })
- Log in with the clusterAdmin user using the mongosh CLI and enable sharding (a verification sketch follows the commands)
use admin
sh.shardCollection("demo.demo", { _id: 1 })
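To confirm the collection is really sharded before taking the backup, a quick check like this can be run in the same mongosh session (a sketch, not part of the original report):
sh.status()                     // demo.demo should be listed among the sharded collections
use demo
db.demo.getShardDistribution()  // shows which shards hold the collection's data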
- Create a backup with the manifest below (a sketch for applying it follows)
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBBackup
metadata:
  finalizers:
    - percona.com/delete-backup
  name: backup1
  namespace: demo-mongodb
spec:
  clusterName: demo-psmdb-db
  storageName: s3-eu-north-1
  type: physical
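Assuming the manifest above is saved as backup1.yaml (hypothetical file name), applying it and waiting for it to finish looks roughly like this:
kubectl -n demo-mongodb apply -f backup1.yaml
# wait for the backup to reach the "ready" state
kubectl -n demo-mongodb get psmdb-backup backup1 -w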
- Restore from the backup with the manifest below (a sketch for applying it follows)
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBRestore
metadata:
  name: restore1
spec:
  clusterName: demo-psmdb-db
  backupName: backup1
  pitr:
    type: latest
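Applied the same way (restore1.yaml is again a hypothetical file name); in the failing case the state never moves past requested:
kubectl -n demo-mongodb apply -f restore1.yaml
# stays in the "requested" state indefinitely when the bug hits
kubectl -n demo-mongodb get psmdb-restore restore1 -w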
Versions
- Kubernetes EKS 1.31
- Operator 1.20.1
- Database 1.20.1
Anything else?
No response
I tried downgrading the operator and DB to 1.20.0, and the restore worked fine on an empty DB. When I tried restoring a DB that has a sharded collection, it failed by just getting stuck, similar to 1.20.1.
Downgrading to 1.19.1 was the only version that properly performed a point-in-time restore on a DB with a sharded collection.
After spinning up an empty DB and restoring on 1.19.1, it has the issue of being unable to find the backup in S3, even though the backup CRD object exists and the data is in S3.
There was no such issue in version 1.20.1, but there the restore gets stuck replaying the oplog forever. Sometimes the restore gets stuck in the requested state.
Let us try to reproduce this as well. @eleo007, can you pick it up?
I'll try to reproduce when I have the chance.