Restoring from Non AWS S3 bucket
Hi,
I have created a single-instance database which works flawlessly; even the WAL backup to Backblaze S3 works.
Unfortunately, I have not been able to restore (clone) the database from that bucket yet. When the new cluster is started, Spilo tries to fetch a backup from the bucket, but then nothing happens.
This is my pod config:
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-pod-config
data:
  BACKUP_SCHEDULE: "*/30 * * * *"
  BACKUP_NUM_TO_RETAIN: "7"
  AWS_ENDPOINT: "https://s3.eu-central-003.backblazeb2.com"
  AWS_DEFAULT_REGION: "eu-central-003"
  AWS_S3_FORCE_PATH_STYLE: "true"
  USE_WALG_BACKUP: "true"
  USE_WALG_RESTORE: "true"
  #WAL_S3_BUCKET: "backup-staging-postgres"
  SCOPE: db-postgres
  WALG_S3_PREFIX: s3://backup-staging-postgres/db-postgres
  CLONE_AWS_ENDPOINT: "https://s3.eu-central-003.backblazeb2.com"
  CLONE_AWS_DEFAULT_REGION: "eu-central-003"
  CLONE_AWS_S3_FORCE_PATH_STYLE: "true"
  CLONE_USE_WALG_BACKUP: "true"
  CLONE_USE_WALG_RESTORE: "false"
  #CLONE_WAL_S3_BUCKET: "backup-staging-postgres"
  CLONE_SCOPE: db-postgres
  CLONE_WALG_S3_PREFIX: s3://backup-staging-postgres/db-postgres
This is the postgresql manifest:
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
name: db-postgres
spec:
teamId: "db"
volume:
size: 1Gi
storageClass: bronze
numberOfInstances: 1
users:
postgres:
- superuser
- createdb
user123:
- superuser
- createdb
databases:
first: user123
second: user123
postgresql:
version: "13"
resources:
requests:
cpu: 500m
memory: 512M
limits:
cpu: "2"
memory: 1Gi
clone:
uid: "7f42f9d2-608e-4660-9715-f8414181ae18"
cluster: "db-postgres"
timestamp: "2021-04-12T18:00:00+01:00"
I have double-checked that the uid of the "old" cluster is correct. I also made sure that no pod of the old cluster was still running and that its postgresql CR had been deleted.
Pod logs:
[...]
2021-04-12 16:26:01,690 - bootstrapping - INFO - Writing to file /run/etc/wal-e.d/env/WALG_UPLOAD_CONCURRENCY
2021-04-12 16:26:01,690 - bootstrapping - INFO - Writing to file /run/etc/wal-e.d/env/USE_WALG_BACKUP
2021-04-12 16:26:01,691 - bootstrapping - INFO - Writing to file /run/etc/wal-e.d/env/USE_WALG_RESTORE
2021-04-12 16:26:01,691 - bootstrapping - INFO - Writing to file /run/etc/wal-e.d/env/WALE_LOG_DESTINATION
2021-04-12 16:26:01,691 - bootstrapping - INFO - Writing to file /run/etc/wal-e.d/env/BACKUP_NUM_TO_RETAIN
2021-04-12 16:26:01,691 - bootstrapping - INFO - Writing to file /run/etc/wal-e.d/env/TMPDIR
2021-04-12 16:26:02,937 WARNING: Kubernetes RBAC doesn't allow GET access to the 'kubernetes' endpoint in the 'default' namespace. Disabling 'bypass_api_service'.
2021-04-12 16:26:02,952 INFO: No PostgreSQL configuration items changed, nothing to reload.
2021-04-12 16:26:02,954 INFO: Lock owner: None; I am db-postgres-0
2021-04-12 16:26:03,093 INFO: trying to bootstrap a new cluster
2021-04-12 16:26:03,094 INFO: Running custom bootstrap script: envdir "/run/etc/wal-e.d/env-clone-db-postgres" python3 /scripts/clone_with_wale.py --recovery-target-time="2021-04-12T18:00:00+01:00"
2021-04-12 16:26:03,136 INFO: Trying s3://backup-staging-postgres/db-postgres for clone
2021-04-12 16:26:13,455 INFO: Lock owner: None; I am db-postgres-0
2021-04-12 16:26:13,455 INFO: not healthy enough for leader race
2021-04-12 16:26:13,505 INFO: bootstrap in progress
2021-04-12 16:26:23,454 INFO: Lock owner: None; I am db-postgres-0
2021-04-12 16:26:23,455 INFO: not healthy enough for leader race
2021-04-12 16:26:23,455 INFO: bootstrap in progress
[...]
Can anyone please point out what my configuration is missing?
After jumping into the postgres pod and manually running the restore script that had already been started, as seen in the pod logs (python3 /scripts/clone_with_wale.py --recovery-target-time="2021-04-12T18:00:00+01:00"), the databases were finally recreated. I didn't need to change my configuration, though.
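For reference, this is roughly what I ran inside the pod (the envdir directory and the script invocation are copied from the bootstrap log above; the pod name db-postgres-0 is the one from my single-instance setup):
kubectl exec -it db-postgres-0 -- bash
# inside the container: re-run the clone script with the clone environment
envdir "/run/etc/wal-e.d/env-clone-db-postgres" python3 /scripts/clone_with_wale.py --recovery-target-time="2021-04-12T18:00:00+01:00"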
So, why do I have to run the restore script manually?
Are you trying to create a clone that has the same name as the source cluster? Choose a different name.
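For example, a clone manifest could look roughly like this (abbreviated sketch; db-postgres-clone is just a placeholder name, and the clone section points at the old cluster exactly as in your manifest above):
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: db-postgres-clone   # must differ from the source cluster name
spec:
  teamId: "db"
  numberOfInstances: 1
  volume:
    size: 1Gi
  postgresql:
    version: "13"
  clone:
    cluster: "db-postgres"
    uid: "7f42f9d2-608e-4660-9715-f8414181ae18"
    timestamp: "2021-04-12T18:00:00+01:00"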
Do I really have to use a different name, even if the old Postgres cluster is neither running nor defined anymore? And why does the restore work when I manually re-run the same restore process that was already started at container startup? That first process somehow seems to be blocked.
I am also facing the same issue. I have not been able to restore from an existing, running cluster, even though I followed the same steps given above. @timbrd
- Did you apply the clone part of the ConfigMap right at the beginning, when creating the cluster for the first time? Because at that point you don't have the cluster UID yet.
- If you apply the cluster manifest with the clone config a second time, does your DB actually get recovered? For me it crashes.
- I even tried running the clone_with_wale.py script manually, but the wal-e fetch fails because the data directory can't be the same; and if I fetch into a tmp location and later manually copy the files into the data directory, it recovers everything instead of stopping at the recovery_target_time (see the sketch below).
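A minimal sketch of that kind of manual fetch, assuming the default Spilo data directory /home/postgres/pgdata/pgroot/data and the clone environment directory seen in the logs above (the LATEST backup name and the paths are assumptions):
# Postgres must be stopped and the destination directory must be empty
envdir /run/etc/wal-e.d/env-clone-db-postgres wal-g backup-fetch /home/postgres/pgdata/pgroot/data LATEST
# Without restore_command and recovery_target_time configured for recovery,
# Postgres will replay all available WAL instead of stopping at the target time.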
It is an old issue, but it looks as if you are setting CLONE_USE_WALG_RESTORE: "false". Set it to true and the clone will restore from S3:
CLONE_USE_WALG_RESTORE: "true"
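So the clone-related part of the ConfigMap above becomes (only CLONE_USE_WALG_RESTORE changes):
  CLONE_AWS_ENDPOINT: "https://s3.eu-central-003.backblazeb2.com"
  CLONE_AWS_DEFAULT_REGION: "eu-central-003"
  CLONE_AWS_S3_FORCE_PATH_STYLE: "true"
  CLONE_USE_WALG_BACKUP: "true"
  CLONE_USE_WALG_RESTORE: "true"
  CLONE_SCOPE: db-postgres
  CLONE_WALG_S3_PREFIX: s3://backup-staging-postgres/db-postgres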