
Restoring from Non AWS S3 bucket

Open timbrd opened this issue 4 years ago • 5 comments

Hi,

I have created a single-instance database which works flawlessly; even the WAL backup to the Backblaze S3 bucket works.

Unfortunately, I have not been able to restore (clone) the database from that bucket yet. When the new cluster is started, Spilo tries to fetch a backup from the bucket, but then nothing happens.

This is my pod config:

apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-pod-config
data:
  BACKUP_SCHEDULE: "*/30 * * * *"
  BACKUP_NUM_TO_RETAIN: "7"

  AWS_ENDPOINT: "https://s3.eu-central-003.backblazeb2.com"
  AWS_DEFAULT_REGION: "eu-central-003"
  AWS_S3_FORCE_PATH_STYLE: "true"
  USE_WALG_BACKUP: "true"
  USE_WALG_RESTORE: "true"
  #WAL_S3_BUCKET: "backup-staging-postgres"
  SCOPE: db-postgres
  WALG_S3_PREFIX: s3://backup-staging-postgres/db-postgres

  CLONE_AWS_ENDPOINT: "https://s3.eu-central-003.backblazeb2.com"
  CLONE_AWS_DEFAULT_REGION: "eu-central-003"
  CLONE_AWS_S3_FORCE_PATH_STYLE: "true"
  CLONE_USE_WALG_BACKUP: "true"
  CLONE_USE_WALG_RESTORE: "false"
  #CLONE_WAL_S3_BUCKET: "backup-staging-postgres"
  CLONE_SCOPE: db-postgres
  CLONE_WALG_S3_PREFIX: s3://backup-staging-postgres/db-postgres

This is the postgresql manifest:

apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: db-postgres
spec:
  teamId: "db"
  volume:
    size: 1Gi
    storageClass: bronze
  numberOfInstances: 1
  users:
    postgres:
    - superuser
    - createdb
    user123:
    - superuser
    - createdb
  databases:
    first: user123
    second: user123
  postgresql:
    version: "13"
  resources:
    requests:
      cpu: 500m
      memory: 512M
    limits:
      cpu: "2"
      memory: 1Gi
  clone:
    uid: "7f42f9d2-608e-4660-9715-f8414181ae18"
    cluster: "db-postgres"
    timestamp: "2021-04-12T18:00:00+01:00"

I have double-checked that the uid of the "old" cluster is correct. I also made sure that no pod of the old cluster was running and that the old postgresql CR had been deleted.

Pod logs:

[...]
2021-04-12 16:26:01,690 - bootstrapping - INFO - Writing to file /run/etc/wal-e.d/env/WALG_UPLOAD_CONCURRENCY
2021-04-12 16:26:01,690 - bootstrapping - INFO - Writing to file /run/etc/wal-e.d/env/USE_WALG_BACKUP
2021-04-12 16:26:01,691 - bootstrapping - INFO - Writing to file /run/etc/wal-e.d/env/USE_WALG_RESTORE
2021-04-12 16:26:01,691 - bootstrapping - INFO - Writing to file /run/etc/wal-e.d/env/WALE_LOG_DESTINATION
2021-04-12 16:26:01,691 - bootstrapping - INFO - Writing to file /run/etc/wal-e.d/env/BACKUP_NUM_TO_RETAIN
2021-04-12 16:26:01,691 - bootstrapping - INFO - Writing to file /run/etc/wal-e.d/env/TMPDIR
2021-04-12 16:26:02,937 WARNING: Kubernetes RBAC doesn't allow GET access to the 'kubernetes' endpoint in the 'default' namespace. Disabling 'bypass_api_service'.
2021-04-12 16:26:02,952 INFO: No PostgreSQL configuration items changed, nothing to reload.
2021-04-12 16:26:02,954 INFO: Lock owner: None; I am db-postgres-0
2021-04-12 16:26:03,093 INFO: trying to bootstrap a new cluster
2021-04-12 16:26:03,094 INFO: Running custom bootstrap script: envdir "/run/etc/wal-e.d/env-clone-db-postgres" python3 /scripts/clone_with_wale.py --recovery-target-time="2021-04-12T18:00:00+01:00"
2021-04-12 16:26:03,136 INFO: Trying s3://backup-staging-postgres/db-postgres for clone
2021-04-12 16:26:13,455 INFO: Lock owner: None; I am db-postgres-0
2021-04-12 16:26:13,455 INFO: not healthy enough for leader race
2021-04-12 16:26:13,505 INFO: bootstrap in progress
2021-04-12 16:26:23,454 INFO: Lock owner: None; I am db-postgres-0
2021-04-12 16:26:23,455 INFO: not healthy enough for leader race
2021-04-12 16:26:23,455 INFO: bootstrap in progress
[...]

Can anyone please point out what is missing in my configuration?

timbrd avatar Apr 12 '21 16:04 timbrd

After exec'ing into the postgres pod and manually running the restore script that had already been started according to the pod logs (python3 /scripts/clone_with_wale.py --recovery-target-time="2021-04-12T18:00:00+01:00"), the databases were finally recreated. I did not need to change my configuration, though.

So why do I have to run the restore script manually?

timbrd avatar Apr 12 '21 17:04 timbrd

Are you trying to create a clone that has the same name as the source cluster? Choose a different name.
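
For example, a manifest roughly like this (the new name db-postgres-restore is only an example; everything else mirrors the manifest above):

apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: db-postgres-restore   # must differ from the source cluster name
spec:
  teamId: "db"
  volume:
    size: 1Gi
    storageClass: bronze
  numberOfInstances: 1
  postgresql:
    version: "13"
  clone:
    uid: "7f42f9d2-608e-4660-9715-f8414181ae18"   # uid of the source cluster
    cluster: "db-postgres"                        # name of the source cluster
    timestamp: "2021-04-12T18:00:00+01:00"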

FxKu avatar Apr 23 '21 15:04 FxKu

Are you trying to create a clone that has the same name as the source cluster? Choose a different name.

Do I really have to use a different name, even if the old Postgres cluster is neither running nor defined? And why does the cluster restore work when I manually re-run the restore process that was already started at container start? The initial process somehow seems to be blocked.

timbrd avatar Apr 23 '21 15:04 timbrd

I am also facing the same issue. I am not able to restore from an existing running cluster, even though I followed the same steps given above. @timbrd

  1. Did you apply the clone part of the config map right at the beginning, when creating the cluster for the first time? You don't have the cluster ID at that point.
  2. If you apply the cluster manifest with the clone config a second time, does your DB get recovered? For me it crashes.
  3. Even when I ran the clone_with_wale.py script manually, the wal-e fetch failed because the data directory can't be the same; and if you fetch into a tmp location and later manually copy the files into the data directory, it recovers everything rather than only up to the recovery_target_time.

pawanku2 avatar Jun 08 '21 16:06 pawanku2

It is an old issue, but it looks as if you are setting CLONE_USE_WALG_RESTORE: "false". Set it to true and the clone will restore from S3:

CLONE_USE_WALG_RESTORE: "true"
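
For reference, the clone section of the pod ConfigMap from the first post would then look like this (a sketch; only CLONE_USE_WALG_RESTORE is changed):

  CLONE_AWS_ENDPOINT: "https://s3.eu-central-003.backblazeb2.com"
  CLONE_AWS_DEFAULT_REGION: "eu-central-003"
  CLONE_AWS_S3_FORCE_PATH_STYLE: "true"
  CLONE_USE_WALG_BACKUP: "true"
  CLONE_USE_WALG_RESTORE: "true"
  CLONE_SCOPE: db-postgres
  CLONE_WALG_S3_PREFIX: s3://backup-staging-postgres/db-postgres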

ErikLundJensen avatar Aug 09 '22 09:08 ErikLundJensen