postgres-operator

Issues restoring from full backup

Open todeb opened this issue 3 months ago • 0 comments

Please ensure you do the following when reporting a bug:

Overview

Unable to restore from a full backup stored in S3.

Environment

Please provide the following details:

  • Platform: Kubernetes
  • Platform Version: 1.25, 1.28
  • PGO Image Tag: 5.6.0
  • Postgres Version: 13
  • Storage: -

Steps to Reproduce

Take a full backup, then make the WAL segments unrecoverable or delete the archive.

EXPECTED

The cluster can be restored from the full backup.

ACTUAL

pgbackrest info

        full backup: 20250917-094053F
            timestamp start/stop: 2025-09-17 09:40:53 / 2025-09-17 10:29:22
            wal start/stop: 00000159000006F7000000F4 / 00000159000006F80000000B
            database size: 31.5GB, database backup size: 31.5GB
            repo1: backup set size: 7GB, backup size: 7GB

        full backup: 20250917-111302F
            timestamp start/stop: 2025-09-17 11:13:02 / 2025-09-17 12:04:37
            wal start/stop: 00000159000006F80000001F / 00000159000006F800000034
            database size: 31.5GB, database backup size: 31.5GB
            repo1: backup set size: 7GB, backup size: 7GB
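
For reference when reading the WAL names above and the timeline numbers in the logs below: a WAL segment file name is 24 hex digits, made of an 8-digit timeline ID, an 8-digit log number, and an 8-digit segment number. The sketch below decodes the names from this report; the function name `parse_wal_segment` is only illustrative and is not part of pgBackRest or PostgreSQL.

```python
def parse_wal_segment(name: str) -> tuple[int, int, int]:
    """Split a 24-hex-digit WAL segment name into (timeline, log, segment)."""
    if len(name) != 24:
        raise ValueError(f"expected 24 hex digits, got {len(name)}: {name!r}")
    timeline = int(name[0:8], 16)   # first 8 hex digits: timeline ID
    log = int(name[8:16], 16)       # next 8 hex digits: high part of the WAL position
    segment = int(name[16:24], 16)  # last 8 hex digits: segment within that log
    return timeline, log, segment


# WAL start/stop values copied from the `pgbackrest info` output above.
for wal in (
    "00000159000006F7000000F4",
    "00000159000006F80000000B",
    "00000159000006F80000001F",
    "00000159000006F800000034",
):
    timeline, log, segment = parse_wal_segment(wal)
    print(f"{wal}: timeline={timeline} log={log:X} segment={segment:X}")

# A history file name carries only the timeline ID.
print(int("0000015A", 16))  # 346
```

All four backup WAL names decode to timeline 345 (hex 159), while the history file `0000015A.history` belongs to timeline 346.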

Failed restore when no archive is present:

Defaulted container "pgbackrest-restore" out of: pgbackrest-restore, nss-wrapper-init (init)
+ pgbackrest restore --set=20250917-094053F --stanza=db --pg1-path=/pgdata/pg13 --repo=1 --delta --link-map=pg_wal=/pgdata/pg13_wal
WARN: unable to open log file '/pgdata/pgbackrest/log/db-restore.log': No such file or directory
      NOTE: process will continue without log file.
WARN: --delta or --force specified but unable to find 'PG_VERSION' or 'backup.manifest' in '/pgdata/pg13' to confirm that this is a valid $PGDATA directory.  --delta and --force have been disabled and if any files exist in the destination directories the restore will be aborted.
2025-09-17 14:55:13.558 GMT [18] LOG:  starting PostgreSQL 13.8 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), 64-bit
2025-09-17 14:55:13.559 GMT [18] LOG:  listening on IPv6 address "::1", port 5432
2025-09-17 14:55:13.559 GMT [18] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2025-09-17 14:55:13.560 GMT [18] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2025-09-17 14:55:13.563 GMT [19] LOG:  database system was interrupted; last known up at 2025-09-17 10:19:41 GMT
WARN: repo1: [FileMissingError] unable to load info file '/pgsql/pgsql-syd/repo/archive/db/archive.info' or '/pgsql/pgsql-syd/repo/archive/db/archive.info.copy':
      FileMissingError: unable to open missing file '/pgsql/pgsql-syd/repo/archive/db/archive.info' for read
      FileMissingError: unable to open missing file '/pgsql/pgsql-syd/repo/archive/db/archive.info.copy' for read
      HINT: archive.info cannot be opened but is required to push/get WAL segments.
      HINT: is archive_command configured correctly in postgresql.conf?
      HINT: has a stanza-create been performed?
      HINT: use --no-archive-check to disable archive checks during backup if you have an alternate archiving scheme.
ERROR: [103]: unable to find a valid repository
2025-09-17 14:55:14.707 GMT [19] LOG:  starting archive recovery
WARN: repo1: [FileMissingError] unable to load info file '/pgsql/pgsql-syd/repo/archive/db/archive.info' or '/pgsql/pgsql-syd/repo/archive/db/archive.info.copy':
      FileMissingError: unable to open missing file '/pgsql/pgsql-syd/repo/archive/db/archive.info' for read
      FileMissingError: unable to open missing file '/pgsql/pgsql-syd/repo/archive/db/archive.info.copy' for read
      HINT: archive.info cannot be opened but is required to push/get WAL segments.
      HINT: is archive_command configured correctly in postgresql.conf?
      HINT: has a stanza-create been performed?
      HINT: use --no-archive-check to disable archive checks during backup if you have an alternate archiving scheme.
ERROR: [103]: unable to find a valid repository
WARN: repo1: [FileMissingError] unable to load info file '/pgsql/pgsql-syd/repo/archive/db/archive.info' or '/pgsql/pgsql-syd/repo/archive/db/archive.info.copy':
      FileMissingError: unable to open missing file '/pgsql/pgsql-syd/repo/archive/db/archive.info' for read
      FileMissingError: unable to open missing file '/pgsql/pgsql-syd/repo/archive/db/archive.info.copy' for read
      HINT: archive.info cannot be opened but is required to push/get WAL segments.
      HINT: is archive_command configured correctly in postgresql.conf?
      HINT: has a stanza-create been performed?
      HINT: use --no-archive-check to disable archive checks during backup if you have an alternate archiving scheme.
ERROR: [103]: unable to find a valid repository
2025-09-17 14:55:15.023 GMT [19] LOG:  invalid checkpoint record
2025-09-17 14:55:15.023 GMT [19] FATAL:  could not locate required checkpoint record
2025-09-17 14:55:15.023 GMT [19] HINT:  If you are restoring from a backup, touch "/pgdata/pg13/recovery.signal" and add required recovery options.
        If you are not restoring from a backup, try removing the file "/pgdata/pg13/backup_label".
        Be careful: removing "/pgdata/pg13/backup_label" will result in a corrupt cluster if restoring from a backup.
2025-09-17 14:55:15.024 GMT [18] LOG:  startup process (PID 19) exited with exit code 1
2025-09-17 14:55:15.024 GMT [18] LOG:  aborting startup due to startup process failure
2025-09-17 14:55:15.025 GMT [18] LOG:  database system is shut down
pg_ctl: could not start server
Examine the log output.

Failed restore when some WAL segments are missing:

+ pgbackrest restore --set=20250917-111302F --type=default --stanza=db --pg1-path=/pgdata/pg13 --repo=1 --delta --link-map=pg_wal=/pgdata/pg13_wal
2025-09-17 13:49:48.151 GMT [19] LOG:  starting PostgreSQL 13.8 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), 64-bit
2025-09-17 13:49:48.152 GMT [19] LOG:  listening on IPv6 address "::1", port 5432
2025-09-17 13:49:48.152 GMT [19] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2025-09-17 13:49:48.157 GMT [19] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2025-09-17 13:49:48.170 GMT [20] LOG:  database system was interrupted; last known up at 2025-09-17 11:58:34 GMT
2025-09-17 13:49:51.178 GMT [20] LOG:  restored log file "0000015A.history" from archive
2025-09-17 13:49:51.349 GMT [20] LOG:  starting archive recovery
2025-09-17 13:49:51.536 GMT [20] LOG:  restored log file "0000015A.history" from archive
2025-09-17 13:49:53.192 GMT [20] LOG:  restored log file "00000159000006F800000021" from archive
2025-09-17 13:49:59.407 GMT [20] LOG:  restored log file "00000159000006F80000001F" from archive
2025-09-17 13:49:59.439 GMT [20] FATAL:  requested timeline 346 is not a child of this server's history
2025-09-17 13:49:59.439 GMT [20] DETAIL:  Latest checkpoint is at 6F8/3129B1B8 on timeline 345, but in the history of the requested timeline, the server forked off from that timeline at 6C1/DC336328.
2025-09-17 13:49:59.441 GMT [19] LOG:  startup process (PID 20) exited with exit code 1
2025-09-17 13:49:59.441 GMT [19] LOG:  aborting startup due to startup process failure
2025-09-17 13:49:59.443 GMT [19] LOG:  database system is shut down
pg_ctl: could not start server
Examine the log output.
2025-09-17 13:49:59.617 GMT [46] LOG:  starting PostgreSQL 13.8 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), 64-bit
2025-09-17 13:49:59.618 GMT [46] LOG:  listening on IPv6 address "::1", port 5432
2025-09-17 13:49:59.618 GMT [46] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2025-09-17 13:49:59.621 GMT [46] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2025-09-17 13:49:59.641 GMT [47] LOG:  database system was interrupted; last known up at 2025-09-17 11:58:34 GMT
2025-09-17 13:50:02.946 GMT [47] LOG:  restored log file "0000015A.history" from archive
2025-09-17 13:50:03.070 GMT [47] LOG:  starting archive recovery
2025-09-17 13:50:03.213 GMT [47] LOG:  restored log file "0000015A.history" from archive
2025-09-17 13:50:04.605 GMT [47] LOG:  restored log file "00000159000006F800000021" from archive
2025-09-17 13:50:10.846 GMT [47] LOG:  restored log file "00000159000006F80000001F" from archive
2025-09-17 13:50:10.876 GMT [47] FATAL:  requested timeline 346 is not a child of this server's history
2025-09-17 13:50:10.876 GMT [47] DETAIL:  Latest checkpoint is at 6F8/3129B1B8 on timeline 345, but in the history of the requested timeline, the server forked off from that timeline at 6C1/DC336328.
2025-09-17 13:50:10.878 GMT [46] LOG:  startup process (PID 47) exited with exit code 1
2025-09-17 13:50:10.878 GMT [46] LOG:  aborting startup due to startup process failure
2025-09-17 13:50:10.881 GMT [46] LOG:  database system is shut down
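
The FATAL message above compares two WAL positions: the backup's latest checkpoint (6F8/3129B1B8 on timeline 345) and the point where timeline 346 forked off (6C1/DC336328). A minimal sketch to check that arithmetic follows. It assumes the usual `high/low` hex LSN notation and the default 16 MiB WAL segment size (the segment size of this cluster is not stated in the report); the helper names are hypothetical.

```python
def parse_lsn(lsn: str) -> int:
    """Convert an LSN such as '6F8/3129B1B8' to a 64-bit integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_segment_name(timeline: int, lsn: str, segment_size: int = 16 * 1024 * 1024) -> str:
    """Name of the WAL segment file containing `lsn` (assumes 16 MiB segments by default)."""
    high, low = lsn.split("/")
    segment = int(low, 16) // segment_size
    return f"{timeline:08X}{int(high, 16):08X}{segment:08X}"


checkpoint = "6F8/3129B1B8"  # latest checkpoint of the restored backup, timeline 345
fork_point = "6C1/DC336328"  # where timeline 346 left timeline 345, per 0000015A.history

# The checkpoint lies after the fork point, so the backup is not on timeline 346's history.
print(parse_lsn(checkpoint) > parse_lsn(fork_point))  # True

# Segment holding the checkpoint, for comparison with the backup's WAL start/stop range.
print(wal_segment_name(345, checkpoint))  # 00000159000006F800000031
```

Under those assumptions the checkpoint falls in segment 00000159000006F800000031, which lies inside the WAL start/stop range of backup 20250917-111302F and well past the fork point recorded for timeline 346.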

Logs

See the output above.

Additional Information

I don't need PITR (point-in-time recovery) with WAL replay; I just want the cluster restored from the last full backup.

todeb, Sep 17 '25 15:09