stanza creates two system-ids when using the same bucket to bootstrap two different PG clusters

Open AJB78 opened this issue 1 year ago • 1 comments

What is the problem?

When initializing a repo, Crunchy ignores pgbackrest stanza-create errors and initializes a new database system-id inside an S3 bucket that already has a repo initialized.

The problem

pgbackrest stanza-create --stanza="${stanza}" || pgbackrest stanza-upgrade --stanza="${stanza}"
  • code reference
  • the error from the first command should have been raised to Crunchy and cluster initialization should have been stopped with the following error: backup and archive info files exist but do not match the database
  • most likely the error was introduced in https://github.com/CrunchyData/postgres-operator/commit/61b9728e73d8039f5b17aee3d7ff01015a6df9ea (cc @cbandy)

How to reproduce?

  1. Create one PG cluster and configure repo2 to point to S3
  2. Initiate a full backup to repo2
  3. Delete the first PG cluster and then create a new one, having the same S3 configured as repo2
  4. The cluster is set up without complaint, whereas it should raise an error because the S3 bucket already contains a stanza

Below I've attached some files to illustrate the error:

  • backup.info
[db]
db-catalog-version=202307071
db-control-version=1300
db-id=3
db-system-id=7449396375125397577
db-version="16"

[db:history]
1={"db-catalog-version":202307071,"db-control-version":1300,"db-system-id":7449035064254382164,"db-version":"16"}
2={"db-catalog-version":202307071,"db-control-version":1300,"db-system-id":7449383438968672363,"db-version":"16"}
3={"db-catalog-version":202307071,"db-control-version":1300,"db-system-id":7449396375125397577,"db-version":"16"}
  • /pgdata/pgbackrest/log/db-stanza-create.log
-------------------PROCESS START-------------------
2024-12-17 15:02:37.685 P00   INFO: stanza-create command begin 2.53.1: --exec-id=146-d2552cd2 --log-level-console=info --log-level-file=info --log-path=/pgdata/pgbackrest/log --pg1-path=/pgdata/pg16 --pg1-port=5432 --pg1-socket-path=/tmp/postgres --repo1-host=main-db-pg-repo-host-0.main-db-pg-pods.ns.svc.cluster.local. --repo1-host-ca-file=/etc/pgbackrest/conf.d/~postgres-operator/tls-ca.crt --repo1-host-cert-file=/etc/pgbackrest/conf.d/~postgres-operator/client-tls.crt --repo1-host-key-file=/etc/pgbackrest/conf.d/~postgres-operator/client-tls.key --repo1-host-type=tls --repo1-host-user=postgres --repo1-path=/pgbackrest/repo1 --repo2-path=/pgbackrest/repo2 --repo2-s3-bucket=kubernetes-automatic-deploy-db-backups --repo2-s3-endpoint=https://10.XX.129.50:8544/ --repo2-s3-key=<redacted> --repo2-s3-key-secret=<redacted> --repo2-s3-region=us-east-1 --repo2-s3-uri-style=path --repo2-storage-ca-file=/etc/pgbackrest/conf.d/repo2-server-ca.crt --repo2-type=s3 --stanza=db
2024-12-17 15:02:38.330 P00   INFO: stanza-create for stanza 'db' on repo1
2024-12-17 15:02:38.624 P00   INFO: stanza-create for stanza 'db' on repo2
2024-12-17 15:02:38.646 P00  ERROR: [028]: backup and archive info files exist but do not match the database
                                    HINT: is this the correct stanza?
                                    HINT: did an error occur during stanza-upgrade?
2024-12-17 15:02:38.658 P00   INFO: stanza-create command end: aborted with exception [028]
  • /pgdata/pgbackrest/log/db-stanza-upgrade.log
-------------------PROCESS START-------------------
2024-12-17 15:02:38.667 P00   INFO: stanza-upgrade command begin 2.53.1: --exec-id=161-c77a07f7 --log-level-console=info --log-level-file=info --log-path=/pgdata/pgbackrest/log --pg1-path=/pgdata/pg16 --pg1-port=5432 --pg1-socket-path=/tmp/postgres --repo1-host=main-db-pg-repo-host-0.main-db-pg-pods.ns.svc.cluster.local. --repo1-host-ca-file=/etc/pgbackrest/conf.d/~postgres-operator/tls-ca.crt --repo1-host-cert-file=/etc/pgbackrest/conf.d/~postgres-operator/client-tls.crt --repo1-host-key-file=/etc/pgbackrest/conf.d/~postgres-operator/client-tls.key --repo1-host-type=tls --repo1-host-user=postgres --repo1-path=/pgbackrest/repo1 --repo2-path=/pgbackrest/repo2 --repo2-s3-bucket=kubernetes-automatic-deploy-db-backups --repo2-s3-endpoint=https://10.XX.129.50:8544/ --repo2-s3-key=<redacted> --repo2-s3-key-secret=<redacted> --repo2-s3-region=us-east-1 --repo2-s3-uri-style=path --repo2-storage-ca-file=/etc/pgbackrest/conf.d/repo2-server-ca.crt --repo2-type=s3 --stanza=db
2024-12-17 15:02:39.272 P00   INFO: stanza-upgrade for stanza 'db' on repo1
2024-12-17 15:02:39.296 P00   INFO: stanza 'db' on repo1 is already up to date
2024-12-17 15:02:39.297 P00   INFO: stanza-upgrade for stanza 'db' on repo2
2024-12-17 15:02:39.398 P00   INFO: stanza-upgrade command end: completed successfully (732ms)
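The `[db:history]` section of backup.info above is what reveals that the stanza has been claimed by more than one cluster: each history entry carries a different db-system-id. Here is a rough sketch for listing those IDs, assuming the file content is at hand (it is inlined from the excerpt above). The function name `history_system_ids` is hypothetical, and the parsing relies only on the layout shown in this issue.

```shell
#!/bin/sh
# Print the db-system-id of every [db:history] entry, one per line.
# Reads the file named as argument, or stdin when no argument is given.
history_system_ids() {
    sed -n '/^\[db:history\]/,/^\[/ s/.*"db-system-id":\([0-9][0-9]*\).*/\1/p' "$@"
}

# backup.info content copied from the excerpt in this issue.
sample='[db]
db-catalog-version=202307071
db-control-version=1300
db-id=3
db-system-id=7449396375125397577
db-version="16"

[db:history]
1={"db-catalog-version":202307071,"db-control-version":1300,"db-system-id":7449035064254382164,"db-version":"16"}
2={"db-catalog-version":202307071,"db-control-version":1300,"db-system-id":7449383438968672363,"db-version":"16"}
3={"db-catalog-version":202307071,"db-control-version":1300,"db-system-id":7449396375125397577,"db-version":"16"}'

# List the distinct system-ids recorded in the stanza history.
printf '%s\n' "$sample" | history_system_ids | sort -u
```

For the excerpt this yields three distinct IDs with an unchanged db-version of "16", so each entry corresponds to a newly initialized cluster rather than a PostgreSQL major version upgrade.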

The second problem

  • when trying to bootstrap a new cluster from repo2, which has been corrupted by the above error, we're getting:
ERROR: [075]: the latest backup set found '20241217-135215F' is from a prior version of PostgreSQL
2024-12-17T15:34:11.002221975Z                                     HINT: was a backup created after the stanza-upgrade?
2024-12-17T15:34:11.002226189Z                                     HINT: specify --set or --type=time/lsn to restore from a prior version of PostgreSQL.
  • Question: is there a way to remove the second database system-id and repair repo2 so we can continue the bootstrapping process?

Thank you very much!

AJB78 avatar Dec 17 '24 15:12 AJB78

Thanks for this submission @AJB78, I agree this is a change in behavior that requires some attention. I'm therefore going to discuss this further with the engineering team.

andrewlecuyer avatar Feb 12 '25 16:02 andrewlecuyer