postgres-operator Ability to delete and re-order pgBackRest repos

Overview

Provide the ability to modify pgBackRest repos after creation.

Use Case

I currently have two repos: one PV-based (repo1) and one Azure-based (repo2). I want to get rid of the PV-based one. Currently, if I try to change them, Postgres refuses to start.

Desired Behavior

All pgBackRest repos are removed at boot, and a new repo is created with the new repo settings.

Environment

Platform: AKS
Platform Version: 1.21.7
PGO Image Tag: ubi8-5.1.1-0
Postgres Version: 13
Storage: Azure blob storage
Number of Postgres clusters: 1

Jun 25 '22 14:06 trevor-primer

@trevor-primer it should be possible to modify your repos after cluster creation.

Can you provide a copy of your operator logs after attempting to modify the repos? Any logs from the various PG instance Pods, etc. that are unable to start would be great as well.

Additionally, it would help if you could describe the exact change you made (including an example PostgresCluster spec). To clarify, did you simply remove repo1 from your spec, leaving only repo2?

Jun 27 '22 14:06 andrewlecuyer

@trevor-primer just following up to see if you're able to provide the additional details, logs, etc. requested in my last message.

Thanks!

Jul 13 '22 04:07 andrewlecuyer

I will try to reproduce this again this week and paste the operator logs. Sorry for the delay. Thanks!

Jul 13 '22 14:07 trevor-primer

Postgres pod errors

2022-07-25 15:02:28,876 INFO: No PostgreSQL configuration items changed, nothing to reload.
2022-07-25 15:02:28,888 WARNING: Postgresql is not running.
2022-07-25 15:02:28,888 INFO: Lock owner: None; I am config-service-repotest-pgha1-jbrz-0
2022-07-25 15:02:28,892 INFO: pg_controldata:
  pg_control version number: 1300
  Catalog version number: 202007201
  Database system identifier: 7124319653976735831
  Database cluster state: shut down
  pg_control last modified: Mon Jul 25 15:02:16 2022
  Latest checkpoint location: 0/8000028
  Latest checkpoint's REDO location: 0/8000028
  Latest checkpoint's REDO WAL file: 000000020000000000000008
  Latest checkpoint's TimeLineID: 2
  Latest checkpoint's PrevTimeLineID: 2
  Latest checkpoint's full_page_writes: on
  Latest checkpoint's NextXID: 0:624
  Latest checkpoint's NextOID: 19855
  Latest checkpoint's NextMultiXactId: 1
  Latest checkpoint's NextMultiOffset: 0
  Latest checkpoint's oldestXID: 478
  Latest checkpoint's oldestXID's DB: 1
  Latest checkpoint's oldestActiveXID: 0
  Latest checkpoint's oldestMultiXid: 1
  Latest checkpoint's oldestMulti's DB: 1
  Latest checkpoint's oldestCommitTsXid: 0
  Latest checkpoint's newestCommitTsXid: 0
  Time of latest checkpoint: Mon Jul 25 15:02:16 2022
  Fake LSN counter for unlogged rels: 0/3E8
  Minimum recovery ending location: 0/0
  Min recovery ending loc's timeline: 0
  Backup start location: 0/0
  Backup end location: 0/0
  End-of-backup record required: no
  wal_level setting: logical
  wal_log_hints setting: on
  max_connections setting: 100
  max_worker_processes setting: 8
  max_wal_senders setting: 10
  max_prepared_xacts setting: 0
  max_locks_per_xact setting: 64
  track_commit_timestamp setting: off
  Maximum data alignment: 8
  Database block size: 8192
  Blocks per segment of large relation: 131072
  WAL block size: 8192
  Bytes per WAL segment: 16777216
  Maximum length of identifiers: 64
  Maximum columns in an index: 32
  Maximum size of a TOAST chunk: 1996
  Size of a large-object chunk: 2048
  Date/time type storage: 64-bit integers
  Float8 argument passing: by value
  Data page checksum version: 1
  Mock authentication nonce: a4f81d6519d75fb6c91d547082d2c38386ff5e4e107bd3ca3d7ba4b647926d3e

2022-07-25 15:02:28,901 INFO: Lock owner: None; I am config-service-repotest-pgha1-jbrz-0
2022-07-25 15:02:29,352 INFO: starting as a secondary
2022-07-25 15:02:29,516 INFO: postmaster pid=94
2022-07-25 15:02:29.520 UTC [94] LOG:  pgaudit extension initialized
/tmp/postgres:5432 - no response
2022-07-25 15:02:29.533 UTC [94] LOG:  redirecting log output to logging collector process
2022-07-25 15:02:29.533 UTC [94] HINT:  Future log output will appear in directory "log".
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
2022-07-25 15:02:39,384 INFO: Lock owner: None; I am config-service-repotest-pgha1-jbrz-0
2022-07-25 15:02:39,384 INFO: not healthy enough for leader race
2022-07-25 15:02:39,591 INFO: restarting after failure in progress
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
2022-07-25 15:02:49,383 INFO: Lock owner: None; I am config-service-repotest-pgha1-jbrz-0
2022-07-25 15:02:49,384 INFO: not healthy enough for leader race
2022-07-25 15:02:49,384 INFO: restarting after failure in progress
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
2022-07-25 15:02:59,383 INFO: Lock owner: None; I am config-service-repotest-pgha1-jbrz-0
2022-07-25 15:02:59,384 INFO: not healthy enough for leader race
2022-07-25 15:02:59,384 INFO: restarting after failure in progress
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
2022-07-25 15:03:09,383 INFO: Lock owner: None; I am config-service-repotest-pgha1-jbrz-0
2022-07-25 15:03:09,384 INFO: not healthy enough for leader race
2022-07-25 15:03:09,384 INFO: restarting after failure in progress

Operator logs

time="2022-07-25T15:13:17Z" level=debug msg="replaced configuration" file="internal/patroni/api.go:149" func=patroni.Executor.ReplaceConfiguration name=config-service-repotest namespace=config-service-repotest reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster stderr= stdout="Not changed\n" version=5.1.1-0
time="2022-07-25T15:13:18Z" level=debug msg="reconciled instance" file="internal/controller/postgrescluster/instance.go:1129" func="postgrescluster.(*Reconciler).reconcileInstance" instance=config-service-repotest-pgha1-jbrz name=config-service-repotest namespace=config-service-repotest reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster version=5.1.1-0
time="2022-07-25T15:13:18Z" level=debug msg="reconciled instance set" file="internal/controller/postgrescluster/instance.go:1025" func="postgrescluster.(*Reconciler).scaleUpInstances" instance-set=pgha1 name=config-service-repotest namespace=config-service-repotest reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster version=5.1.1-0
time="2022-07-25T15:13:18Z" level=debug msg="skipping SSH reconciliation, no repo hosts configured" file="internal/controller/postgrescluster/pgbackrest.go:1862" func="postgrescluster.(*Reconciler).reconcilePGBackRestConfig" name=config-service-repotest namespace=config-service-repotest reconcileResource=repoConfig reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster version=5.1.1-0
time="2022-07-25T15:13:18Z" level=debug msg="Could not find a pod with a writable database container." file="internal/controller/postgrescluster/postgres.go:729" func="postgrescluster.(*Reconciler).reconcileDatabaseInitSQL" name=config-service-repotest namespace=config-service-repotest reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster version=5.1.1-0
time="2022-07-25T15:13:18Z" level=debug msg="reconciled cluster" file="internal/controller/postgrescluster/controller.go:313" func="postgrescluster.(*Reconciler).Reconcile" name=config-service-repotest namespace=config-service-repotest reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster version=5.1.1-0

I tried two scenarios:

Deleted repo1 and left Azure as repo2
Deleted repo1 and renamed Azure to repo1

Same result for both.

Here is the config I started with:

  backups:
    pgbackrest:
      configuration:
      - secret:
          name: pgo-azure-creds
      global:
        repo1-path: /
        repo1-retention-full: "14"
        repo1-retention-full-type: time
      manual:
        options:
        - --type=full
        repoName: repo2
      repos:
      - name: repo1
        volume:
          volumeClaimSpec:
            accessModes:
            - ReadWriteOnce
            resources:
              requests:
                storage: 1500Gi
      - name: repo2
        azure:
          container: config-service-backups-repotest
        schedules:
          full: "0 1 * * *"
          incremental: "0 */8 * * *"
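
For illustration, the first scenario (deleting repo1 and leaving Azure as repo2) would correspond to a spec roughly like the sketch below. It is reconstructed from the starting config above and is not the manifest that was actually applied; in particular, whether the repo1-* keys under global were kept or edited is not stated in this thread, so they are shown unchanged.

```yaml
  backups:
    pgbackrest:
      configuration:
      - secret:
          name: pgo-azure-creds
      global:
        # Assumption: shown unchanged from the starting config. The thread
        # does not say whether these repo1-* keys were kept or edited.
        repo1-path: /
        repo1-retention-full: "14"
        repo1-retention-full-type: time
      manual:
        options:
        - --type=full
        repoName: repo2
      repos:
      # The PV-backed repo1 entry (volume/volumeClaimSpec) has been removed.
      - name: repo2
        azure:
          container: config-service-backups-repotest
        schedules:
          full: "0 1 * * *"
          incremental: "0 */8 * * *"
```

The second scenario would differ only in the remaining entry being named repo1 instead of repo2, presumably with manual.repoName updated to match.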

Jul 25 '22 15:07 trevor-primer

@trevor-primer, I am unable to reproduce the behavior you've described. Can you send the manifest you are using before and after deleting the repo so we know exactly what you are changing? Can you also try the latest PGO, pgBackRest, etc., and see if you still see the issue?

Oct 05 '22 18:10 dsessler7

Closing as unable to reproduce.

Oct 25 '22 14:10 andrewlecuyer