postgres-operator-examples
PgBackRest: Restore is never executed
Environment: minikube, kustomize->high-availability, postgres-operator-examples version tag: v5.0.0-alpha.4-0
Documented at https://access.crunchydata.com/documentation/postgres-operator/5.1.2/tutorial/disaster-recovery/#perform-a-point-in-time-recovery-pitr
Expected behavior: Point-in-time backup restored
Observed behavior: No backup is ever restored
Configuration:
backups:
  pgbackrest:
    restore:
      enabled: true
      repoName: repo1
      options:
      - --type=time
      - --target="2022-07-18 08:36:04.204224+00"
    global:
      repo1-retention-full: "3"
      repo1-retention-full-type: count
    image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:ubi8-2.38-1
    repos:
    - name: repo1
      schedules:
        full: "*/20 * * * 1-5"
      volume:
        volumeClaimSpec:
          accessModes:
          - "ReadWriteOnce"
          resources:
            requests:
              storage: 1Gi
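For context, the schedules entry above requests a full backup every 20 minutes, Monday through Friday. One way to check that the operator picked the schedule up is to list the CronJobs it creates; note that the exact CronJob name shown in the comment is an assumption based on the cluster and repo names, not output taken from this issue:
kubectl -n postgres-operator get cronjobs
# expected to show something like hippo-ha-repo1-full with schedule */20 * * * 1-5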
Steps to reproduce
minikube start --memory 16384 --cpus 6 --disk-size 80000mb
kubectl apply -k kustomize/install/namespace
kubectl apply --server-side -k kustomize/install/default
kubectl apply -k kustomize/high-availability/
PG_CLUSTER_PRIMARY_POD=$(kubectl get pod -n postgres-operator -o name \
-l postgres-operator.crunchydata.com/cluster=hippo-ha,postgres-operator.crunchydata.com/role=master)
kubectl exec --stdin -n postgres-operator --tty "${PG_CLUSTER_PRIMARY_POD}" -- /bin/bash
psql -c "CREATE DATABASE testdb"
pgbench -i -s 200 testdb
psql -c "SELECT current_timestamp, pg_size_pretty(pg_database_size('testdb'))"
current_timestamp | pg_size_pretty
-------------------------------+----------------
2022-07-18 08:36:04.204224+00 | 2999 MB
(1 row)
exit
# Check that the scheduled backup exists on the repo host
kubectl exec --stdin -n postgres-operator --tty hippo-ha-repo-host-0 -- /bin/bash
pgbackrest info
exit
kubectl exec --stdin -n postgres-operator --tty "${PG_CLUSTER_PRIMARY_POD}" -- /bin/bash
psql -c "DROP DATABASE testdb"
exit
# Add timestamp to ha-postgresql.yaml and set enabled=true
kubectl annotate -n postgres-operator postgrescluster hippo-ha --overwrite \
postgres-operator.crunchydata.com/pgbackrest-restore=id1
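For clarity, after that edit the restore section of the spec looks like the block at the top of this issue, with the --target value set to the timestamp captured earlier; a sketch:
backups:
  pgbackrest:
    restore:
      enabled: true
      repoName: repo1
      options:
      - --type=time
      - --target="2022-07-18 08:36:04.204224+00"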
The problem has been solved. The PostgreSQL configuration needed some adjustments to the WAL settings.
In addition, the database had to be created from the YAML file and not with psql; creating it with psql resulted in a missing entry in pg_hba.conf.
Hello, I wanted to look at this again because, well, I like restores to work cleanly out of the box.
The problem with restores
I was able to replicate a problem with PITR restores after I dropped a database following the steps you outlined here. (That was very helpful, thanks!)
I did the following steps:
- create a cluster,
- create testdb database,
- add some data,
- ensure the backup exists,
- drop the database,
- restore to the PITR before the database was dropped.
And when I tried to connect to the testdb database (that had been dropped and which I expected to be restored), I got an error:
# \c testdb
connection to server on socket "/tmp/postgres/.s.PGSQL.5432" failed: FATAL: database "testdb" does not exist
DETAIL: The database subdirectory "base/16436" is missing.
Which was weird, because if I ran \l I could see testdb listed.
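A quick way to see the mismatch (a hypothetical session, reusing the oid from the error above) is to compare the catalog row with the on-disk directory:
psql -c "SELECT oid, datname FROM pg_database WHERE datname = 'testdb'"
# the catalog row still exists (oid 16436 here), while base/16436 is exactly
# the subdirectory the connection error says is missing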
Is the problem restore or drop database?
I tried the same steps on a new cluster, except instead of dropping the database, I only dropped a table. When I restored in that case, that table was there. So it seemed like restores were working, but not for dropped databases.
Around the same time, @andrewlecuyer pointed me to some articles/emails about how database drops are a special case with PITR restore: it seems like the replay hits the DROP DATABASE transaction far enough to clear out the database's directory, but not far enough to remove the database from the catalog.
To test this out, I did the following (same steps as above, with the new step marked):
- create a cluster,
- create testdb database,
- add some data,
- ensure the backup exists,
- do some other transactions (the new step),
- drop the database,
- restore to the PITR before the database was dropped
And in that case, my database was restored, and the restore stopped short of the other transactions that I was just using as a buffer.
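A minimal sketch of that buffer step, assuming the same cluster as above; the marker table is purely illustrative and not from the original steps:
# capture the restore target while the data is still intact
psql -c "SELECT current_timestamp"
# make sure at least one transaction commits after that target...
psql -d postgres -c "CREATE TABLE restore_marker (noted timestamptz)"
psql -d postgres -c "INSERT INTO restore_marker VALUES (current_timestamp)"
# ...and only then drop the database: replay can stop at the marker's
# commit record instead of running into the drop record
psql -d postgres -c "DROP DATABASE testdb"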
So I think this isn't a problem with the operator, but with postgres.
Wait, there's a problem with postgres?
Does this mean that, in postgres, if you drop a database and then try to restore to a point before the drop, and you don't have other commits in between, you run the risk of losing the directory for that database? Is there some recommended way around this case? (Putting that on my "to solve later" pile.)
Hi Benjamin,
I have seen the problem with the corrupted database, but it was not reproducible, which increases the "fun" here. But it never occurred with a Patroni cluster on VMs, only with the operator. Therefore I don't believe this is a PostgreSQL problem, nor is it a Patroni problem.
I have created two branches in my clone of the operator examples. The code can be found in kustomize/high-availability: the changes were made in ha-postgres.yaml, test-restore.sh contains the script, and /logs holds the logs exported after the annotation. If you use them, you have to set backups->pgbackrest->restore->enabled to false again; I left it in the state after the annotation.
In both scenarios I did these steps:
- created a database with SQL,
- executed pgbench,
- waited until a full backup containing the pgbench data had finished,
- executed a SELECT statement to get a timestamp before changing the database,
- dropped a table,
- changed ha-postgresql.yaml with the previously generated timestamp and set backups->pgbackrest->restore->enabled to true,
- waited some time,
- exported the logs from k8s:
- kubectl -n postgres-operator logs ${PG_CLUSTER_PRIMARY_POD} -c database > kustomize/high-availability/logs/PG_CLUSTER_PRIMARY_POD.log
- kubectl exec -n postgres-operator -it ${PG_CLUSTER_PRIMARY_POD} -c database -- bash -c "cat pgdata/pg14/log/*.log" > kustomize/high-availability/logs/pg14.log
Scenario One
Running without adjusting the PostgreSQL configuration resulted in WAL problems, which is something I do understand: the out-of-the-box configuration is useless for this workload. https://github.com/sjstoelting/postgres-operator-examples/tree/scenario-1
Scenario Two
I adjusted the PostgreSQL configuration to something more reasonable in ha-postgresql.yaml under patroni->dynamicConfiguration->postgresql->parameters. The deleted table was not restored. https://github.com/sjstoelting/postgres-operator-examples/tree/scenario-2
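For reference, changes of that kind go into the PostgresCluster spec like the sketch below; the nesting matches the API mentioned above, but the parameter values here are illustrative assumptions, the real ones are in the branch:
patroni:
  dynamicConfiguration:
    postgresql:
      parameters:
        max_wal_size: 2GB      # illustrative value only
        min_wal_size: 512MB    # illustrative value only
        shared_buffers: 512MB  # illustrative value only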
I have tested that a lot, and it is always the same result.
The corrupted database after trying to restore a deleted database is another problem, in my humble opinion. Out of ten tests I believe it only happened once or twice; the other times nothing happened at all. It might also be a result of the WAL problems due to bad configuration.
Hello, I've had a chance to go over this again, and I have to report that I'm not seeing the same results. For ease of seeing what I did, here's my terminal (with some comments and a little cleaned up):
# Adjust the ha-postgres.yaml to include the full backup every 15 minutes
➜ kubectl apply -k kustomize/high-availability
postgrescluster.postgres-operator.crunchydata.com/hippo-ha created
# Wait for the postgres pods to come up
➜ PG_CLUSTER_PRIMARY_POD=$(kubectl get pod -n postgres-operator -o name \
-l postgres-operator.crunchydata.com/cluster=hippo-ha,postgres-operator.crunchydata.com/role=master)
➜ kubectl exec --stdin -n postgres-operator --tty "${PG_CLUSTER_PRIMARY_POD}" -c database -- /bin/bash
bash-4.4$ psql -c "CREATE DATABASE testdb"
CREATE DATABASE
bash-4.4$ pgbench -i -s 200 testdb
dropping old tables...
creating tables...
generating data (client-side)...
20000000 of 20000000 tuples (100%) done (elapsed 53.69 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done in 192.28 s (drop tables 45.19 s, create tables 0.04 s, client-side generate 54.39 s, vacuum 71.13 s, primary keys 21.53 s).
bash-4.4$ psql -c "SELECT current_timestamp, pg_size_pretty(pg_database_size('testdb'))"
current_timestamp | pg_size_pretty
-------------------------------+----------------
2022-09-15 15:37:11.870687+00 | 2999 MB
(1 row)
bash-4.4$ psql -d testdb
psql (14.4)
Type "help" for help.
testdb=# \dt+
List of relations
Schema | Name | Type | Owner | Persistence | Access method | Size | Description
--------+------------------+-------+----------+-------------+---------------+---------+-------------
public | pgbench_accounts | table | postgres | permanent | heap | 2562 MB |
public | pgbench_branches | table | postgres | permanent | heap | 40 kB |
public | pgbench_history | table | postgres | permanent | heap | 0 bytes |
public | pgbench_tellers | table | postgres | permanent | heap | 120 kB |
(4 rows)
testdb=# \q
# Make sure the cluster is backed up and then drop tables; note, I'm dropping a few here because the pgbench_tellers table has been 0 bytes whenever I've looked, so I wanted to make sure I could restore a table with some data.
bash-4.4$ psql -c "DROP TABLE public.pgbench_history" testdb
DROP TABLE
bash-4.4$ psql -c "DROP TABLE public.pgbench_tellers" testdb
DROP TABLE
bash-4.4$ psql -c "DROP TABLE public.pgbench_accounts" testdb
DROP TABLE
# Confirm that the tables have been dropped
bash-4.4$ psql -d testdb
psql (14.4)
Type "help" for help.
testdb=# \dt+
List of relations
Schema | Name | Type | Owner | Persistence | Access method | Size | Description
--------+------------------+-------+----------+-------------+---------------+-------+-------------
public | pgbench_branches | table | postgres | permanent | heap | 40 kB |
(1 row)
testdb=# \q
bash-4.4$ exit
# Adjust the ha-postgres.yaml with the time for the PITR and then
➜ kubectl apply -k kustomize/high-availability
postgrescluster.postgres-operator.crunchydata.com/hippo-ha configured
➜ kubectl annotate -n postgres-operator postgrescluster hippo-ha --overwrite \
postgres-operator.crunchydata.com/pgbackrest-restore=id1
postgrescluster.postgres-operator.crunchydata.com/hippo-ha annotated
# Wait for the restore to finish and the postgres pods to come back up and be healthy and then
➜ kubectl exec --stdin -n postgres-operator --tty "${PG_CLUSTER_PRIMARY_POD}" -c database -- psql -d testdb
psql (14.4)
Type "help" for help.
testdb=# \dt+
List of relations
Schema | Name | Type | Owner | Persistence | Access method | Size | Description
--------+------------------+-------+----------+-------------+---------------+---------+-------------
public | pgbench_accounts | table | postgres | permanent | heap | 2562 MB |
public | pgbench_branches | table | postgres | permanent | heap | 40 kB |
public | pgbench_history | table | postgres | permanent | heap | 0 bytes |
public | pgbench_tellers | table | postgres | permanent | heap | 120 kB |
(4 rows)
All the tables have been restored and I didn't change any postgres settings to get this result.
(I will note that our out-of-the-box settings are not suitable for all uses; I ran into a problem here where pgbench would clobber my postgres pod and shut down the connection. We do have a feature on our future roadmap to provide non-default, autotuned settings for deployments.)
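If anyone else hits the same pod-clobbering issue, one workaround sketch is to give the instances explicit resource requests; the instance name below matches the high-availability example, but the values are illustrative assumptions, not recommendations:
instances:
- name: pgha1
  replicas: 2
  resources:
    requests:
      cpu: "2"      # illustrative value only
      memory: 4Gi   # illustrative value only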
But I can't understand what might be different between your setup and mine (especially given the commands you shared) that is giving you a consistent bad restore experience.
I have been able to reproduce your results with the current version, 5.2.0.
I tested again the database restore by type-time. In https://github.com/sjstoelting/postgres-operator-examples/tree/scenario-3 I created the database manually, in https://github.com/sjstoelting/postgres-operator-examples/tree/scenario-4 I created the database with the YAML configuration.
In both cases I repeatedly ended up with a missing database subdirectory. And that is not a PostgreSQL failure, it is a failed restore. The table pg_catalog.pg_database has the entry with the necessary information about the database, but the folder containing the files has not been restored.
@sjstoelting please see the following question/response on the PostgreSQL mailing list, which indicates that the behavior you are seeing is expected for any PostgreSQL restore:
https://www.postgresql.org/message-id/flat/459E32DC.3030402%40mailnetwork.co.uk
Specifically, Tom mentions the following:
...the only operations that have non-rollbackable side effects are CREATE/DROP DATABASE and CREATE/DROP TABLESPACE. For any of these, you'd end up with inconsistent state if you try to stop replay just before the commit record.
In other words, the inability to roll back a DROP DATABASE is a constraint/limitation within PostgreSQL itself. Therefore, since you are performing a DROP DATABASE in both of the examples/branches you referenced, the behavior you are seeing is expected.
Looking a bit further up in that thread, Tom provides a bit more detail:
...for normal database operations, stopping just short of the commit of the transaction is enough to ensure that the transaction has no effect. But for the XLOG_DBASE_DROP record, not so --- replaying that means "rm -rf base/whatever". So you've got to make sure the replay stops before it reaches that record, and that means you need a stop time <= the commit time of some prior transaction.
This means you will need to stop before the DROP DATABASE for any PostgreSQL restore & recovery to get your desired behavior. In other words, this is standard/expected PostgreSQL recovery behavior, and therefore outside of the control of PGO and/or pgBackRest.
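Concretely, that means picking a --target that is no later than the commit of some transaction that completed before the DROP DATABASE; a sketch, with a placeholder where the real timestamp goes:
backups:
  pgbackrest:
    restore:
      enabled: true
      repoName: repo1
      options:
      - --type=time
      - --target="<timestamp at or before a commit that happened before the DROP DATABASE>"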
@sjstoelting just wanted to follow-up per my last response above.