postgres-operator-examples
PgBackRest: Restore is never executed
Environment: minikube, kustomize->high-availability, postgres-operator-examples version tag: v5.0.0-alpha.4-0
Documented at https://access.crunchydata.com/documentation/postgres-operator/5.1.2/tutorial/disaster-recovery/#perform-a-point-in-time-recovery-pitr
Expected behavior: Point-in-time backup restored
Observed behavior: No backup is ever restored
Configuration:
backups:
  pgbackrest:
    restore:
      enabled: true
      repoName: repo1
      options:
      - --type=time
      - --target="2022-07-18 08:36:04.204224+00"
    global:
      repo1-retention-full: "3"
      repo1-retention-full-type: count
    image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:ubi8-2.38-1
    repos:
    - name: repo1
      schedules:
        full: "*/20 * * * 1-5"
      volume:
        volumeClaimSpec:
          accessModes:
          - "ReadWriteOnce"
          resources:
            requests:
              storage: 1Gi
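For context, the schedules entry above requests a full backup every 20 minutes, Monday through Friday. One way to check that the operator picked the schedule up is to list the CronJobs it creates; note that the exact CronJob name shown in the comment is an assumption based on the cluster and repo names, not output taken from this issue:
kubectl -n postgres-operator get cronjobs
# expected to show something like hippo-ha-repo1-full with schedule */20 * * * 1-5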
Steps to reproduce
minikube start --memory 16384 --cpus 6 --disk-size 80000mb
kubectl apply -k kustomize/install/namespace
kubectl apply --server-side -k kustomize/install/default
kubectl apply -k kustomize/high-availability/
PG_CLUSTER_PRIMARY_POD=$(kubectl get pod -n postgres-operator -o name \
-l postgres-operator.crunchydata.com/cluster=hippo-ha,postgres-operator.crunchydata.com/role=master)
kubectl exec --stdin -n postgres-operator --tty "${PG_CLUSTER_PRIMARY_POD}" -- /bin/bash
psql -c "CREATE DATABASE testdb"
pgbench -i -s 200 testdb
psql -c "SELECT current_timestamp, pg_size_pretty(pg_database_size('testdb'))"
current_timestamp | pg_size_pretty
-------------------------------+----------------
2022-07-18 08:36:04.204224+00 | 2999 MB
(1 row)
exit
# Check that the scheduled backup exists on the repo host
kubectl exec --stdin -n postgres-operator --tty hippo-ha-repo-host-0 -- /bin/bash
pgbackrest info
exit
kubectl exec --stdin -n postgres-operator --tty "${PG_CLUSTER_PRIMARY_POD}" -- /bin/bash
psql -c "DROP DATABASE testdb"
exit
# Add timestamp to ha-postgresql.yaml and set enabled=true
kubectl annotate -n postgres-operator postgrescluster hippo-ha --overwrite \
postgres-operator.crunchydata.com/pgbackrest-restore=id1
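For clarity, after that edit the restore section of the spec looks like the block at the top of this issue, with the --target value set to the timestamp captured earlier; a sketch:
backups:
  pgbackrest:
    restore:
      enabled: true
      repoName: repo1
      options:
      - --type=time
      - --target="2022-07-18 08:36:04.204224+00"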
The problem has been solved. The PostgreSQL configuration needed some adjustments to the WAL settings.
In addition, the database had to be created from the YAML file and not with psql; creating it with psql resulted in a missing entry in pg_hba.conf.
Hello, I wanted to look at this again because, well, I like restores to work cleanly out of the box.
The problem with restores
I was able to replicate a problem with PITR restores after I dropped a database following the steps you outlined here. (That was very helpful, thanks!)
I did the following steps:
- create a cluster,
- create testdb database,
- add some data,
- ensure the backup exists,
- drop the database,
- restore to the PITR before the database was dropped.
And when I tried to connect to the testdb database (that had been dropped and which I expected to be restored), I got an error:
# \c testdb
connection to server on socket "/tmp/postgres/.s.PGSQL.5432" failed: FATAL: database "testdb" does not exist
DETAIL: The database subdirectory "base/16436" is missing.
Which was weird, because if I ran \l I could see testdb listed.
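A quick way to see the mismatch (a hypothetical session, reusing the oid from the error above) is to compare the catalog row with the on-disk directory:
psql -c "SELECT oid, datname FROM pg_database WHERE datname = 'testdb'"
# the catalog row still exists (oid 16436 here), while base/16436 is exactly
# the subdirectory the connection error says is missing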
Is the problem restore or drop database?
I tried the same steps on a new cluster, except instead of dropping the database, I only dropped a table. When I restored in that case, that table was there. So it seemed like restores were working, but not for dropped databases.
Around the same time, @andrewlecuyer pointed me to some articles/emails about how database drops are a special case with PITR restore: it seems like the replay hits the DROP DATABASE transaction far enough to clear out the database's directory, but not far enough to remove the database from the catalog.
To test this out, I did the following (same steps as above, with the new step marked):
- create a cluster,
- create testdb database,
- add some data,
- ensure the backup exists,
- do some other transactions (the new step),
- drop the database,
- restore to the PITR before the database was dropped
And in that case, my database was restored, and the restore stopped short of the other transactions that I was just using as a buffer.
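A minimal sketch of that buffer step, assuming the same cluster as above; the marker table is purely illustrative and not from the original steps:
# capture the restore target while the data is still intact
psql -c "SELECT current_timestamp"
# make sure at least one transaction commits after that target...
psql -d postgres -c "CREATE TABLE restore_marker (noted timestamptz)"
psql -d postgres -c "INSERT INTO restore_marker VALUES (current_timestamp)"
# ...and only then drop the database: replay can stop at the marker's
# commit record instead of running into the drop record
psql -d postgres -c "DROP DATABASE testdb"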
So I think this isn't a problem with the operator, but with postgres.
Wait, there's a problem with postgres?
Does this mean that, in postgres, if you drop a database and then try to restore to a point before the drop, and you don't have other commits in between, you run the risk of losing the directory for that database? Is there some recommended way around this case? (Putting that on my "to solve later" pile.)
Hi Benjamin,
I have seen the problem with the corrupted database, but it was not reproducible, which increases the "fun" here. But it never occurred with a Patroni cluster on VMs, only with the operator. Therefore I don't believe this is a PostgreSQL problem, nor is it a Patroni problem.
I have created two branches in my clone of the operator examples. The code can be found in kustomize/high-availability: the changes were made in ha-postgres.yaml, test-restore.sh contains the script, and /logs holds the logs exported after the annotation. If you use them, you have to set backups->pgbackrest->restore->enabled to false again; I left it in the state after the annotation.
In both scenarios I did these steps:
- created a database with SQL,
- executed pgbench,
- waited until a full backup containing the pgbench data had finished,
- executed a SELECT statement to get a timestamp before changing the database,
- dropped a table,
- changed ha-postgresql.yaml with the previously generated timestamp and set backups->pgbackrest->restore->enabled to true,
- waited some time,
- exported the logs from k8s:
- kubectl -n postgres-operator logs ${PG_CLUSTER_PRIMARY_POD} -c database > kustomize/high-availability/logs/PG_CLUSTER_PRIMARY_POD.log
- kubectl exec -n postgres-operator -it ${PG_CLUSTER_PRIMARY_POD} -c database -- bash -c "cat pgdata/pg14/log/*.log" > kustomize/high-availability/logs/pg14.log
Scenario One
Running without adjusting the PostgreSQL configuration resulted in WAL problems, which is something I do understand: the out-of-the-box configuration is useless for this workload. https://github.com/sjstoelting/postgres-operator-examples/tree/scenario-1
Scenario Two
I adjusted the PostgreSQL configuration to something more reasonable in ha-postgresql.yaml under patroni->dynamicConfiguration->postgresql->parameters. The deleted table was not restored. https://github.com/sjstoelting/postgres-operator-examples/tree/scenario-2
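For reference, changes of that kind go into the PostgresCluster spec like the sketch below; the nesting matches the API mentioned above, but the parameter values here are illustrative assumptions, the real ones are in the branch:
patroni:
  dynamicConfiguration:
    postgresql:
      parameters:
        max_wal_size: 2GB      # illustrative value only
        min_wal_size: 512MB    # illustrative value only
        shared_buffers: 512MB  # illustrative value only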
I have tested that a lot, and it is always the same result.
The corrupted database after trying to restore a deleted database is another problem, in my humble opinion. Out of ten tests I believe it only happened once or twice; the other times nothing happened at all. It might also be a result of the WAL problems due to bad configuration.
Hello, I've had a chance to go over this again, and I have to report that I'm not seeing the same results. For ease of seeing what I did, here's my terminal (with some comments and a little cleaned up):
# Adjust the ha-postgres.yaml to include the full backup every 15 minutes
➜ kubectl apply -k kustomize/high-availability
postgrescluster.postgres-operator.crunchydata.com/hippo-ha created
# Wait for the postgres pods to come up
➜ PG_CLUSTER_PRIMARY_POD=$(kubectl get pod -n postgres-operator -o name \
-l postgres-operator.crunchydata.com/cluster=hippo-ha,postgres-operator.crunchydata.com/role=master)
➜ kubectl exec --stdin -n postgres-operator --tty "${PG_CLUSTER_PRIMARY_POD}" -c database -- /bin/bash
bash-4.4$ psql -c "CREATE DATABASE testdb"
CREATE DATABASE
bash-4.4$ pgbench -i -s 200 testdb
dropping old tables...
creating tables...
generating data (client-side)...
20000000 of 20000000 tuples (100%) done (elapsed 53.69 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done in 192.28 s (drop tables 45.19 s, create tables 0.04 s, client-side generate 54.39 s, vacuum 71.13 s, primary keys 21.53 s).
bash-4.4$ psql -c "SELECT current_timestamp, pg_size_pretty(pg_database_size('testdb'))"
current_timestamp | pg_size_pretty
-------------------------------+----------------
2022-09-15 15:37:11.870687+00 | 2999 MB
(1 row)
bash-4.4$ psql -d testdb
psql (14.4)
Type "help" for help.
testdb=# \dt+
List of relations
Schema | Name | Type | Owner | Persistence | Access method | Size | Description
--------+------------------+-------+----------+-------------+---------------+---------+-------------
public | pgbench_accounts | table | postgres | permanent | heap | 2562 MB |
public | pgbench_branches | table | postgres | permanent | heap | 40 kB |
public | pgbench_history | table | postgres | permanent | heap | 0 bytes |
public | pgbench_tellers | table | postgres | permanent | heap | 120 kB |
(4 rows)
testdb=# \q
# Make sure the cluster is backed up and then drop tables; note, I'm dropping a few here because the pgbench_tellers table has been 0 bytes whenever I've looked, so I wanted to make sure I could restore a table with some data.
bash-4.4$ psql -c "DROP TABLE public.pgbench_history" testdb
DROP TABLE
bash-4.4$ psql -c "DROP TABLE public.pgbench_tellers" testdb
DROP TABLE
bash-4.4$ psql -c "DROP TABLE public.pgbench_accounts" testdb
DROP TABLE
# Confirm that the tables have been dropped
bash-4.4$ psql -d testdb
psql (14.4)
Type "help" for help.
testdb=# \dt+
List of relations
Schema | Name | Type | Owner | Persistence | Access method | Size | Description
--------+------------------+-------+----------+-------------+---------------+-------+-------------
public | pgbench_branches | table | postgres | permanent | heap | 40 kB |
(1 row)
testdb=# \q
bash-4.4$ exit
# Adjust the ha-postgres.yaml with the time for the PITR and then
➜ kubectl apply -k kustomize/high-availability
postgrescluster.postgres-operator.crunchydata.com/hippo-ha configured
➜ kubectl annotate -n postgres-operator postgrescluster hippo-ha --overwrite \
postgres-operator.crunchydata.com/pgbackrest-restore=id1
postgrescluster.postgres-operator.crunchydata.com/hippo-ha annotated
# Wait for the restore to finish and the postgres pods to come back up and be healthy and then
➜ kubectl exec --stdin -n postgres-operator --tty "${PG_CLUSTER_PRIMARY_POD}" -c database -- psql -d testdb
psql (14.4)
Type "help" for help.
testdb=# \dt+
List of relations
Schema | Name | Type | Owner | Persistence | Access method | Size | Description
--------+------------------+-------+----------+-------------+---------------+---------+-------------
public | pgbench_accounts | table | postgres | permanent | heap | 2562 MB |
public | pgbench_branches | table | postgres | permanent | heap | 40 kB |
public | pgbench_history | table | postgres | permanent | heap | 0 bytes |
public | pgbench_tellers | table | postgres | permanent | heap | 120 kB |
(4 rows)
All the tables have been restored and I didn't change any postgres settings to get this result.
(I will note that our out-of-the-box settings are not suitable for all uses; I ran into a problem here where pgbench would clobber my postgres pod and shut down the connection. We do have a feature on our future roadmap to provide non-default, autotuned settings for deployments.)
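If anyone else hits the same pod-clobbering issue, one workaround sketch is to give the instances explicit resource requests; the instance name below matches the high-availability example, but the values are illustrative assumptions, not recommendations:
instances:
- name: pgha1
  replicas: 2
  resources:
    requests:
      cpu: "2"      # illustrative value only
      memory: 4Gi   # illustrative value only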
But I can't understand what might be different between your setup and mine (especially given the commands you shared) that is giving you a consistent bad restore experience.
I have been able to reproduce your results with the current version, 5.2.0.
I tested again the database restore by type-time. In https://github.com/sjstoelting/postgres-operator-examples/tree/scenario-3 I created the database manually, in https://github.com/sjstoelting/postgres-operator-examples/tree/scenario-4 I created the database with the YAML configuration.
In both cases I repeatedly ended up with a missing database subdirectory. And that is not a PostgreSQL failure, it is a failed restore. The table pg_catalog.pg_database has the entry with the necessary information about the database, but the folder containing the files has not been restored.
@sjstoelting please see the following question/response on the PostgreSQL mailing list, which indicates that the behavior you are seeing is expected for any PostgreSQL restore:
https://www.postgresql.org/message-id/flat/459E32DC.3030402%40mailnetwork.co.uk
Specifically, Tom mentions the following:
...the only operations that have non-rollbackable side effects are CREATE/DROP DATABASE and CREATE/DROP TABLESPACE. For any of these, you'd end up with inconsistent state if you try to stop replay just before the commit record.
In other words, the inability to roll back a DROP DATABASE is a constraint/limitation within PostgreSQL itself. Therefore, since you are performing a DROP DATABASE in both of the examples/branches you referenced, the behavior you are seeing is expected.
Looking a bit further up in that thread, Tom provides a bit more detail:
...for normal database operations, stopping just short of the commit of the transaction is enough to ensure that the transaction has no effect. But for the XLOG_DBASE_DROP record, not so --- replaying that means "rm -rf base/whatever". So you've got to make sure the replay stops before it reaches that record, and that means you need a stop time <= the commit time of some prior transaction.
This means you will need to stop before the DROP DATABASE for any PostgreSQL restore & recovery to get your desired behavior. In other words, this is standard/expected PostgreSQL recovery behavior, and therefore outside of the control of PGO and/or pgBackRest.
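Concretely, that means picking a --target that is no later than the commit of some transaction that completed before the DROP DATABASE; a sketch, with a placeholder where the real timestamp goes:
backups:
  pgbackrest:
    restore:
      enabled: true
      repoName: repo1
      options:
      - --type=time
      - --target="<timestamp at or before a commit that happened before the DROP DATABASE>"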
@sjstoelting just wanted to follow-up per my last response above.