pg_auto_failover
Implement support for WAL-G
To implement HA we need automated failover and also Disaster Recovery for the availability of the data. With Postgres that means archiving. Archiving then intersects with auto-failover in multiple ways: creating a standby node from the archives, using restore_command to enhance the reliability of the whole system, allowing standby/secondary nodes to archive WAL files with archive_mode = 'always', and continuing to maintain the archives during and after a failover.
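For orientation, here is a minimal sketch of the postgresql.conf settings involved when WAL-G is called directly, without any pg_autoctl wrapper. The `wal-g wal-push` and `wal-g wal-fetch` invocations are the usual WAL-G ones; the helper names `wal_g_archiving_settings` and `render_conf` are made up for this example, and WAL-G's own storage configuration (environment variables or config file) is assumed to be in place already. The commands in this PR would replace these two command strings with their pg_autoctl equivalents.

```python
# Illustrative only: renders the archiving-related postgresql.conf settings
# for a plain WAL-G setup. The function names are hypothetical helpers, not
# part of pg_auto_failover or WAL-G.

def wal_g_archiving_settings(archive_mode: str = "always") -> dict[str, str]:
    """Return the settings that archive and restore WAL through WAL-G."""
    if archive_mode not in ("off", "on", "always"):
        raise ValueError(f"invalid archive_mode: {archive_mode!r}")
    return {
        "archive_mode": archive_mode,
        # %p is the path of the WAL file to archive
        "archive_command": "wal-g wal-push %p",
        # %f is the WAL file name to fetch, %p the path to restore it to
        "restore_command": "wal-g wal-fetch %f %p",
    }


def render_conf(settings: dict[str, str]) -> str:
    """Render settings as postgresql.conf lines with quoted values."""
    lines = []
    for name, value in settings.items():
        escaped = value.replace("'", "''")  # postgresql.conf quote escaping
        lines.append(f"{name} = '{escaped}'")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_conf(wal_g_archiving_settings()), end="")
```

With archive_mode = 'always', a standby runs archive_command too, which is why the archives have to cope with the same WAL file name arriving from more than one node.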
This PR implements the following 3 commands as a starter-kit for WAL-G support/integration:
- pg_autoctl create archiver-policy
- pg_autoctl archive wal
- pg_autoctl restore wal
More is needed later, in particular:
- pg_autoctl archive pgdata
- pg_autoctl restore pgdata
- automated integration of restoring pgdata from the archives when creating a standby node
- integrated scheduler to archive new base backups and purge old ones following the retention policy
Given the size of the current PR, it might be better to split this development into several stages. This PR focuses on WAL archiving; base backup archiving may be implemented later on top of it.
Finally, the design allows support for multiple archive methods, even though at the moment only WAL-G support is implemented. Some wrapper work is required for each new method, but it should be pretty easy. The main advantage of maintaining a wrapper is to allow for archive_mode = 'always' thanks to handling WAL file metadata on the monitor. Also, maintaining the configuration of the archiving method on the monitor makes it trivial to share it with all the nodes, even when the configuration needs updating.
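To make the role of WAL file metadata more concrete, here is a rough sketch of how a shared registry could let several nodes run with archive_mode = 'always' without uploading the same WAL file twice. This is an illustration under assumptions and not the code in this PR: the `WalRegistry` class, its in-memory dict, and the "archive" / "skip" / "conflict" decisions are all hypothetical, and the real metadata would live on the monitor.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class WalRegistry:
    """Hypothetical stand-in for WAL metadata kept on the monitor."""

    # WAL file name -> (name of the node that archived it, sha256 of the file)
    entries: dict = field(default_factory=dict)

    def register(self, filename: str, node: str, content: bytes) -> str:
        """Decide what a node should do with a WAL file it wants to archive."""
        digest = hashlib.sha256(content).hexdigest()
        known = self.entries.get(filename)
        if known is None:
            # First node to report this WAL file: record it and let it upload.
            self.entries[filename] = (node, digest)
            return "archive"
        if known[1] == digest:
            # Already archived with identical content: nothing to do.
            return "skip"
        # Same WAL name, different bytes: do not overwrite, report it instead.
        return "conflict"


if __name__ == "__main__":
    registry = WalRegistry()
    wal = "000000010000000000000001"
    print(registry.register(wal, "node_1", b"wal bytes"))   # first report
    print(registry.register(wal, "node_2", b"wal bytes"))   # same content
    print(registry.register(wal, "node_2", b"other bytes"))  # differing content
```

The first registration wins in this sketch, so a standby that reports a WAL file after the primary never overwrites the archived copy.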
I have not read the whole patch, but a word of caution about archive_mode = 'always': you may need separate backup repositories (one for each PostgreSQL instance), because this bug may not be fixed: WAL files from standbys contain the same logical information as the ones from the primary, but their checksums may differ.
see: https://pgbackrest.org/configuration.html#section-backup/option-archive-mode-check
Hey @DimCitus, is the goal of integrating WAL-G with pg_auto_failover still valid?
I'm considering WAL-G for a new Citus 11 cluster that will also use pg_auto_failover, now that it supports Citus as of the 2.0 release.
Thanks for the great work on pg_auto_failover! Best,
Hi @raivil; yes, I still want to add support for archiving in pg_auto_failover. I have no idea when I will be able to get back to this work though, so if you want to contribute, please consider it!