How we used delayed replication for disaster recovery with PostgreSQL

Open xluffy opened this issue 5 months ago • 0 comments

It is possible to configure streaming replication with a delay using recovery_min_apply_delay. However, there are a few pitfalls regarding replication slots, hot standby feedback, and others that one needs to be aware of. In our case, we avoid them by replicating from the WAL archive instead of using streaming replication.

https://about.gitlab.com/blog/2019/02/13/delayed-replication-for-disaster-recovery-with-postgresql/

recovery_min_apply_delay (integer)

By default, a standby server restores WAL records from the primary as soon as possible. It may be useful to have a time-delayed copy of the data, offering opportunities to correct data loss errors. This parameter allows you to delay recovery by a fixed period of time, measured in milliseconds if no unit is specified. For example, if you set this parameter to 5min, the standby will replay each transaction commit only when the system time on the standby is at least five minutes past the commit time reported by the master.

It is possible that the replication delay between servers exceeds the value of this parameter, in which case no delay is added. Note that the delay is calculated between the WAL time stamp as written on master and the current time on the standby. Delays in transfer because of network lag or cascading replication configurations may reduce the actual wait time significantly. If the system clocks on master and standby are not synchronized, this may lead to recovery applying records earlier than expected; but that is not a major issue because useful settings of this parameter are much larger than typical time deviations between servers.
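The arithmetic described above can be sketched in a few lines of Python. This is only an illustrative model, not PostgreSQL code: the function name `remaining_wait` and the sample timestamps are made up for the example, and the only assumption is that the standby compares the commit time stamp plus the configured delay against its own clock.

```python
from datetime import datetime, timedelta, timezone


def remaining_wait(commit_time, standby_now, apply_delay):
    """Model of how long a standby still waits before replaying a commit.

    commit_time: commit time stamp written on the primary
    standby_now: current time according to the standby's clock
    apply_delay: the configured recovery_min_apply_delay
    """
    earliest_replay = commit_time + apply_delay
    # If the record is already older than the delay, nothing is added.
    return max(earliest_replay - standby_now, timedelta(0))


delay = timedelta(minutes=5)
commit = datetime(2024, 9, 20, 3, 0, 0, tzinfo=timezone.utc)

# Record arrives 2 s after the commit: the standby waits the rest of the 5 min.
print(remaining_wait(commit, commit + timedelta(seconds=2), delay))   # 0:04:58

# Record arrives 7 min late (slow transfer): no extra delay is added.
print(remaining_wait(commit, commit + timedelta(minutes=7), delay))   # 0:00:00

# Same 2 s transfer, but the standby clock runs 30 s ahead: replay happens early.
print(remaining_wait(commit, commit + timedelta(seconds=32), delay))  # 0:04:28
```

The third case shows why unsynchronized clocks matter only when the skew is large relative to the configured delay.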

The delay occurs only on WAL records for transaction commits. Other records are replayed as quickly as possible, which is not a problem because MVCC visibility rules ensure their effects are not visible until the corresponding commit record is applied.
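As a rough illustration of the commit-only rule, the sketch below models a standby replaying a short WAL stream. It is a simplified toy model under stated assumptions, not how PostgreSQL is implemented: the function `replay_schedule`, the record kinds, and the timestamps are hypothetical, and replay is assumed to be strictly sequential, so a record that follows a delayed commit in the stream is applied only after that commit.

```python
from datetime import datetime, timedelta, timezone


def replay_schedule(records, apply_delay, arrival):
    """Return (kind, replay_time) for each record, in stream order.

    records: list of (kind, time stamp); only kind == "commit" is delayed
    apply_delay: the configured recovery_min_apply_delay
    arrival: standby time at which all records are available for replay
    """
    schedule = []
    clock = arrival  # standby time at which the next record can be applied
    for kind, stamp in records:
        if kind == "commit":
            # Only commit records wait until commit time + delay has passed.
            clock = max(clock, stamp + apply_delay)
        schedule.append((kind, clock))
    return schedule


t0 = datetime(2024, 9, 20, 3, 0, 0, tzinfo=timezone.utc)
wal = [
    ("insert", t0),                          # transaction A writes a row
    ("commit", t0 + timedelta(seconds=1)),   # transaction A commits
    ("insert", t0 + timedelta(seconds=2)),   # transaction B writes a row
    ("commit", t0 + timedelta(seconds=10)),  # transaction B commits
]

for kind, when in replay_schedule(wal, timedelta(minutes=5), t0 + timedelta(seconds=11)):
    print(kind, when.time())
```

In this model the first insert is applied as soon as it arrives, but its row stays invisible to queries until the matching commit is replayed roughly five minutes later.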

The delay occurs once the database in recovery has reached a consistent state, until the standby is promoted or triggered. After that the standby will end recovery without further waiting.

This parameter is intended for use with streaming replication deployments; however, if the parameter is specified it will be honored in all cases. hot_standby_feedback will be delayed by use of this feature which could lead to bloat on the master; use both together with care.

Warning

xluffy, Sep 20 '24 03:09