Dimitri Fontaine
These instructions assume you have `$VERSION`, `$PROJECT`, and `$REPO` environment variables set in your shell (e.g. `1.4`, `pgautofailover`, and `pg_auto_failover`). With those set, code from most steps can be copy-pasted....
Using the environment variable lets us omit the replication password from the logs, though it interferes with subsequent connection attempts. We should either call unsetenv() once the pg_basebackup command has finished,...
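To illustrate the unsetenv() option mentioned above, here is a minimal Python sketch (not the project's C code). It sets `PGPASSWORD`, the libpq password variable, only for the duration of one command and then restores the previous state, so later connection attempts in the same process do not inherit it. The helper name `run_with_pgpassword` is hypothetical, and the command passed in is whatever the caller chooses; nothing here reflects how pg_auto_failover actually invokes pg_basebackup.

```python
import os
import subprocess


def run_with_pgpassword(argv, password):
    """Run argv with PGPASSWORD set, then restore the environment.

    Hypothetical helper for illustration only: the password never
    appears on the command line, and it is removed from our own
    environment as soon as the child process has finished.
    """
    previous = os.environ.get("PGPASSWORD")
    os.environ["PGPASSWORD"] = password
    try:
        # The child inherits the current environment, PGPASSWORD included.
        return subprocess.run(argv, capture_output=True, text=True)
    finally:
        if previous is None:
            # Deleting the key from os.environ also unsets the variable
            # at the process level.
            del os.environ["PGPASSWORD"]
        else:
            os.environ["PGPASSWORD"] = previous
```

A caller would pass the full pg_basebackup argument list as `argv`. The `finally` clause ensures the variable is cleaned up even when the command cannot be started.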
We don't actually need to have a password for this user, given that the monitor doesn't authenticate when doing health checks. Not having a password to manage also makes it...
To implement HA we need automated failover and also Disaster Recovery for the availability of the data. With Postgres that means archiving. Then, archiving intersects with auto-failover in multiple ways,...
When a node is undergoing maintenance, the operator can use our maintenance state to stop our automation from driving the node. We're asked to get pg_auto_failover out of the...
The existing PGDATA might have come from a former primary that's not on the new timeline, in which case pg_rewind is needed to go back before the fork LSN. Also, it could...
That way when the node shuts down, the group FSM has done the necessary steps already. If the node was a primary, another node has been elected to take over...
Both systemd and Kubernetes use a shutdown sequence that sends a SIGTERM signal then later a stronger one. See https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination for instance, where the default is to provide pods with...
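The window between SIGTERM and the stronger signal is where a process can finish its orderly shutdown steps. As a rough Python sketch of that pattern (not pg_auto_failover's own C implementation), the handler below only records that SIGTERM arrived, and the main loop then runs a cleanup callback. The names `GracefulShutdown`, `wait_for_sigterm`, and `on_shutdown` are hypothetical and exist only for this illustration.

```python
import signal
import time


class GracefulShutdown:
    """Records that SIGTERM was received (hypothetical helper)."""

    def __init__(self):
        self.requested = False

    def install(self):
        # Must be called from the main thread of the process.
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame):
        # Keep the handler tiny: set a flag, do the real work elsewhere.
        self.requested = True


def wait_for_sigterm(shutdown, on_shutdown, poll_interval=0.05, timeout=None):
    """Poll until SIGTERM is seen, then run on_shutdown().

    Returns True when the shutdown steps ran, and False when the
    optional timeout expired first.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while not shutdown.requested:
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    on_shutdown()
    return True
```

In a real service, `on_shutdown` would have to complete before the supervisor sends its stronger signal, so the steps it performs should be short and bounded.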
Use environment variables rather than very long command lines where it makes sense, making it easier to see that we're doing the same thing in the three Postgres nodes.
This is a work in progress. The idea is to add support for Disaster Recovery in pg_auto_failover. This is not ready for review; it's mostly just a remote backup of...