
Custom Readiness or Startup Probes

jbierboms opened this issue 8 months ago • 2 comments

Overview

We would like to mitigate unnecessary outages during Kubernetes node updates or drains by being able to specify custom readiness or startup probes for a PostgresCluster.

Background

In this particular case we were running a PostgresCluster with replicas set to 2.

The node running the replica was drained, and therefore a new pod was created on a different node. Because of some infrastructure-related problems this move took some time, resulting in a situation where the replica wasn't able to restart immediately and had to do some archive restoring first. However, the replica already had the status "ContainersReady" during the restore, which allowed the master pod to be moved off its node as well. We then saw an outage until the master was ready again; during that time the replica also kept logging "following a different leader because i am not the healthiest node".

Use Case

To fill this gap, we propose implementing custom readiness or startup probes for PostgresCluster, ensuring that the system can accurately determine when the replica is fully functional and ready to handle traffic. This will prevent unnecessary outages and reduce the impact of node updates or drains on the cluster's overall availability.
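
For illustration, a minimal sketch of the shape this could take in the PostgresCluster spec. This is hypothetical: PostgresCluster does not expose probe overrides today, and the readinessProbe field below is invented for the example (the Patroni endpoint and TLS flags would also depend on the cluster's certificate setup):

```yaml
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo
spec:
  instances:
    - name: instance1
      replicas: 2
      # Hypothetical field: override the database container's readiness
      # probe so a replica only reports Ready once Patroni sees it
      # streaming, not merely once the container has started.
      readinessProbe:
        exec:
          command:
            - sh
            - -c
            - |
              curl -sk https://localhost:8008/patroni \
                | jq -e '(.state == "streaming") or (.role == "master")'
        periodSeconds: 10
        failureThreshold: 6
```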

Environment

  • Platform: Kubernetes
  • Number of Postgres clusters: 2

jbierboms avatar Apr 15 '25 09:04 jbierboms

Hi @jbierboms! Thanks for reaching out.

Where this gets a bit messy is that a replica can accept read-only queries, even while it is replaying WAL. This means a replica does not need to be fully caught up with the primary to be considered "ready".

Per your following comment, did you drain the node with the primary instance once you saw the replica was ready?

...which allowed the master pod to be moved from its node

Thinking of options here, one would be to manually trigger a switchover to the replica prior to draining the node with the primary instance, as described here:

https://access.crunchydata.com/documentation/postgres-operator/latest/tutorials/cluster-management/administrative-tasks#changing-the-primary

You would then drain the node with the primary instance only once you've confirmed the switchover completed successfully (since a successful switchover means the replica has properly caught up to the primary, etc.). And if the switchover does not complete, you would simply wait for the replica to catch up to the primary before trying again.
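
Roughly, the flow described in the linked docs is to enable switchovers in the spec and then trigger one via an annotation (the cluster name hippo and the namespace are placeholders here; check the docs for your PGO version):

```yaml
spec:
  patroni:
    switchover:
      enabled: true
      # Optionally pin the switchover target to a specific instance:
      # targetInstance: hippo-instance1-abcd
```

```sh
# Trigger the switchover; the annotation value just needs to change
# each time it is applied, so a timestamp works well.
kubectl annotate -n postgres-operator postgrescluster hippo \
  postgres-operator.crunchydata.com/trigger-switchover="$(date)" --overwrite
```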

andrewlecuyer avatar Apr 29 '25 14:04 andrewlecuyer

Hi @andrewlecuyer,

Thanks for the reply. The node drain wasn't started manually; it was triggered by an automated Kubernetes node update, which gradually adds new nodes with the new Kubernetes version to the cluster and decommissions the old ones. We don't really have control over this process and therefore rely on the "ready" status of the pods. As a current workaround, we're using a sidecar with a custom readiness probe that talks to Patroni and waits for the replica to be in the 'streaming' state (or, on the master, for Postgres to be in the 'running' state).
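
For reference, a minimal sketch of what that check looks like, assuming the Patroni REST API is reachable from the sidecar at https://localhost:8008 and that curl and jq are available (in PGO the API is served over TLS, so the curl flags may need adjusting for your certificate setup):

```sh
#!/bin/sh
# Sidecar readiness check: query Patroni for this member's status and
# only report ready once a replica is actually streaming (or, on the
# primary, once Postgres is in state "running").
STATUS=$(curl -sk https://localhost:8008/patroni) || exit 1

# Patroni reports the leader role as "master" or "primary" depending
# on its version, so accept either.
echo "$STATUS" | jq -e '
  (.role == "replica" and .state == "streaming") or
  ((.role == "master" or .role == "primary") and .state == "running")
' > /dev/null
```

The script is then wired into the sidecar container as an exec readiness probe.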

jbierboms avatar Apr 29 '25 14:04 jbierboms