How should Postgres behave if etcd is unavailable / unpredictable?

Open Winslett opened this issue 10 years ago • 2 comments

Some of the scenarios to consider are:

  • The governor for the Postgres primary cannot communicate with etcd, but the rest of the cluster can.
  • No governors in the Postgres cluster can communicate with etcd.
  • etcd crashes and recovers. After recovery, the etcd leader TTLs have expired.
  • etcd crashes and recovers. After recovery, the initialization key is empty. This would cause issues when new members come online and race to initialize.

The first decision to make is: should the Postgres cluster go read-only if etcd fails? Or should the Postgres cluster keep the current primary, but without automatic failover?
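To make the two options concrete, here is a minimal sketch of the choice as a pure decision function. Every name in it (`Policy`, `Action`, `decide_action`) is hypothetical and made up for illustration; none of it is governor's actual code or API, and it assumes each governor only knows its own role and whether it can currently reach etcd.

```python
from enum import Enum


class Policy(Enum):
    """Hypothetical setting: what a primary does when it cannot reach etcd."""
    GO_READ_ONLY = "go_read_only"
    KEEP_PRIMARY = "keep_primary"


class Action(Enum):
    """Hypothetical outcomes of one governor loop iteration."""
    NORMAL = "normal"                                # etcd reachable, usual leader logic
    STAY_REPLICA = "stay_replica"                    # replica never self-promotes without etcd
    DEMOTE_TO_READ_ONLY = "demote_to_read_only"      # option 1: stop accepting writes
    KEEP_PRIMARY_NO_FAILOVER = "keep_primary_no_failover"  # option 2: keep serving writes


def decide_action(is_primary: bool, etcd_reachable: bool, policy: Policy) -> Action:
    """Pick an action for this node given its role and etcd reachability."""
    if etcd_reachable:
        return Action.NORMAL
    if not is_primary:
        # Without etcd a replica cannot acquire the leader key, so it stays put.
        return Action.STAY_REPLICA
    if policy is Policy.GO_READ_ONLY:
        return Action.DEMOTE_TO_READ_ONLY
    return Action.KEEP_PRIMARY_NO_FAILOVER


if __name__ == "__main__":
    for policy in Policy:
        print(policy.name, decide_action(True, False, policy).name)
```

Note that the first scenario (only the primary's governor loses etcd) is where the two policies differ most: under `KEEP_PRIMARY`, the rest of the cluster could still elect a new leader while the old primary keeps accepting writes.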

Winslett, May 07 '15 14:05

I think it is best to go read-only so that replication from the newly elected master can continue.

tvb, May 07 '15 14:05

In my opinion, I would prefer to keep the current primary and not take any "decision" when you lose the "brain". In general, I try to avoid situations where the HA software/layer itself is able to bring down the protected application when the protected application itself has no problems at all.

The Postgres cluster can go read-only as an additional safety measure, but it's not a must-have.

bjoernbessert, May 08 '15 17:05