spilo/patroni not able to elect a new leader if the previous leader (the last working member) failed due to a full disk?
Scenario
- GKE Kubernetes
- spilo Pods via StatefulSet `patroni-set-0003`:

```yaml
kind: StatefulSet
# [...]
metadata:
  name: patroni-set-0003
spec:
  replicas: 3
  # [...]
  template:
    spec:
      containers:
        - name: spilo
          # [...]
          env:
            - name: SCOPE
              value: the-scope
          volumeMounts:
            - mountPath: /home/postgres/pgdata
              name: pgdata
  volumeClaimTemplates:
    - metadata:
        name: pgdata
      spec:
        # [...]
        resources:
          requests:
            storage: 500Gi
```
Unfortunately, `/home/postgres/pgdata` ran out of space (in all pods, it seems, probably almost simultaneously) and spilo/patroni started logging:
```
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/patroni/async_executor.py", line 39, in run
    wakeup = func(*args) if args else func()
  File "/usr/local/lib/python3.5/dist-packages/patroni/postgresql.py", line 1067, in _do_follow
    self.write_recovery_conf(primary_conninfo)
  File "/usr/local/lib/python3.5/dist-packages/patroni/postgresql.py", line 911, in write_recovery_conf
    f.write("{0} = '{1}'\n".format(name, value))
OSError: [Errno 28] No space left on device
```
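For context, the per-member disk state can be checked directly; a minimal sketch, assuming kubectl access and the container name `spilo` from the manifest above:

```sh
# Check free space on the pgdata volume in each member of the StatefulSet
for i in 0 1 2; do
  echo "== patroni-set-0003-$i =="
  kubectl exec "patroni-set-0003-$i" -c spilo -- df -h /home/postgres/pgdata
done
```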
I believe the last leader before all pods ran out of disk was either `patroni-set-0003-1` or `patroni-set-0003-2`.
Recovery
In order to solve the issue I:
- Scaled down `patroni-set-0003` to 1 replica (the remaining pod still failing with `OSError: No space left on device`). Note that this left me without any running old leader, broken or not; I believe this could be key to my issue.
- Created a new StatefulSet, `patroni-set-0004`, with the same configuration as `patroni-set-0003` except (see the sketch after this list):
  - `metadata.name: patroni-set-0004`
  - `spec.replicas: 1`
  - `spec.volumeClaimTemplates[0].spec.resources.requests.storage: 1Ti`
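A rough sketch of those two steps with kubectl (the file name `patroni-set-0004.yaml` is just an example for the new manifest):

```sh
# Drop the broken StatefulSet to a single replica
kubectl scale statefulset patroni-set-0003 --replicas=1

# Create the new StatefulSet with the larger volume claim
# (same spec as patroni-set-0003 apart from the changes listed above)
kubectl apply -f patroni-set-0004.yaml
```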
With only the broken `patroni-set-0003-0` running, `patroni-set-0004-0` started restoring from the WAL archive; I left it overnight to restore. During this time both `patroni-set-0003-0` and `patroni-set-0004-0` were running, but `patroni-set-0003-0` was out of disk.
Several hours later, `patroni-set-0004-0` was logging lots of:

```
following a different leader because i am not the healthiest node
Lock owner: None; I am patroni-set-0004-0
wal_e.blobstore.gs.utils WARNING MSG: could no longer locate object while performing wal restore
DETAIL: The absolute URI that could not be located is gs://the-bucket/spilo/the-scope/wal/wal_005/the-file.lzo.
HINT: This can be normal when Postgres is trying to detect what timelines are available during restoration.
STRUCTURED: time=2017-06-26T12:05:23.646236-00 pid=207
lzop: <stdin>: not a lzop file
[...]
```
I expected `patroni-set-0004-0` to take over the master lock by this time.
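At this point the cluster state as Patroni sees it can be checked; a sketch, assuming `patronictl` is usable inside the spilo container (it may need to be pointed at the Patroni config, e.g. `-c /home/postgres/postgres.yml`, depending on the image):

```sh
# Ask Patroni which member (if any) currently holds the leader lock
kubectl exec patroni-set-0004-0 -c spilo -- patronictl list

# The same information is exposed by Patroni's REST API (default port 8008)
kubectl port-forward patroni-set-0004-0 8008:8008 &
sleep 2
curl -s http://localhost:8008/
```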
While debugging why the disk outage occurred, I found out about ext filesystem reserved blocks. I then recovered 25Gi of disk space on `patroni-set-0003-0`'s `pgdata` by running `tune2fs -m 0 /dev/$PGDATA_DEV`. I realize in hindsight that simply resizing the GCE PD would have been easier.
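For anyone hitting the same wall, the reserved blocks can be inspected before reclaiming them; a sketch, where `$PGDATA_DEV` stands for whatever block device backs the pgdata volume:

```sh
# Find the device backing the pgdata mount
df /home/postgres/pgdata

# Show how many blocks ext2/3/4 keeps reserved for root
tune2fs -l "/dev/$PGDATA_DEV" | grep -i 'reserved block count'

# Reclaim them by setting the reserved percentage to 0
tune2fs -m 0 "/dev/$PGDATA_DEV"
```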
However, once `patroni-set-0003-0` was given the extra space and restarted, it did not seem willing to take the leader role, despite the free disk space and the absence of a current leader, logging lots of:

```
Lock owner: None; I am patroni-set-0003-0
wal_e.blobstore.gs.utils WARNING MSG: could no longer locate object while performing wal restore
DETAIL: The absolute URI that could not be located is gs://the-bucket/spilo/the-scope/wal/wal_005/the-file.lzo.
HINT: This can be normal when Postgres is trying to detect what timelines are available during restoration.
STRUCTURED: time=2017-06-26T12:05:23.646236-00 pid=207
lzop: <stdin>: not a lzop file
[...]
```
I expected `patroni-set-0003-0` to take the leader role by this time.

I then did the same thing to `patroni-set-0003-{1,2}`, freeing up 25Gi of space. Once `patroni-set-0003-1` was given the extra disk space and restarted, it took the master lock.