
spilo/patroni not able to elect a new leader if the previous leader (last working member) failed due to a full disk?

Open · joar opened this issue on Jun 27, 2017 · 0 comments

Scenario

  • GKE Kubernetes
  • spilo Pods via StatefulSet: patroni-set-0003
    kind: StatefulSet
    # [...]
    metadata:
      name: patroni-set-0003
    spec:
      replicas: 3
      # [...]
      template:
        spec:
          containers:
            - name: spilo
              # [...]
              env:
                - name: SCOPE
                  value: the-scope
              volumeMounts:
                - mountPath: /home/postgres/pgdata
                  name: pgdata
      volumeClaimTemplates:
        - metadata:
            name: pgdata
          spec:
            # [...]
            resources:
              requests:
                storage: 500Gi

Unfortunately, /home/postgres/pgdata ran out of space (in all pods, it seems, probably almost simultaneously) and spilo/patroni started logging:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/patroni/async_executor.py", line 39, in run
    wakeup = func(*args) if args else func()
  File "/usr/local/lib/python3.5/dist-packages/patroni/postgresql.py", line 1067, in _do_follow
    self.write_recovery_conf(primary_conninfo)
  File "/usr/local/lib/python3.5/dist-packages/patroni/postgresql.py", line 911, in write_recovery_conf
    f.write("{0} = '{1}'\n".format(name, value))
OSError: [Errno 28] No space left on device
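
In case it helps anyone reproducing this, a quick way to confirm which members are actually out of space (a sketch, assuming kubectl is pointed at the right namespace and the pod names from the StatefulSet above):

    # check free space on the pgdata volume of each member
    for i in 0 1 2; do
      kubectl exec patroni-set-0003-$i -- df -h /home/postgres/pgdata
    done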

I believe the last leader before all pods ran out of disk was either patroni-set-0003-1 or patroni-set-0003-2.

Recovery

In order to solve the issue, I:

  1. Scaled down patroni-set-0003 to 1 replica (still failing with OSError: No space left on device). Note that this left me without any running old leader, broken or not; I believe this could be a key to my issue. (Rough commands are sketched after this list.)
  2. Created a new StatefulSet, patroni-set-0004, with the same configuration as patroni-set-0003 except:
    metadata.name: patroni-set-0004
    spec.replicas: 1
    spec.volumeClaimTemplates[0].spec.resources.requests.storage: 1Ti
    
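For reference, steps 1 and 2 roughly correspond to the following (a sketch; patroni-set-0004.yaml is a hypothetical file name for the edited manifest):

    # step 1: scale the broken StatefulSet down to a single replica
    kubectl scale statefulset patroni-set-0003 --replicas=1

    # step 2: create the new StatefulSet with the larger volume claim
    kubectl create -f patroni-set-0004.yaml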

With only the broken patroni-set-0003-0 running, patroni-set-0004-0 started restoring from the WAL archive; I left it overnight to restore. During this time both patroni-set-0003-0 and patroni-set-0004-0 were running, but patroni-set-0003-0 was out of disk.

Several hours later, patroni-set-0004-0 was logging lots of:

following a different leader because i am not the healthiest node
Lock owner: None; I am patroni-set-0004-0
wal_e.blobstore.gs.utils WARNING MSG: could no longer locate object while performing wal restore
DETAIL: The absolute URI that could not be located is gs://the-bucket/spilo/the-scope/wal/wal_005/the-file.lzo.
HINT: This can be normal when Postgres is trying to detect what timelines are available during restoration.
STRUCTURED: time=2017-06-26T12:05:23.646236-00 pid=207
lzop: <stdin>: not a lzop file
[...] 
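
For anyone debugging similar messages, one thing worth checking is whether the referenced segments actually exist in the archive (a sketch, assuming gsutil access to the same bucket and prefix as in the logs above):

    # list what is actually present in the WAL archive for this scope
    gsutil ls gs://the-bucket/spilo/the-scope/wal/wal_005/ | head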

I expected patroni-set-0004-0 to take over the master lock by this time.


Debugging why the disk outage occurred, I found out about ext filesystem reserved blocks. I then recovered 25Gi of disk space on patroni-set-0003-0's pgdata by running tune2fs -m 0 /dev/$PGDATA_DEV. I realize in hindsight that simply resizing the GCE PD would have been easier.
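
For anyone hitting the same wall, the reserved-block trick looks roughly like this (a sketch; the mount path and the device-name lookup are assumptions and will differ depending on where you run it, e.g. inside the pod vs. on the GKE node):

    # resolve the block device backing the pgdata mount (e.g. sdb)
    PGDATA_DEV=$(basename "$(df --output=source /home/postgres/pgdata | tail -n 1)")
    # drop the ~5% of blocks ext2/3/4 reserves for root (~25Gi on a 500Gi volume)
    tune2fs -m 0 /dev/$PGDATA_DEV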

However, once patroni-set-0003-0 was given the extra space and restarted, it did not seem willing to take the leader role, even with free disk and no current leader, logging lots of:

Lock owner: None; I am patroni-set-0003-0
wal_e.blobstore.gs.utils WARNING MSG: could no longer locate object while performing wal restore
DETAIL: The absolute URI that could not be located is gs://the-bucket/spilo/the-scope/wal/wal_005/the-file.lzo.
HINT: This can be normal when Postgres is trying to detect what timelines are available during restoration.
STRUCTURED: time=2017-06-26T12:05:23.646236-00 pid=207
lzop: <stdin>: not a lzop file
[...] 

I expected patroni-set-0003-0 to take the leader role by this time.


I then did the same thing to patroni-set-0003-{1,2}, freeing up 25Gi of space.

Once patroni-set-0003-1 was given extra disk space and restarted, it took the master lock.
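
To double-check which member holds the lock, querying Patroni's REST API from inside a pod is enough (a sketch; assumes curl is available in the spilo image and Patroni listens on its default port 8008):

    # returns this member's view of its own state and role
    kubectl exec patroni-set-0003-1 -- curl -s http://localhost:8008/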

joar · Jun 27 '17 12:06