wal-g backup retention feature not working

Open andrewfung729 opened this issue 1 year ago • 2 comments

My finding

I set up wal-g over SSH to back up my Patroni cluster, but the backup disk usage keeps growing: old backups are never deleted.

Environment

spilo version: 3.3-p1
wal-g version in Dockerfile (unchanged): ENV WALG_VERSION=v3.0.0
file: spilo/postgres-appliance/scripts/postgres_backup.sh (https://github.com/zalando/spilo/blob/3.3-p1/postgres-appliance/scripts/postgres_backup.sh)
os: tested the backup script in the container

root@db-2:/run# wal-g -v
wal-g version v3.0.0    4689e3a 2024.03.17_10:04:25     PostgreSQL

root@db-2:/run# uname -a
Linux db-2 5.15.0-116-generic #126-Ubuntu SMP Mon Jul 1 10:14:24 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

root@db-2:/run# cat /etc/os-release 
PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
...

compose env

      USE_WALG_BACKUP: "true"
      USE_WALG_RESTORE: "true"
      WALG_BACKUP_FROM_REPLICA: "true"
      WALG_SSH_PREFIX: "${BACKUP_SSH_PREFIX}"
      SSH_PORT: "${BACKUP_SSH_PORT}"
      SSH_USERNAME: "${BACKUP_SSH_USER}"
      SSH_PRIVATE_KEY_PATH: "${BACKUP_SSH_KEY}"

My workaround

Edit the sed regex in the backup script so that it matches the backup_name header printed by wal-g v3.0.0:

# from
done < <($WAL_E backup-list 2> /dev/null | sed '0,/^name\s*\(last_\)\?modified\s*/d')

# to
done < <($WAL_E backup-list 2> /dev/null | sed '0,/^backup_name\s*\(last_\)\?modified\s*/d')

Before the change

root@db-2:/home/postgres# wal-g backup-list
INFO: 2024/08/12 07:02:17.674772 List backups from storages: [default]
backup_name                   modified             wal_file_name            storage_name
base_0000003100000020000000DA 2024-08-01T01:01:26Z 0000003100000020000000DA default
base_00000031000000210000000A 2024-08-02T01:01:25Z 00000031000000210000000A default
...

After the change, the older backups are deleted

postgres@db-2:/scripts$ envdir "/run/etc/wal-e.d/env" /scripts/postgres_backup.sh "/home/postgres/pgdata/pgroot/data"
INFO: 2024/08/12 07:26:01.442420        will be deleted: basebackups_005/base_00000031000000210000003A/tar_partitions/part_004.tar.lz4, from storage: default
...

root@db-2:/run# wal-g backup-list
INFO: 2024/08/12 07:51:07.034572 List backups from storages: [default]
backup_name                   modified             wal_file_name            storage_name
base_000000320000002200000061 2024-08-07T01:01:19Z 000000320000002200000061 default
base_0000003600000023000000DC 2024-08-08T01:01:27Z 0000003600000023000000DC default
...
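
The mismatch behind the workaround can be reproduced offline, without touching a real backup store. The sketch below feeds the backup-list header and rows quoted in this issue through both sed expressions. It assumes GNU sed (for the `0,/re/` address, `\s` and `\?`). The `legacy_sample` header and the third expression, which accepts both `name` and `backup_name`, are my own illustration inferred from the original regex; they are not taken from the spilo repository or confirmed as the upstream fix.

```shell
#!/usr/bin/env bash
# stdout of `wal-g backup-list` (v3.0.0), copied from this issue.
sample='backup_name                   modified             wal_file_name            storage_name
base_0000003100000020000000DA 2024-08-01T01:01:26Z 0000003100000020000000DA default
base_00000031000000210000000A 2024-08-02T01:01:25Z 00000031000000210000000A default'

# ASSUMPTION: an older-style header, inferred from the original regex only.
legacy_sample='name                          last_modified        wal_segment_backup_start
base_0000003100000020000000DA 2024-08-01T01:01:26Z 0000003100000020000000DA'

# Expressions under test.
old_expr='0,/^name\s*\(last_\)\?modified\s*/d'           # shipped in 3.3-p1
new_expr='0,/^backup_name\s*\(last_\)\?modified\s*/d'    # workaround from this issue
both_expr='0,/^\(backup_\)\?name\s*\(last_\)\?modified\s*/d'  # hypothetical variant

# Count the rows that survive a given sed expression.
count_rows() {
    local out
    out=$(printf '%s\n' "$1" | sed "$2")
    if [ -z "$out" ]; then
        echo 0
    else
        printf '%s\n' "$out" | wc -l | tr -d ' '
    fi
}

# With old_expr the header never matches, so the range 0,/re/ runs to the
# end of input and every row is deleted: the script sees no backups at all.
echo "old regex, v3 header:      $(count_rows "$sample" "$old_expr") rows"
echo "new regex, v3 header:      $(count_rows "$sample" "$new_expr") rows"
echo "both regex, v3 header:     $(count_rows "$sample" "$both_expr") rows"
echo "both regex, legacy header: $(count_rows "$legacy_sample" "$both_expr") rows"
```

If the old expression reports zero rows, the retention loop in postgres_backup.sh has nothing to iterate over, which would explain why no backup was ever considered for deletion.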

andrewfung729, Aug 12 '24 07:08

Thank you very much for this information. I am experiencing the same problem with 3.3-p2. Are there any plans to include this fix in the next release?

j-q-in-berlin, Sep 16 '24 11:09

We hit this issue after upgrading to postgres-operator 1.13.0 (which includes spilo 3.3-p1). Is there any way to move this forward and get it merged?

bo0ts, Oct 21 '24 08:10