wal-g backup retention feature not working
My finding
I set up wal-g over SSH to back up my Patroni cluster, but the backup disk usage keeps growing.
Environment
spilo version: 3.3-p1
wal-g version in Dockerfile (unchanged): ENV WALG_VERSION=v3.0.0
file: spilo/postgres-appliance/scripts/postgres_backup.sh (https://github.com/zalando/spilo/blob/3.3-p1/postgres-appliance/scripts/postgres_backup.sh)
os: tested the backup script in the container
root@db-2:/run# wal-g -v
wal-g version v3.0.0 4689e3a 2024.03.17_10:04:25 PostgreSQL
root@db-2:/run# uname -a
Linux db-2 5.15.0-116-generic #126-Ubuntu SMP Mon Jul 1 10:14:24 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
root@db-2:/run# cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
...
compose env
USE_WALG_BACKUP: "true"
USE_WALG_RESTORE: "true"
WALG_BACKUP_FROM_REPLICA: "true"
WALG_SSH_PREFIX: "${BACKUP_SSH_PREFIX}"
SSH_PORT: "${BACKUP_SSH_PORT}"
SSH_USERNAME: "${BACKUP_SSH_USER}"
SSH_PRIVATE_KEY_PATH: "${BACKUP_SSH_KEY}"
My workaround
Edit the sed regex in the backup script so that it matches the backup_name header printed by wal-g v3.0.0:
# from
done < <($WAL_E backup-list 2> /dev/null | sed '0,/^name\s*\(last_\)\?modified\s*/d')
# to
done < <($WAL_E backup-list 2> /dev/null | sed '0,/^backup_name\s*\(last_\)\?modified\s*/d')
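The original pattern only matches a header that starts with name, so with the backup_name header that wal-g v3.0.0 prints it never matches. GNU sed then deletes every line, and the loop reading that output receives nothing. A minimal sketch of a pattern that accepts both header forms, assuming GNU sed; the helper name strip_backup_list_header and the old-style sample rows are hypothetical and only for illustration, and this is not the upstream fix:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of spilo): drop everything up to and
# including the header row of a backup listing read from stdin.
# Accepts both "name ..." and "backup_name ..." as the first column.
# Relies on GNU sed extensions: the 0,/re/ address, \? and \s.
strip_backup_list_header() {
    sed '0,/^\(backup_\)\?name\s*\(last_\)\?modified\s*/d'
}

# Header and rows as printed by wal-g v3.0.0 in this report.
new_style='backup_name modified wal_file_name storage_name
base_0000003100000020000000DA 2024-08-01T01:01:26Z 0000003100000020000000DA default
base_00000031000000210000000A 2024-08-02T01:01:25Z 00000031000000210000000A default'

# Assumed older layout: only the "name last_modified" prefix is implied by
# the original regex; the third column and the data row are made up.
old_style='name last_modified wal_segment_backup_start
base_000000010000000000000002 2024-01-01T00:00:00Z 000000010000000000000002'

printf '%s\n' "$new_style" | strip_backup_list_header
printf '%s\n' "$old_style" | strip_backup_list_header
```

With GNU sed, both calls should print only the base_ rows. In postgres_backup.sh the same pattern would go in place of the one after $WAL_E backup-list; whether upstream prefers this form is not something this report can confirm.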
Before the modification
root@db-2:/home/postgres# wal-g backup-list
INFO: 2024/08/12 07:02:17.674772 List backups from storages: [default]
backup_name modified wal_file_name storage_name
base_0000003100000020000000DA 2024-08-01T01:01:26Z 0000003100000020000000DA default
base_00000031000000210000000A 2024-08-02T01:01:25Z 00000031000000210000000A default
...
After the modification, the older backups are deleted
postgres@db-2:/scripts$ envdir "/run/etc/wal-e.d/env" /scripts/postgres_backup.sh "/home/postgres/pgdata/pgroot/data"
INFO: 2024/08/12 07:26:01.442420 will be deleted: basebackups_005/base_00000031000000210000003A/tar_partitions/part_004.tar.lz4, from storage: default
...
root@db-2:/run# wal-g backup-list
INFO: 2024/08/12 07:51:07.034572 List backups from storages: [default]
backup_name modified wal_file_name storage_name
base_000000320000002200000061 2024-08-07T01:01:19Z 000000320000002200000061 default
base_0000003600000023000000DC 2024-08-08T01:01:27Z 0000003600000023000000DC default
...
Thank you very much for this information. I am experiencing the same problem with 3.3-p2. Are there any plans to include this fix in the next release?
We hit this issue after upgrading to postgres-operator 1.13.0 (which includes spilo 3.3-p1). Any way to move this forward and get it merged?