fix: wrong regex for wal retention
Fixes #1015
Credits to @andrewfung729 for providing the fix.
Created a PR out of this because I also encountered this issue and needed a fix relatively quickly.
For the maintainers, could you give an estimate on when spilo gets released and when the postgres operator is also bumped? If this could take a while, I may create my own monkey patch until this is released.
Thanks!
I was also wondering why the script does not use the following command: wal-g delete retain FULL $DAYS_TO_RETAIN
This would probably be easier than calculating the $BEFORE variable.
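For reference, a minimal sketch of what that could look like; note that, if I read the wal-g docs correctly, retain FULL takes a number of full backups rather than a number of days, and the count of 7 below is just a placeholder:
# Sketch only: keep the 7 most recent full backups (plus the WAL needed for them)
# and delete everything older; --confirm actually performs the deletion.
wal-g delete retain FULL 7 --confirm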
Isn't this going to break wal-e because it still prints name in version 1.1.1?
@bo0ts @emrah83 I updated my regex to be backwards compatible with wal-e. The regex in the script has already been updated, I think, assuming you are referring to the script in this PR.
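For anyone who wants to sanity-check the backwards compatibility locally, here is a small hypothetical snippet; the two header lines are made up to mimic the first column printed by wal-e (name) and wal-g (backup_name), and the pattern is the ERE equivalent of the one used in the script:
# Both header variants should match the backwards-compatible pattern.
printf 'name last_modified\nbackup_name last_modified\n' | grep -Ec '^(backup_)?name'
# expected output: 2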
Any chance this will get approved and merged? We are facing the same issue and have tested that this PR fixes it.
Since this is being touched anyway, can we discuss whether there is an option to make BACKUP_NUM_TO_RETAIN really mean the number of backups rather than DAYS_TO_RETAIN? It is confusing that, when I take multiple backups a day, more backups remain available than what I set via BACKUP_NUM_TO_RETAIN.
@Yingrjimsch I'd advise you to create a separate issue for this :)
I'd like to see this merged ASAP. Since it will probably take some time to get the Spilo version bumped in the operator, I might use a custom build for the time being.
Hi, would it be possible for either of you to take a look at this PR @hughcapet @FxKu :smiley: ?
Great. I think we now have to wait for the next tag or release for the new image. Right?
For anyone who needs a temporary fix: I have written a small script that loops over the database pods in your cluster and edits the crontab file (/var/spool/cron/crontabs/postgres) so that the WAL retention regex is correct.
In my case all databases are identifiable by the substring "database"; feel free to change that for your use case 😄
⚠️ This TEMPORARY fix is reverted as soon as the DB pod restarts.
#!/bin/bash
# Colors for error output
RED='\033[0;31m'
NC='\033[0m'

# Find all pods with the label spilo-role=master whose name contains the substring "database"
# (kubectl's JSONPath has no "contains" filter, so the name filtering is done with grep)
pods=$(kubectl get pods --all-namespaces -l spilo-role=master \
  -o jsonpath='{range .items[*]}{.metadata.namespace} {.metadata.name}{"\n"}{end}' \
  | grep "database")

# Check if any pods were found
if [ -z "$pods" ]; then
  echo -e "${RED}No matching pods found.${NC}"
  exit 1
fi

# Loop through each found pod
echo "$pods" | while read -r namespace pod; do
  # Add some spacing between pods
  echo -e "\n\n==========================="
  echo "Processing pod: $pod in namespace: $namespace"
  echo "==========================="

  # Rewrite the postgres crontab inside the pod so the backup job runs a
  # patched copy of postgres_backup.sh with the corrected retention regex
  echo "Fixing postgres_backup cron entry in pod: $pod..."
  kubectl exec -n "$namespace" "$pod" -- bash -c "sed -i 's|/scripts/postgres_backup.sh \"|backup -s \"|' /var/spool/cron/crontabs/postgres && sed -i 's|envdir|cat /scripts/postgres_backup.sh \| sed '\''s/\\^name/\\^\\\\(backup_\\\\)\\\\?name/g'\'' \| envdir|' /var/spool/cron/crontabs/postgres"
done
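If it helps, this is roughly how I run it; the file name is arbitrary, and it assumes kubectl is already configured for the target cluster:
# Save the script above as e.g. fix-wal-retention.sh, then:
chmod +x fix-wal-retention.sh
./fix-wal-retention.sh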
Any updates on getting this PR merged so the fix lands on the current branch? 👍
Next week, I hope. At the end of November we always have to follow a stricter merge and deployment policy.
👍
👍