fix: wrong regex for wal retention
Fixes #1015
Credits to @andrewfung729 for providing the fix.
Created a PR out of this because I also encountered this issue and needed a fix relatively quickly.
For the maintainers, could you give an estimate on when spilo gets released and when the postgres operator is also bumped? If this could take a while, I may create my own monkey patch until this is released.
Thanks!
I was also wondering why the script does not use the following command: wal-g delete retain FULL $DAYS_TO_RETAIN
This would probably be easier than calculating the $BEFORE variable.
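For reference, a minimal sketch of what that could look like; note that, if I read the wal-g docs correctly, retain FULL takes a number of full backups rather than a number of days, and the count of 7 below is just a placeholder:
# Sketch only: keep the 7 most recent full backups (plus the WAL needed for them)
# and delete everything older; --confirm actually performs the deletion.
wal-g delete retain FULL 7 --confirm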
Isn't this going to break wal-e because it still prints name in version 1.1.1?
@bo0ts @emrah83 I updated my regex to be backwards compatible with wal-e. The regex in the script has already been updated, I think, assuming you are referring to the script in this PR.
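For anyone who wants to sanity-check the backwards compatibility locally, here is a small hypothetical snippet; the two header lines are made up to mimic the first column printed by wal-e (name) and wal-g (backup_name), and the pattern is the ERE equivalent of the one used in the script:
# Both header variants should match the backwards-compatible pattern.
printf 'name last_modified\nbackup_name last_modified\n' | grep -Ec '^(backup_)?name'
# expected output: 2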
Any chance this will get approved and merged? We are facing the same issue and have tested that this PR fixes it.
Since this is being touched anyway, can we discuss whether there is an option to make BACKUP_NUM_TO_RETAIN really mean the number of backups rather than DAYS_TO_RETAIN? It is confusing that, when I take multiple backups a day, more backups remain available than what I set via BACKUP_NUM_TO_RETAIN.
@Yingrjimsch I'd advise you to create a separate issue for this :)
I'd like to see this merged ASAP. Since it will probably take some time to get the Spilo version bumped in the operator, I might use a custom build for the time being.
Hi, would it be possible for either of you to take a look at this PR @hughcapet @FxKu :smiley: ?
Great. I think we now have to wait for the next tag or release for the new image. Right?
For anyone who needs a temporary fix: I have written a small script that loops over the database pods in your cluster and edits the crontab file (/var/spool/cron/crontabs/postgres) so that the WAL retention regex is correct.
In my case all databases are identifiable by the substring "database"; feel free to change that for your use case 😄
⚠️ This TEMPORARY fix is reverted as soon as the DB pod restarts.
#!/bin/bash
# Colors for error output
RED='\033[0;31m'
NC='\033[0m'

# Find all pods with the label spilo-role=master whose name contains the substring "database"
# (kubectl's JSONPath has no "contains" filter, so the name filtering is done with grep)
pods=$(kubectl get pods --all-namespaces -l spilo-role=master \
  -o jsonpath='{range .items[*]}{.metadata.namespace} {.metadata.name}{"\n"}{end}' \
  | grep "database")

# Check if any pods were found
if [ -z "$pods" ]; then
  echo -e "${RED}No matching pods found.${NC}"
  exit 1
fi

# Loop through each found pod
echo "$pods" | while read -r namespace pod; do
  # Add some spacing between pods
  echo -e "\n\n==========================="
  echo "Processing pod: $pod in namespace: $namespace"
  echo "==========================="

  # Rewrite the postgres crontab inside the pod so the backup job runs a
  # patched copy of postgres_backup.sh with the corrected retention regex
  echo "Fixing postgres_backup cron entry in pod: $pod..."
  kubectl exec -n "$namespace" "$pod" -- bash -c "sed -i 's|/scripts/postgres_backup.sh \"|backup -s \"|' /var/spool/cron/crontabs/postgres && sed -i 's|envdir|cat /scripts/postgres_backup.sh \| sed '\''s/\\^name/\\^\\\\(backup_\\\\)\\\\?name/g'\'' \| envdir|' /var/spool/cron/crontabs/postgres"
done
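If it helps, this is roughly how I run it; the file name is arbitrary, and it assumes kubectl is already configured for the target cluster:
# Save the script above as e.g. fix-wal-retention.sh, then:
chmod +x fix-wal-retention.sh
./fix-wal-retention.sh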
Any updates on getting this PR merged so the fix lands on the current branch? 👍
Next week, I hope. At the end of November we always have to follow a stricter merge and deployment policy.
👍
👍