postgres-operator
Lifecycle management of wal archives in S3
What is the recommended way of keeping the number of base backups and WAL files in the S3 WAL bucket at a reasonable level?
I have configured the BACKUP_NUM_TO_RETAIN env variable, but this does not seem to apply to S3 cleanup, or does it?
Should I just apply a bucket policy that removes files older than BACKUP_NUM_TO_RETAIN (as in days)?
@paalkr we set this to the last 5 days to be able to restore a DB from last Friday on a Tuesday :smiley: But Spilo should also make sure to always keep that number, no matter if it covers more days. Deletion is done by the wal-e command. Have a look at this code snippet. Are you ending up with more or fewer backups than configured?
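For readers following along, the retention step in that snippet boils down to roughly the following. This is a paraphrase under the assumption that USE_WALG_BACKUP is set, not the literal Spilo script; check the linked code for the authoritative version.

```bash
# Rough paraphrase of Spilo's backup retention step (wal-g variant).
WAL_E=wal-g                          # Spilo picks wal-g or wal-e depending on USE_WALG_BACKUP
DAYS_TO_RETAIN=$BACKUP_NUM_TO_RETAIN

BEFORE=""                            # newest backup old enough to serve as the deletion cut-off
LEFT=0                               # backups newer than the cut-off
NOW=$(date +%s -u)

while read -r name last_modified rest; do
    last_modified=$(date +%s -u -d "$last_modified")
    if [ $(((NOW - last_modified) / 86400)) -ge "$DAYS_TO_RETAIN" ]; then
        BEFORE=$name
    else
        LEFT=$((LEFT + 1))
    fi
done < <($WAL_E backup-list 2>/dev/null | tail -n +2)   # skip the header line

# Deletion only happens when a cut-off exists AND at least BACKUP_NUM_TO_RETAIN
# newer backups remain, so the configured number is always kept, however old.
if [ -n "$BEFORE" ] && [ "$LEFT" -ge "$DAYS_TO_RETAIN" ]; then
    $WAL_E delete before FIND_FULL "$BEFORE" --confirm
fi
```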
Thanks for the feedback. I have specified BACKUP_NUM_TO_RETAIN=7, but our problem is that the WAL files and base backups archived to S3 are not cleaned up at all. So every WAL file and base backup since the database was first created is still stored in S3.
I have looked at the code you refer to, but I don't understand how that would delete files already copied to S3 for archiving. I can see how that code deletes old files from the pod file system, but not how that is reflected in the S3 WAL bucket.
I was under the impression that this command is responsible for archiving WAL files to S3: https://github.com/zalando/spilo/blob/9cca3408267c68b44f0c1d6515cc34730a1ac95d/postgres-appliance/scripts/configure_spilo.py#L241, which in turn resolves to this command, https://github.com/zalando/spilo/blob/9cca3408267c68b44f0c1d6515cc34730a1ac95d/postgres-appliance/scripts/configure_spilo.py#L569, which executes wal-push in the wal-g package: https://github.com/wal-g/wal-g/blob/1e0501cc07175ba7f53d4be0883c1b87bd699dc3/cmd/pg/wal_push.go
We have configured Spilo to use wal-g and not wal-e, because wal-e does not support the eu-north-1 region (wal-g does). wal-e will never support new regions either, because of limitations in the old, deprecated boto library it uses.
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: pod-env-overrides
  namespace: kube-postgres-operator
data:
  # Any env variable used by Spilo can be added
  AWS_REGION: "eu-north-1"  # This value is ONLY used for the wal-g S3 streaming
  USE_WALG_BACKUP: "true"
  BACKUP_NUM_TO_RETAIN: "7"
  AWS_INSTANCE_PROFILE: "true"
  USE_WALG_RESTORE: "true"
```
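For completeness, the overrides above only reach the database pods if the operator is pointed at this ConfigMap (the pod_environment_configmap setting in the operator configuration). A quick way to verify they actually landed in the Spilo container, with placeholder namespace and pod names:

```bash
# Namespace, pod and container names are placeholders; adjust to your cluster.
kubectl -n my-namespace exec acid-mycluster-0 -c postgres -- \
  env | grep -E 'USE_WALG_BACKUP|USE_WALG_RESTORE|BACKUP_NUM_TO_RETAIN|AWS_REGION'
```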
Maybe wal-e wal-push actually deletes old WAL files from S3? https://github.com/wal-e/wal-e/blob/485d834a18c9b0d97115d95f89e16bdc564e9a18/wal_e/cmd.py#L657
Another dimension to potentially consider would be not having WAL-E delete anything (not even after 5 days) but to apply an AWS S3 object lifecycle policy to do the housekeeping. This would disconnect the producer of data from the ability to delete it. Certainly it's not as "clean" or consistent to delete files after a certain age (the WAL files are not deleted in harmony with the last base backup), but most of the time the small overhead of WAL data that sticks around for a few days longer should not be an issue.
Yes, this is our current "workaround": we apply an S3 lifecycle policy that deletes objects older than 10 days.
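For reference, such a rule can be applied with the AWS CLI along these lines; the bucket name and key prefix below are placeholders for whatever your Spilo WAL bucket uses:

```bash
# Expire every object under the Spilo prefix after 10 days (bucket/prefix are placeholders).
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-wal-bucket \
  --lifecycle-configuration '{
    "Rules": [
      {
        "ID": "expire-old-wal",
        "Filter": { "Prefix": "spilo/" },
        "Status": "Enabled",
        "Expiration": { "Days": 10 }
      }
    ]
  }'
```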
@paalkr How did you configure Spilo to use wal-g and not wal-e?
@abdennour, I applied the ConfigMap from my previous comment, https://github.com/zalando/postgres-operator/issues/947#issuecomment-627872902. The important keys are USE_WALG_BACKUP and USE_WALG_RESTORE.
Thank you @paalkr. Where are the other parameters defined, like which S3_BUCKET to use?
We have the same issue and workaround that @paalkr outlined: very cluttered WAL archives throughout S3 (at one point we hit 400 TB). We've set BACKUP_NUM_TO_RETAIN, but for many of the clusters it doesn't seem to work (despite all sharing the same configuration), and in all cases it appears to be intermittent. We've had to reduce our S3 lifecycle to 15 days.
I came across this issue because the old backup data was not deleted on my S3 storage either. I have tried to trace the backup process and came to the following conclusions:
- The backup script tries to keep as many base backups as specified by BACKUP_NUM_TO_RETAIN; it only deletes backups older than BACKUP_NUM_TO_RETAIN converted into days, and even then keeps at least that many base backups, no matter how old they are. See the script.
- Too small order/limit values can cause the process that creates the base backup to be killed, which was the case for me; see also this related issue.
Considering these two points, I never reached the number of base backups required to trigger the deletion of the older files, because within a month the process was killed multiple times, which led to missing base backups.
My config: BACKUP_NUM_TO_RETAIN=30 and BACKUP_SCHEDULE=00 01 * * *.
At this point storage consumption kept increasing, because the older data was never deleted.
The delete procedure would work correctly, provided the backup script always runs successfully. It would be ideal if a metric were available to check whether the base backup was successful or not.
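One quick way to sanity-check this is to count the completed base backups inside the pod and compare the result with BACKUP_NUM_TO_RETAIN. The pod and namespace names below are placeholders; the envdir path is the directory Spilo uses for its WAL-E/WAL-G environment, as mentioned later in this thread:

```bash
# Count base backups visible to wal-g (the header line of backup-list is skipped).
kubectl -n my-namespace exec acid-mycluster-0 -c postgres -- \
  bash -c 'envdir /run/etc/wal-e.d/env wal-g backup-list | tail -n +2 | wc -l'
```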
@ddegasperi regarding:

> It would be ideal if a metric were available to check whether the base backup was successful or not.
Have you tried https://github.com/camptocamp/wal-g-prometheus-exporter?
We found it over at https://github.com/wal-g/wal-g/issues/323#issuecomment-595663310, and have been giving it a try recently.
@MPV I assume you have the wal-g-exporter configured as a sidecar; could you share your configuration? Especially the mapping of the environment variables would help :)
Currently my exporter can connect to postgres:
```
$ k logs app-db-1 -c wal-g-exporter
INFO:root:Startup...
INFO:root:My PID is: 10
INFO:root:Webserver started on port 9351
ERROR:root:Unable to connect postgres server, retrying in 60sec...
INFO:root:Updating basebackups metrics...
ERROR:root:Command '['wal-g', 'backup-list', '--detail', '--json']' returned non-zero exit status 1.
```

```
$ k logs app-db-0 -c wal-g-exporter
INFO:root:Startup...
INFO:root:My PID is: 7
INFO:root:Webserver started on port 9351
ERROR:root:Unable to connect postgres server, retrying in 60sec...
INFO:root:Running on slave, waiting for promotion...
INFO:root:Running on slave, waiting for promotion...
```
but can't fetch the backup-list :(
```
$ k exec -it app-db-1 -n app -c wal-g-exporter -- bash
I have no name!@app-db-1:/$ wal-g backup-list
ERROR: 2021/12/29 11:34:03.202347 user: unknown userid 101
```
My running operatorconfigurations.acid.zalan.do currently looks like this under configuration.sidecars:
```yaml
- args:
    - --debug
    - /home/postgres/pgdata/pgroot/data/pg_wal/archive_status
  env:
    - name: PGUSER
      value: $(POSTGRES_USER)
    - name: PGPASSWORD
      value: $(POSTGRES_PASSWORD)
    - name: PGHOST
      value: localhost
    - name: PGPORT
      value: "5432"
    - name: PGSSLMODE
      value: disable
    - name: WALE_S3_PREFIX
      value: TODO
  envFrom:
    - secretRef:
        name: postgres-operator-pod-environment
  image: camptocamp/wal-g-prometheus-exporter:latest
  name: wal-g-exporter
  ports:
    - containerPort: 9351
      name: wal-g-metrics
      protocol: TCP
  resources:
    limits:
      cpu: "1"
      memory: 256Mi
    requests:
      cpu: 100m
      memory: 64Mi
```
Thanks @MPV for pointing out the wal-g-prometheus-exporter; with the code snippets from @movd I was able to get the exporter running.
The problems I still have with using the exporter in production are:
- When I define the exporter as a global sidecar via operatorconfigurations.acid.zalan.do, I am missing some information to auto-generate the WALE_S3_PREFIX. Example:

  ```yaml
  env:
    - name: WALE_S3_PREFIX
      value: s3://my-bucket-url/spilo/$(POD_NAMESPACE)-$(???)/wal/$(???)
  ```

  The first $(???) should contain the name of the postgresql cluster. At the moment each sidecar automatically gets POD_NAME, but that variable also contains the instance number of the cluster as a suffix. It would be great if the SCOPE variable were injected into each sidecar container too. The second $(???) should contain the version of the postgresql cluster. Here as well it would be great if the PGVERSION variable were injected into each sidecar container.
- If I define the exporter as a cluster sidecar, the auto-injection of SCOPE and PGVERSION is unnecessary, since I have that information explicitly. The problem in this case is that the exporter container expects args in order to run properly, and only a subset of the container properties seems to be supported. I could of course derive my own image from the wal-g-prometheus-exporter image to pass the args explicitly there, but that doesn't seem very smart to me.
In general, I think it would be great if both approaches could work. I would be happy to create feature requests, depending on what fits the project guidelines. Please let me know.
@movd Initially I had the same problem with fetching the backup-list from S3. I connected to the running container, ran the command manually, and got some hints about missing variables. I then explicitly defined AWS_ENDPOINT, AWS_REGION and AWS_S3_FORCE_PATH_STYLE. I hope this helps you.
I tried it like this: since the computed WALE_S3_PREFIX is stored in /run/etc/wal-e.d, we share this path between the postgres and wal-g-exporter containers.
At the level of the specific cluster, set:
```yaml
apiVersion: acid.zalan.do/v1
kind: postgresql
metadata:
  name: app-db
  namespace: some-namespace
spec:
  additionalVolumes:
    - name: wal-e-directory
      mountPath: /run/etc/wal-e.d
      targetContainers:
        - postgres
        - wal-g-exporter
```
In the global operatorconfigurations.acid.zalan.do, under configuration.sidecars:
```yaml
- name: wal-g-exporter
  image: camptocamp/wal-g-prometheus-exporter:latest
  args:
    - "--debug"
    - "/home/postgres/pgdata/pgroot/data/pg_wal/archive_status" # archive_dir argument
  ports:
    - name: wal-g-metrics
      containerPort: 9351
      protocol: TCP
  resources:
    limits:
      cpu: '1'
      memory: 256Mi
    requests:
      cpu: 100m
      memory: 64Mi
  env:
    - name: "PGUSER"
      value: "$(POSTGRES_USER)"
    - name: "PGPASSWORD"
      value: "$(POSTGRES_PASSWORD)"
    - name: "PGHOST"
      value: "localhost"
    - name: "PGPORT"
      value: "5432"
    - name: "PGSSLMODE"
      value: "disable"
    - name: WALE_S3_PREFIX
      value: "$(cat /run/etc/wal-e.d/env/WALG_S3_PREFIX)"
```
After that, the information is available in the wal-g-exporter container. Unfortunately, this is as far as I got; maybe that will help you.
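One caveat, and possibly why it stops there: Kubernetes only expands $(OTHER_ENV_VAR) references in env values, it does not perform shell command substitution, so the $(cat ...) above ends up in the variable literally. A hedged workaround would be to resolve the prefix when the container starts, for example with a wrapper entrypoint like the sketch below; the exporter's binary path is an assumption and would need to be checked against the camptocamp image.

```bash
#!/bin/bash
# Hypothetical wrapper entrypoint for the exporter sidecar: read the prefix that
# Spilo writes into the shared /run/etc/wal-e.d/env directory, then start the exporter.
export WALE_S3_PREFIX="$(cat /run/etc/wal-e.d/env/WALG_S3_PREFIX)"
exec /usr/local/bin/wal-g-prometheus-exporter --debug \
  /home/postgres/pgdata/pgroot/data/pg_wal/archive_status
```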
I still get the "unknown userid 101" error from wal-g (from my previous post). How did you get past this error?
Found this too while wondering why our (Ceph-based) S3 bucket is so full… apparently this option is ignored.
I've found that the cause of the issue for us was the sed command here, which made the backup pruning not work due to a change in the output of newer versions of wal-g. This was fixed in this commit, and the fix is present in the image registry.opensource.zalan.do/acid/spilo-14:2.1-p6.
> I still get the "unknown userid 101" error from wal-g (from my previous post). How did you get past this error?

@movd FYI, per https://github.com/wal-g/wal-g/issues/300 you need to pass a --config argument or create an entry in /etc/passwd for the user. Hope this helps!
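As a rough sketch of the /etc/passwd route (whether the file is writable depends on the image and securityContext, and the user name/shell below are assumptions):

```bash
# wal-g fails with "unknown userid 101" because the exporter runs as a UID that has
# no passwd entry; giving it one (or passing --config with an explicit file) avoids
# the failing user lookup.
echo "walg:x:101:101:wal-g:/tmp:/usr/sbin/nologin" >> /etc/passwd
wal-g backup-list
```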
I found that the wal_005/*.history.lz4 files aren't automatically deleted, but apart from that the other files are deleted as expected. I am using the registry.opensource.zalan.do/acid/spilo-14:2.1-p6 image.
Is there any update about the solution or configuration for this issue, guys? Were you able to clean up the WAL archive on S3? The workaround of applying an S3 lifecycle policy to delete objects older than x days is too risky for me. :(
Possibly wal-g delete garbage could be used to clean up old WAL archives?

> Deletes outdated WAL archives and backups leftover files from storage, e.g. unsuccessfully backups or partially deleted ones. Will remove all non-permanent objects before the earliest non-permanent backup. This command is useful when backups are being deleted by the delete target command.

Specifically:

> useful when backups are being deleted by the delete target command

Which appears to be what is happening as part of the cleanup.
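Run against the WAL-G environment that Spilo maintains, that could look roughly like the following; the envdir path comes from earlier in this thread, and the retention count is just an example:

```bash
# wal-g delete commands dry-run by default; --confirm actually performs the deletion.
envdir /run/etc/wal-e.d/env wal-g delete retain FULL 7 --confirm   # keep the 7 newest full backups
envdir /run/etc/wal-e.d/env wal-g delete garbage --confirm         # drop leftover WAL/files no longer needed by kept backups
```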
> Is there any update about the solution or configuration for this issue, guys? Were you able to clean up the WAL archive on S3? The workaround of applying an S3 lifecycle policy to delete objects older than x days is too risky for me. :(
S3 lifecycle rules are the way to go. We started using them and I'm convinced that this is not the operator's job to clean up.
> not the operator's job to clean up.

Why even bother having BACKUP_NUM_TO_RETAIN then? By that reasoning, should the operator not clean up old backups at all?
Could the wal-g delete garbage command be used, as I previously suggested, in the current $USE_WALG_BACKUP == "true" branch, just after $WAL_E delete before FIND_FULL "$BEFORE" --confirm, for those that make use of WAL-G?
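In the retention step paraphrased earlier in this thread, the suggested change would amount to something like this (a hedged sketch based on the lines quoted above, not the literal Spilo script):

```bash
if [ "$USE_WALG_BACKUP" == "true" ]; then
    $WAL_E delete before FIND_FULL "$BEFORE" --confirm
    # Suggested addition: also drop WAL segments and leftover files that are
    # no longer covered by any retained backup.
    $WAL_E delete garbage --confirm
fi
```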