
Lifecycle management of wal archives in S3

Open paalkr opened this issue 4 years ago • 18 comments

What is the recommended way of keeping the number of basebackups and wal files in the S3 wal bucket at a reasonable level?

I have configured the BACKUP_NUM_TO_RETAIN env variable, but this does not seem to apply to S3 cleanup, or does it?

Should I just apply a bucket policy that will remove files older than BACKUP_NUM_TO_RETAIN (as in days)?

paalkr avatar Apr 28 '20 22:04 paalkr

@paalkr we set this to the last 5 days to be able to restore a DB from last Friday on a Tuesday :smiley: But Spilo should also make sure to always keep that number of backups, no matter whether it covers more days. Deletion is done by the wal-e command. Have a look at this code snippet. Are you ending up with more or fewer backups than configured?
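
For readers who don't want to follow the link, the retention logic in that script has roughly the following shape. This is a paraphrased and simplified sketch rather than the exact Spilo code (the real script additionally converts BACKUP_NUM_TO_RETAIN to days and only deletes base backups older than that); it assumes the WAL-E/WAL-G credentials and prefix are already set in the environment, as they are inside the Spilo container:

# Paraphrased sketch of the pruning step in Spilo's backup script.
if [[ "$USE_WALG_BACKUP" == "true" ]]; then
    WAL_E="wal-g"
else
    WAL_E="wal-e"
fi

# Count the existing base backups (skip the header line of `backup-list`).
BACKUP_COUNT=$($WAL_E backup-list 2>/dev/null | tail -n +2 | wc -l)

if (( BACKUP_COUNT > BACKUP_NUM_TO_RETAIN )); then
    # Oldest base backup that must be kept (backup-list is ordered oldest first).
    KEEP_OLDEST=$($WAL_E backup-list 2>/dev/null | tail -n +2 \
                  | tail -n "$BACKUP_NUM_TO_RETAIN" | head -n 1 | awk '{print $1}')
    # Deleting everything "before" that backup also removes the WAL segments
    # that belong only to the deleted base backups.
    if [[ "$USE_WALG_BACKUP" == "true" ]]; then
        $WAL_E delete before FIND_FULL "$KEEP_OLDEST" --confirm
    else
        $WAL_E delete --confirm before "$KEEP_OLDEST"
    fi
fi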

FxKu avatar May 13 '20 06:05 FxKu

Thanks for the feedback. I have specified BACKUP_NUM_TO_RETAIN=7, but our problem is that the WAL files and basebackups archived to S3 are not cleaned up at all. So every WAL file and basebackup from the time the database was first created is still stored in S3.

I have looked at the code you refer to, but I don't understand how it would delete files already copied to S3 for archiving. I mean, I can see how that code deletes old files from the pod file system, but not how that is reflected in the S3 WAL bucket.

I was under the impression that this command is responsible for archiving WAL files to S3: https://github.com/zalando/spilo/blob/9cca3408267c68b44f0c1d6515cc34730a1ac95d/postgres-appliance/scripts/configure_spilo.py#L241, which in turn resolves to this command https://github.com/zalando/spilo/blob/9cca3408267c68b44f0c1d6515cc34730a1ac95d/postgres-appliance/scripts/configure_spilo.py#L569, which executes wal-push from the wal-g package: https://github.com/wal-g/wal-g/blob/1e0501cc07175ba7f53d4be0883c1b87bd699dc3/cmd/pg/wal_push.go
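
For context, the rendered archive_command ends up looking roughly like this (a sketch based on the configure_spilo.py lines linked above; the exact envdir path and binary depend on the Spilo version and on USE_WALG_BACKUP):

# Approximate shape of what configure_spilo.py renders into the Postgres configuration:
archive_command = 'envdir "/run/etc/wal-e.d/env" wal-g wal-push "%p"'
# wal-push only uploads the finished WAL segment to S3, so any retention of
# already archived segments has to happen in a separate step.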

We have configured Spilo to use wal-g and not wal-e, because wal-e does not support the eu-north-1 region (wal-g does). wal-e will never support new regions either, because of limitations in the old, deprecated boto library it uses.

apiVersion: v1
kind: ConfigMap
metadata:
  name: pod-env-overrides
  namespace: kube-postgres-operator
data:
  # Any env variable used by spilo can be added
  AWS_REGION: "eu-north-1" #This value is ONLY used for the wal-g S3 streaming
  USE_WALG_BACKUP: "true"
  BACKUP_NUM_TO_RETAIN: "7"
  AWS_INSTANCE_PROFILE: "true"
  USE_WALG_RESTORE: "true"

Maybe wal-e wal-push actually deletes old WAL files from S3? https://github.com/wal-e/wal-e/blob/485d834a18c9b0d97115d95f89e16bdc564e9a18/wal_e/cmd.py#L657

paalkr avatar May 13 '20 09:05 paalkr

Another dimension to potentially consider would be not having WAL-E delete anything (not even after 5 days) but applying an AWS S3 object lifecycle policy to do the housekeeping. This would disconnect the producer of the data from the ability to delete it. Certainly it's not as "clean" or consistent to delete files after a certain age (the WAL files are not deleted in harmony with the last basebackup), but most of the time the small overhead of WAL data that sits around for a few days longer should not be an issue.

frittentheke avatar May 13 '20 13:05 frittentheke

Yes, this is our current "workaround": applying an S3 lifecycle policy that deletes objects older than 10 days.
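
For anyone wanting to replicate this, such a rule can be applied with the AWS CLI roughly as follows (a sketch; the bucket name, prefix and number of days are placeholders to adapt to your setup):

# Expire everything under the spilo/ prefix after 10 days.
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-wal-bucket \
  --lifecycle-configuration '{
    "Rules": [
      {
        "ID": "expire-old-wal",
        "Status": "Enabled",
        "Filter": { "Prefix": "spilo/" },
        "Expiration": { "Days": 10 }
      }
    ]
  }'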

paalkr avatar May 13 '20 15:05 paalkr

@paalkr How did you configure spilo to use wal-g and not wal-e ?

abdennour avatar Jul 11 '20 13:07 abdennour

@abdennour , I applied the configmap I specified in my previous comment https://github.com/zalando/postgres-operator/issues/947#issuecomment-627872902

The important keywords are USE_WALG_BACKUP and USE_WALG_RESTORE

paalkr avatar Jul 11 '20 20:07 paalkr

Thank you @paalkr. Where are the other parameters, like which S3_BUCKET to use?

abdennour avatar Jul 11 '20 20:07 abdennour

We have the same issue and the same workaround that @paalkr outlined. Very cluttered WAL archives throughout S3 (at one point we hit 400 TB). We've set BACKUP_NUM_TO_RETAIN, but for many of the clusters it doesn't seem to work (despite all of them sharing the same configuration), and in all cases the behavior appears to be intermittent. We've had to reduce our S3 lifecycle to 15 days.

cazter avatar May 27 '21 18:05 cazter

I came by this issue because the old backup data was not deleted on my S3 storage either. I have tried to trace the backup process and come to the following conclusion:

  1. The backup script tries to keep as many base backups as specified in BACKUP_NUM_TO_RETAIN and only deletes base backups older than BACKUP_NUM_TO_RETAIN converted to days, provided enough newer backups exist. See the script
  2. Resource request/limit values that are too small can cause the process that creates the base backup to be killed; this was the case for me (see also this related issue)

Considering these two points, I never reached the number of base backups required to trigger the deletion of the older files, because within a month the process was killed multiple times, which led to missing base backups. My config is BACKUP_NUM_TO_RETAIN=30 and BACKUP_SCHEDULE=00 01 * * *. As a result, storage consumption kept increasing, because the older data was never deleted.

The delete procedure would work correctly, provided that the backup script always completes successfully. It would be ideal if a metric were available to check whether the base backup was successful or not.
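
Until such a metric exists, a crude check can be derived from wal-g itself and run periodically from a cron job or sidecar (a sketch; it assumes wal-g and jq are available, the WALG_*/AWS_* environment is set, and that the JSON output of backup-list contains a time field, as it does in recent wal-g versions):

# Exit non-zero if the newest base backup is older than ~25 hours,
# i.e. the last daily backup run did not complete successfully.
latest=$(wal-g backup-list --detail --json | jq -r 'max_by(.time) | .time')
latest_epoch=$(date -d "$latest" +%s)
now_epoch=$(date +%s)
if (( now_epoch - latest_epoch > 25 * 3600 )); then
  echo "last base backup is stale: $latest" >&2
  exit 1
fi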

ddegasperi avatar Oct 26 '21 15:10 ddegasperi

@ddegasperi regarding:

It would be ideal if a metric were available to check whether the base backup was successful or not.

Have you tried https://github.com/camptocamp/wal-g-prometheus-exporter ?

We found it over at https://github.com/wal-g/wal-g/issues/323#issuecomment-595663310, and have been giving it a try recently.

MPV avatar Dec 16 '21 14:12 MPV

@MPV I assume you have the wal-g-exporter configured as a sidecar; could you share your configuration? The mapping of the environment variables in particular would help :)

Currently my exporter can connect to postgres:

$ k logs app-db-1 -c wal-g-exporter
INFO:root:Startup...
INFO:root:My PID is: 10
INFO:root:Webserver started on port 9351
ERROR:root:Unable to connect postgres server, retrying in 60sec...
INFO:root:Updating basebackups metrics...
ERROR:root:Command '['wal-g', 'backup-list', '--detail', '--json']' returned non-zero exit status 1.
$ k logs app-db-0 -c wal-g-exporter
INFO:root:Startup...
INFO:root:My PID is: 7
INFO:root:Webserver started on port 9351
ERROR:root:Unable to connect postgres server, retrying in 60sec...
INFO:root:Running on slave, waiting for promotion...
INFO:root:Running on slave, waiting for promotion...

but can't fetch the backup-list :(

$ k exec -it app-db-1 -n app -c wal-g-exporter -- bash
I have no name!@app-db-1:/$ wal-g backup-list
ERROR: 2021/12/29 11:34:03.202347 user: unknown userid 101

My running operatorconfigurations.acid.zalan.do currently looks like this for configuration.sidecars.

  - args:
    - --debug
    - /home/postgres/pgdata/pgroot/data/pg_wal/archive_status
    env:
    - name: PGUSER
      value: $(POSTGRES_USER)
    - name: PGPASSWORD
      value: $(POSTGRES_PASSWORD)
    - name: PGHOST
      value: localhost
    - name: PGPORT
      value: "5432"
    - name: PGSSLMODE
      value: disable
    - name: WALE_S3_PREFIX
      value: TODO
    envFrom:
    - secretRef:
        name: postgres-operator-pod-environment
    image: camptocamp/wal-g-prometheus-exporter:latest
    name: wal-g-exporter
    ports:
    - containerPort: 9351
      name: wal-g-metrics
      protocol: TCP
    resources:
      limits:
        cpu: "1"
        memory: 256Mi
      requests:
        cpu: 100m
        memory: 64Mi

movd avatar Dec 29 '21 10:12 movd

Thanks @MPV for pointing out the wal-g-prometheus-exporter and with the code snippets from @movd I was able to get the exporter running.

The problems I still have with using the exporter in production are:

  1. When I define the exporter as a global sidecar via operatorconfigurations.acid.zalan.do, I am missing some information to auto-generate the WALE_S3_PREFIX

Example:

    env:
    - name: WALE_S3_PREFIX
      value: s3://my-bucket-url/spilo/$(POD_NAMESPACE)-$(???)/wal/$(???)

The first $(???) should contain the name of the postgresql cluster. At the moment each sidecar automatically obtains POD_NAME, but this variable also contains the instance number of the cluster as a suffix. It would be great if the SCOPE variable were injected into each sidecar container too.

The second $(???) should contain the version of the postgresql cluster. In this case too it would be great if the PGVERSION variable were injected into each sidecar container.

  2. If I define the exporter as a cluster sidecar, the auto-injection of SCOPE and PGVERSION is unnecessary, since I have this information explicitly. The problem in this case is that the exporter container expects args in order to run properly, and only a subset of the container properties seems to be supported. I could of course derive my own container image from the wal-g-prometheus-exporter image and pass the args there explicitly, but that approach doesn't seem very smart to me.

In general, I think it would be great if both approaches could work. I would be happy to create the corresponding feature requests, depending on what fits the project guidelines. Please let me know.


@movd Initially I had the same problem with fetching the backup-list from S3. I connected to the running container, ran the command manually and got some hints about missing variables. I then explicitly defined AWS_ENDPOINT, AWS_REGION and AWS_S3_FORCE_PATH_STYLE. I hope this helps you.

ddegasperi avatar Dec 30 '21 10:12 ddegasperi

I tried it like this: Since the computed WALE_S3_PREFIX is stored in /run/etc/wal-e.d, we share this path between the postgres and the wal-g-exporter container.

At the level of the specific cluster, set:

apiVersion: acid.zalan.do/v1
kind: postgresql
metadata:
  name: app-db
  namespace: some-namespace
spec:
  additionalVolumes:
  - name: wal-e-directory
    mountPath: /run/etc/wal-e.d
    targetContainers:
      - postgres
      - wal-g-exporter

In the global operatorconfigurations.acid.zalan.do under configuration.sidecars

      - name: wal-g-exporter
        image: camptocamp/wal-g-prometheus-exporter:latest
        args:
          - "--debug" 
          - "/home/postgres/pgdata/pgroot/data/pg_wal/archive_status"  # archive_dir argument
        ports:
        - name: wal-g-metrics
          containerPort: 9351
          protocol: TCP
        resources:
          limits:
            cpu: '1'
            memory: 256Mi
          requests:
            cpu: 100m
            memory: 64Mi
        env:
          - name: "PGUSER"
            value: "$(POSTGRES_USER)"
          - name: "PGPASSWORD"
            value: "$(POSTGRES_PASSWORD)"
          - name: "PGHOST"
            value: "localhost"
          - name: "PGPORT"
            value: "5432"
          - name: "PGSSLMODE"
            value: "disable"
          - name: WALE_S3_PREFIX
            value: "$(cat /run/etc/wal-e.d/env/WALG_S3_PREFIX)"

After that, the information is available in the wal-g-exporter container. Unfortunately, this is as far as I got; maybe it will help you.

I still get the "unknown userid 101" error from wal-g (from my previous post). How did you get past this error?

movd avatar Jan 03 '22 15:01 movd

Found this too while wondering why our (Ceph-based) S3 bucket is so full… apparently this option is ignored.

toabi avatar Jan 19 '22 15:01 toabi

I've found that the cause of the issue for us was the sed command here, which broke the backup pruning due to a change in the output format of newer wal-g versions. This was fixed in this commit, and the fix is present in the image registry.opensource.zalan.do/acid/spilo-14:2.1-p6.

davidspek avatar Sep 21 '22 10:09 davidspek

I still get the "unknown userid 101" error from wal-g (from my previous post). How did you get past this error? @movd FYI from https://github.com/wal-g/wal-g/issues/300 you need to pass a --config argument or create an entry in /etc/passwd for the user. Hope this helps!

akloss-cibo avatar Feb 02 '23 19:02 akloss-cibo

I found that the wal_005/*.history.lz4 files aren't automatically deleted, but apart from that the other files are deleted as expected. I am using the registry.opensource.zalan.do/acid/spilo-14:2.1-p6 image.

Charles-Johnson avatar Aug 18 '23 12:08 Charles-Johnson

Is there any update on a solution or configuration for this issue, guys? Were you able to clean up the WAL archive on S3? The workaround of applying an S3 lifecycle policy to delete objects older than x days is too risky for me. :(

laiminhtrung1997 avatar Feb 04 '24 09:02 laiminhtrung1997

Possibly wal-g delete garbage could be used to clean up old WAL archives?

Deletes outdated WAL archives and backups leftover files from storage, e.g. unsuccessfully backups or partially deleted ones. Will remove all non-permanent objects before the earliest non-permanent backup. This command is useful when backups are being deleted by the delete target command.

Specifically:

useful when backups are being deleted by the delete target command

Which appears to be what is happening as part of the cleanup.
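
In practice that would be something along these lines, run after the regular pruning step (a sketch; delete garbage requires a reasonably recent wal-g and the usual WALG_*/AWS_* environment):

# Remove leftover WAL segments and backup files that predate the earliest
# remaining (non-permanent) base backup.
wal-g delete garbage --confirm

# Or restrict the cleanup to WAL archives only:
wal-g delete garbage ARCHIVES --confirm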

carlreid avatar Mar 13 '24 14:03 carlreid

Is there any update on a solution or configuration for this issue, guys? Were you able to clean up the WAL archive on S3? The workaround of applying an S3 lifecycle policy to delete objects older than x days is too risky for me. :(

S3 lifecycle rules are the way to go. We started using them, and I'm convinced it's not the operator's job to clean up.

FxKu avatar Apr 24 '24 07:04 FxKu

not the operator's job to clean up.

Why even bother having BACKUP_NUM_TO_RETAIN then? By that reasoning, shouldn't the operator also not clean up old backups?

Could the wal-g delete garbage command be used, as I previously suggested, in the current "$USE_WALG_BACKUP" == "true" branch, just after $WAL_E delete before FIND_FULL "$BEFORE" --confirm, for those that make use of WAL-G?

carlreid avatar Apr 24 '24 07:04 carlreid