postgres-operator

pgbackrest backup fails with ERROR: [082]: WAL segment 000001B000000AF80000009B was not archived before the 60000ms timeout

Open ckazimie opened this issue 4 months ago • 2 comments

Overview

The backup job fails with the following error: ERROR: [082]: WAL segment 000001B000000AF80000009B was not archived before the 60000ms timeout

Environment

Please provide the following details:

  • Platform: Kubernetes
  • Platform Version: 1.21.14
  • PGO Image Tag: crunchy-pgbackrest:ubi8-2.40-1
  • Postgres Version: 14
  • Storage: nfs

Steps to Reproduce

REPRO

Define a Postgres cluster with the following backups section:

  backups:
    pgbackrest:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:ubi8-2.40-1
      global:
        repo1-retention-full: "14"
        repo1-retention-full-type: time
      configuration:
      - configMap:
          name: archive-conf
      repos:
      - name: repo1
        schedules:
          full: "0 2 * * 0"
          differential: "0 2 * * 1-6"

The archive-conf ConfigMap looks like this:

apiVersion: v1
data:
  archive.conf: |-
    [archive]
    archive-timeout=180
kind: ConfigMap
metadata:
  name: archive-conf
  namespace: postgres

Two CronJobs are created to run the backups.

EXPECTED

The backup completes successfully.

ACTUAL

Error 082 from pgbackrest

Logs

time="2024-04-17T08:59:48Z" level=info msg="crunchy-pgbackrest starts"
time="2024-04-17T08:59:48Z" level=info msg="debug flag set to false"
time="2024-04-17T08:59:48Z" level=info msg="backrest backup command requested"
time="2024-04-17T08:59:48Z" level=info msg="command to execute is [pgbackrest backup --stanza=db --repo=1 --type=full]"
time="2024-04-17T09:01:50Z" level=info msg="output=[]"
time="2024-04-17T09:01:50Z" level=info msg="stderr=[ERROR: [082]: WAL segment 000001B000000AF80000009B was not archived before the 60000ms timeout\n       HINT: check the archive_command to ensure that all options are correct (especially --stanza).\n       HINT: check the PostgreSQL server log for errors.\n       HINT: run the 'start' command if the stanza was previously stopped.\n]"
time="2024-04-17T09:01:50Z" level=fatal msg="command terminated with exit code 82"
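
For reference, the segment name and timeout in the error can be pulled apart mechanically. The following is a minimal sketch: the helper name `parse_archive_timeout_error` is hypothetical, and it relies only on the standard PostgreSQL WAL file naming scheme (24 hex digits: 8 for the timeline, 8 for the log number, 8 for the segment number).

```python
import re

# Error text copied from the job log above.
STDERR = (
    "ERROR: [082]: WAL segment 000001B000000AF80000009B "
    "was not archived before the 60000ms timeout"
)


def parse_archive_timeout_error(text):
    """Hypothetical helper: extract the WAL segment and timeout from an 082 error."""
    match = re.search(
        r"WAL segment ([0-9A-F]{24}) was not archived before the (\d+)ms timeout",
        text,
    )
    if match is None:
        return None
    segment = match.group(1)
    return {
        "segment": segment,
        # A WAL file name is three 8-digit hex fields.
        "timeline": int(segment[0:8], 16),
        "log": int(segment[8:16], 16),
        "seg": int(segment[16:24], 16),
        # pgBackRest reports the timeout in milliseconds.
        "timeout_s": int(match.group(2)) / 1000,
    }


info = parse_archive_timeout_error(STDERR)
print(info)
```

For the log above this gives timeline 432, log 2808, segment 155, and a timeout of 60.0 seconds, which confirms that the 180-second setting was not in effect for this backup.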

Additional Information

I tried to increase --archive-timeout to 180s by adding an [archive] section via the archive-conf ConfigMap (see the cluster snippet above). However, the error still reports the default timeout of 60s.

How can I increase the timeout, or otherwise get rid of the error?

ckazimie, Apr 17 '24 09:04

I have a similar issue.

gil0109, Apr 18 '24 15:04

Hi @ckazimie, sorry you are running into issues. If you set the archive timeout under global like this, it will be raised from 60 to 180:

  backups:
    pgbackrest:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:ubi8-2.40-1
      global:
        repo1-retention-full: "14"
        repo1-retention-full-type: time
        archive-timeout: "180"
      repos:
      - name: repo1
        schedules:
          full: "0 2 * * 0"
          differential: "0 2 * * 1-6"
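
A plausible reason the original ConfigMap had no effect: pgBackRest configuration files use sections named [global], [global:command], [stanza], and [stanza:command], so a section called [archive] would most likely be read as a stanza named "archive" rather than as a setting for the db stanza. The sketch below illustrates that lookup with Python's `configparser`. It is not pgBackRest's own parser, it ignores command-specific sections, and the function name `timeout_for_stanza` is hypothetical.

```python
import configparser

# Section as written in the archive-conf ConfigMap from the issue.
ORIGINAL = """
[archive]
archive-timeout=180
"""

# Same option placed in the [global] section instead.
GLOBAL = """
[global]
archive-timeout=180
"""

DEFAULT_ARCHIVE_TIMEOUT = 60  # seconds, matching the 60000ms in the error


def timeout_for_stanza(conf_text, stanza):
    """Simplified lookup: stanza section first, then [global], then the default."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    for section in (stanza, "global"):
        if parser.has_option(section, "archive-timeout"):
            return parser.getint(section, "archive-timeout")
    return DEFAULT_ARCHIVE_TIMEOUT


print(timeout_for_stanza(ORIGINAL, "db"))
print(timeout_for_stanza(GLOBAL, "db"))
```

Under these assumptions the first lookup falls back to the default of 60 and the second returns 180, which matches the behavior reported in the issue.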

If you continue to see this error, the Postgres logs are another good place to check.
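
Alongside the logs, the `pg_stat_archiver` view shows whether `archive_command` is failing. The sketch below only builds a `kubectl exec` command line for that query and does not run it. The pod name is hypothetical, the container name `database` is an assumption about how the operator names the Postgres container, and it assumes `psql` can connect locally inside that container.

```python
import shlex

# Columns of the standard pg_stat_archiver statistics view.
ARCHIVER_SQL = (
    "SELECT archived_count, last_archived_wal, last_archived_time, "
    "failed_count, last_failed_wal, last_failed_time "
    "FROM pg_stat_archiver;"
)


def archiver_status_command(namespace, pod, container="database"):
    """Build (but do not execute) a kubectl command that queries pg_stat_archiver."""
    return [
        "kubectl", "exec", "-n", namespace, pod, "-c", container,
        "--", "psql", "-c", ARCHIVER_SQL,
    ]


# "postgres" is the namespace from the ConfigMap in the issue;
# the pod name is a placeholder to replace with a real instance pod.
cmd = archiver_status_command("postgres", "my-cluster-instance1-0")
print(shlex.join(cmd))
```

A growing failed_count, or a last_failed_wal close to the segment named in the error, would point at the archive_command rather than at the timeout value.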

ValClarkson, Apr 30 '24 20:04