postgres-operator

pgbackrest backup fails with ERROR: [082]: WAL segment 000001B000000AF80000009B was not archived before the 60000ms timeout

Open ckazimie opened this issue 2 years ago • 2 comments

Overview

The backup job fails with the following error: ERROR: [082]: WAL segment 000001B000000AF80000009B was not archived before the 60000ms timeout

Environment

Please provide the following details:

  • Platform: Kubernetes
  • Platform Version: 1.21.14
  • PGO Image Tag: crunchy-pgbackrest:ubi8-2.40-1
  • Postgres Version: 14
  • Storage: nfs

Steps to Reproduce

REPRO

Provide steps to get to the error condition: Define a Postgres cluster with the following section:

  backups:
    pgbackrest:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:ubi8-2.40-1
      global:
        repo1-retention-full: "14"
        repo1-retention-full-type: time
      configuration:
      - configMap:
          name: archive-conf
      repos:
      - name: repo1
        schedules:
          full: "0 2 * * 0"
          differential: "0 2 * * 1-6"

The archive-conf ConfigMap looks like this:

apiVersion: v1
data:
  archive.conf: |-
    [archive]
    archive-timeout=180
kind: ConfigMap
metadata:
  name: archive-conf
  namespace: postgres

This creates two CronJobs that run the backups.

EXPECTED

The backup completes successfully.

ACTUAL

The backup fails with error 082 from pgbackrest.

Logs

time="2024-04-17T08:59:48Z" level=info msg="crunchy-pgbackrest starts"
time="2024-04-17T08:59:48Z" level=info msg="debug flag set to false"
time="2024-04-17T08:59:48Z" level=info msg="backrest backup command requested"
time="2024-04-17T08:59:48Z" level=info msg="command to execute is [pgbackrest backup --stanza=db --repo=1 --type=full]"
time="2024-04-17T09:01:50Z" level=info msg="output=[]"
time="2024-04-17T09:01:50Z" level=info msg="stderr=[ERROR: [082]: WAL segment 000001B000000AF80000009B was not archived before the 60000ms timeout\n       HINT: check the archive_command to ensure that all options are correct (especially --stanza).\n       HINT: check the PostgreSQL server log for errors.\n       HINT: run the 'start' command if the stanza was previously stopped.\n]"
time="2024-04-17T09:01:50Z" level=fatal msg="command terminated with exit code 82"
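
For anyone triaging the same message, here is a minimal sketch that pulls the segment name and the timeout actually in effect out of the stderr line above. It assumes the standard 24-hex-digit WAL file name layout (8 digits timeline, 8 digits log number, 8 digits segment number); the function name `parse_wal_timeout_error` is made up for this example and is not part of pgbackrest or PGO.

```python
import re

# Pattern for the message text seen in the log above.
WAL_TIMEOUT_RE = re.compile(
    r"WAL segment ([0-9A-F]{24}) was not archived before the (\d+)ms timeout"
)


def parse_wal_timeout_error(stderr):
    """Return details of an error 082 message, or None if it does not match."""
    match = WAL_TIMEOUT_RE.search(stderr)
    if match is None:
        return None
    name = match.group(1)
    return {
        "segment": name,
        # Assumed layout: timeline, log number, segment number (8 hex digits each).
        "timeline": int(name[0:8], 16),
        "log": int(name[8:16], 16),
        "seg": int(name[16:24], 16),
        "timeout_seconds": int(match.group(2)) / 1000,
    }


line = (
    "ERROR: [082]: WAL segment 000001B000000AF80000009B "
    "was not archived before the 60000ms timeout"
)
print(parse_wal_timeout_error(line))
```

A `timeout_seconds` of 60.0 here confirms that the backup ran with the default archive timeout rather than a configured one.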

Additional Information

I tried to increase --archive-timeout by adding an [archive] section via the archive-conf ConfigMap and setting it to 180s (see the cluster snippet). However, the error still reports the default timeout of 60s.

How can I increase the timeout, or otherwise get rid of the error?

ckazimie Apr 17 '24 09:04

I have a similar issue.

gil0109 Apr 18 '24 15:04

Hi @ckazimie, sorry you are running into issues. If you set your archive timeout like this, it will be updated from 60 to 180:

  backups:
    pgbackrest:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:ubi8-2.40-1
      global:
        repo1-retention-full: "14"
        repo1-retention-full-type: time
        archive-timeout: "180"
      repos:
      - name: repo1
        schedules:
          full: "0 2 * * 0"
          differential: "0 2 * * 1-6"

If you continue to see this error, the Postgres logs are another good place to check.
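
One possible explanation for why the ConfigMap approach had no effect, offered as a sketch rather than a confirmed diagnosis: pgbackrest configuration files use [global], [global:command], [stanza] and [stanza:command] sections, so an [archive] section would be read as options for a stanza named "archive", not for the "db" stanza shown in the logs. The snippet below imitates that lookup with Python's configparser. The function `effective_option` and the precedence order it uses (most specific section first) are assumptions made for illustration; this is not pgbackrest's actual parser.

```python
import configparser


def effective_option(conf_text, stanza, command, option):
    """Return the value a section-precedence lookup would pick, or None."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    # Assumed precedence: most specific section wins.
    candidates = (
        f"{stanza}:{command}",
        stanza,
        f"global:{command}",
        "global",
    )
    for section in candidates:
        if parser.has_section(section) and parser.has_option(section, option):
            return parser.get(section, option)
    return None


# The file from the original ConfigMap, and a variant using [global].
original_conf = "[archive]\narchive-timeout=180\n"
global_conf = "[global]\narchive-timeout=180\n"

print(effective_option(original_conf, "db", "backup", "archive-timeout"))
print(effective_option(global_conf, "db", "backup", "archive-timeout"))
```

Under these assumptions the first lookup finds nothing for stanza "db", which would be consistent with the backup still using the 60s default, while the [global] variant yields "180", matching the effect of putting archive-timeout under global in the cluster spec.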

ValClarkson Apr 30 '24 20:04

Hello, I believe the above answer should resolve this, so I'm closing the issue. If you're still experiencing the problem, please reopen it and we can continue to work on a fix.

benjaminjb Jun 13 '24 17:06