pg_probackup
pg_probackup copied to clipboard
ERROR: WAL segment could not be streamed in 300 seconds
A full backup with streaming
pg_probackup-13 backup -B /var/lib/postgresql/pgbackup --instance db01 -b FULL --stream --remote-host=db01.example.net --remote-user=postgres -U backup -d backup -j 4 --temp-slot --compress
is failing due to
ERROR: WAL segment 0000000100021E510000002A could not be streamed in 300 seconds
The backup process was running 14 hours:
db01 13 QYUN4Y ---- FULL STREAM 1/0 14h:36m 3246GB 0 2.45 21AB4/D07BA908 0/0 ERROR
According to logs the basebackup was completed, then it was transferring WALs:
2021-09-03 22:51:16 UTC [4511]: VERBOSE: Writing headers for file "base/16403/4117928982.1" offset: 14202299925, len: 690551, crc: 3981487476
2021-09-03 22:51:16 UTC [4511]: VERBOSE: File "/var/lib/postgresql/13/main/base/16403/4117928982.1". Copied 8781824 bytes
2021-09-03 22:51:16 UTC [4511]: LOG: SSH process 4529 is terminated with status 0
2021-09-03 22:51:17 UTC [4511]: VERBOSE: Writing headers for file "base/16403/4117928982.10" offset: 14202990476, len: 690550, crc: 3216108614
2021-09-03 22:51:17 UTC [4511]: VERBOSE: File "/var/lib/postgresql/13/main/base/16403/4117928982.10". Copied 8781824 bytes
2021-09-03 22:51:17 UTC [4511]: LOG: SSH process 4531 is terminated with status 0
2021-09-03 22:51:17 UTC [4511]: INFO: Data files are transferred, time elapsed: 14h:31m
2021-09-03 22:51:17 UTC [4511]: VERBOSE: (query) SET client_min_messages = warning;
2021-09-03 22:51:17 UTC [4511]: VERBOSE: (query) SET datestyle = 'ISO, DMY';
2021-09-03 22:51:17 UTC [4511]: VERBOSE: (query) SELECT pg_catalog.pg_create_restore_point($1)
2021-09-03 22:51:17 UTC [4511]: VERBOSE: (param:0) = pg_probackup, backup_id QYUN4Y
2021-09-03 22:51:17 UTC [4511]: VERBOSE: (query) SELECT pg_catalog.txid_snapshot_xmax(pg_catalog.txid_current_snapshot()), current_timestamp(0)::timestamptz, lsn, labelfile, spcmapfile FROM pg_catalog.pg_stop_backup(false, false)
2021-09-03 22:51:17 UTC [4511]: VERBOSE: finished segment at 21C4B/B8000000 (timeline 1)
2021-09-03 22:51:17 UTC [4511]: VERBOSE: finished segment at 21C4B/BC000000 (timeline 1)
2021-09-03 22:51:18 UTC [4511]: INFO: wait for pg_stop_backup()
2021-09-03 22:51:18 UTC [4511]: INFO: pg_stop backup() successfully executed
2021-09-03 22:51:18 UTC [4511]: LOG: stop_lsn: 21E51/A8489070
2021-09-03 22:51:18 UTC [4511]: LOG: Looking for LSN 21E51/A8489070 in segment: 0000000100021E510000002A
2021-09-03 22:51:18 UTC [4511]: VERBOSE: finished segment at 21C4B/C0000000 (timeline 1)
2021-09-03 22:51:19 UTC [4511]: VERBOSE: finished segment at 21C4B/C4000000 (timeline 1)
2021-09-03 22:51:19 UTC [4511]: INFO: Wait for LSN 21E51/A8489070 in streamed WAL segment /var/lib/postgresql/pgbackup/backups/psql06c.de/QYUN4Y/database/pg_wal/0000000100021E510000002A
2021-09-03 22:51:19 UTC [4511]: VERBOSE: finished segment at 21C4B/C8000000 (timeline 1)
2021-09-03 22:51:20 UTC [4511]: VERBOSE: finished segment at 21C4B/CC000000 (timeline 1)
2021-09-03 22:51:20 UTC [4511]: VERBOSE: finished segment at 21C4B/D0000000 (timeline 1)
2021-09-03 22:51:21 UTC [4511]: VERBOSE: finished segment at 21C4B/D4000000 (timeline 1)
...
2021-09-03 22:56:16 UTC [4511]: VERBOSE: finished segment at 21C52/BC000000 (timeline 1)
2021-09-03 22:56:17 UTC [4511]: VERBOSE: finished segment at 21C52/C0000000 (timeline 1)
2021-09-03 22:56:17 UTC [4511]: VERBOSE: finished segment at 21C52/C4000000 (timeline 1)
2021-09-03 22:56:18 UTC [4511]: VERBOSE: finished segment at 21C52/C8000000 (timeline 1)
2021-09-03 22:56:19 UTC [4511]: ERROR: WAL segment 0000000100021E510000002A could not be streamed in 300 seconds
2021-09-03 22:56:19 UTC [4511]: WARNING: Backup QYUN4Y is running, setting its status to ERROR
The --archive-timeout
option also apply to a STREAM WAL replication?
--archive-timeout=timeout wait timeout for WAL segment archiving (default: 5min)
Or do I have to increase some timeout on PostgreSQL primary server?
Probably related to #275, but I don't fully understand all comments.
Hello!
Looks like pg_probackup fails to stream all required WAL segments within default 5 minutes timeout.
Yes, you can use --archive-timeout
options to increase the timeout. I would recommend to increase it up to 1h
.
Arrgh, right, now it make sense :disappointed:
Maybe the description could be improved a bit. The timeout is specified in minutes or seconds?
I thought it was a timeout to transfer single WAL, but it seems like it's timeout for transferring all remaining WAL files that weren't streamed during the pg_basebackup
phase. In my case I have only 1 Gbps network, thus the transfer speed is limited to 100 MB/s... so the WAL streaming might be lagging a bit.
It is possible to disable the timeout completely? Maybe add an alias --wal-timeout
or move it from the Archive options
section ... it's a bit confusing when it is applied.