
Barman backup completion error when I use rsync and reuse_link

Open: ahmetmelihbasbug opened this issue 2 years ago (3 comments)

We set up Barman backups using rsync and reuse_link to back up incrementally. Five days ago a full backup completed, and two days later an incremental backup with reuse_links completed. However, we have not been able to take a backup in the last 2 days, despite several manual attempts.

barman list-backup mydb
mydb 20230524T014506 - STARTED
mydb 20230521T044506 - Sun May 21 09:54:48 2023 - Size: 5.1 TiB - Wal Size: 402.1 GiB (tablespaces: repo) 
mydb 20230519T014506 - Fri May 19 10:34:48 2023 - Size: 5.1 TiB - Wal Size: 100.2 GiB (tablespaces: repo)

We see that end_offset and end_wal are null, yet no error is recorded. The barman diagnose output is:

barman diagnose
....
....
....
},
"20230524T014506": {
    "backup_id": "20230524T014506",
    "backup_label": null,
    "begin_offset": 4012,
    "begin_time": "2023-05-24T01:45:06.949716+03:00",
    "begin_wal": "000001BC0000845C000000CB",
    "begin_xlog": "845C/C8001028",
    "config_file": "/data/postgresql.conf",
    "copy_stats": null,
    "deduplicated_size": null,
    "end_offset": null,
    "end_time": null,
    "end_wal": null,
    "end_xlog": null,
    "error": null,
    "hba_file": "/data/pg_hba.conf",
    "ident_file": "/data/pg_ident.conf",
    "included_files": [ "/data/postgresql.auto.conf" ],
    "mode": "rsync-concurrent",
    "pgdata": "/data",
    "server_name": "mydb",
    "size": null,
    "status": "STARTED",
    "systemid": "66841250581234512355",
    "tablespaces": [ [ "repo", 6453213, "/pg_tbl/repo" ] ],
    "timeline": 222,
    "version": 120008,
    "xlog_segment_size": 16777216
}
},
"config": {
    "active": true,
    "archiver": false,
    "archiver_batch_size": 0,
    "backup_directory": "/barman/mydb",
    "backup_method": "rsync",
    "backup_options": "concurrent_backup",
    "bandwidth_limit": null,
    ...
....
....
..
"status": {
    "archive_command": "cp %p /WAL_archive/%f",
    "archive_mode": "on",
    "archive_timeout": 0,
    "checkpoint_timeout": 900,
    "config_file": "/data/postgresql.conf",
    "connection_error": null,
    "current_archived_wals_per_second": 0.152131251231,
    "current_lsn": "8461/C14817C0",
    "current_size": 210798164118.0,
    "current_xlog": "000001BC00008461000000C1",
    "data_checksums": "on",
    "data_directory": "/data",
    "failed_count": 0,
    "has_backup_privileges": true,
    "hba_file": "/data/pg_hba.conf",
    "hot_standby": "on",
    "ident_file": "/data/pg_ident.conf",
    "included_files": [ "/data/postgresql.auto.conf" ],
    "is_archiving": true,
    "is_in_recovery": false,
    "is_superuser": true,
....
....
....

There is no error in barman.log. Sometimes the backup copies 17 GB (50 GB in another attempt), and after a while there are no rsync processes left on the servers.

ahmetmelihbasbug, May 24 '23 14:05

Hi @ahmetmelihbasbug, since end_offset and end_wal are null and the status is STARTED, we know the backup process terminated before the backup completed (had the main backup process run to completion, any failure would have left the backup in the FAILED state).
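The reasoning above can be turned into a quick check that lists backups left in this state. This is a minimal sketch that assumes, without confirmation from this thread, that Barman keeps each backup's metadata in <backup_directory>/base/<backup_id>/backup.info as key=value lines, with unset fields written as None; get_field, classify_backup and report_interrupted are hypothetical helper names, not Barman commands.

```shell
#!/bin/sh
# Sketch only. ASSUMPTION: per-backup metadata lives in
# <backup_directory>/base/<backup_id>/backup.info as key=value lines,
# with unset fields written as "None". Verify on your own installation.

# Print the value of KEY ($2) from a backup.info-style file ($1).
get_field() {
    sed -n "s/^$2=//p" "$1" | head -n 1
}

# Classify one backup.info file: STARTED with no end_wal means the main
# backup process stopped before it could record the end of the backup.
classify_backup() {
    status=$(get_field "$1" status)
    end_wal=$(get_field "$1" end_wal)
    if [ "$status" = "STARTED" ] && { [ -z "$end_wal" ] || [ "$end_wal" = "None" ]; }; then
        echo "interrupted"
    elif [ "$status" = "FAILED" ]; then
        echo "failed"
    else
        echo "$status"
    fi
}

# Print the ID of every backup under a backup_directory that looks interrupted.
report_interrupted() {
    for info in "$1"/base/*/backup.info; do
        [ -f "$info" ] || continue
        if [ "$(classify_backup "$info")" = "interrupted" ]; then
            # The backup ID is the name of the directory holding backup.info
            basename "$(dirname "$info")"
        fi
    done
}
```

If the layout assumption holds, report_interrupted /barman/mydb would be expected to list 20230524T014506 for the server in this issue.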

Some possible causes worth investigating further:

  • The Barman process could be killed by the OOM killer. Check the output of dmesg on your Barman host for any evidence of OOM killer activity.
  • The backup might be running with more parallel jobs than sshd's MaxStartups allows (this usually defaults to 10), so some rsync connections are terminated during the backup.
  • The Barman process might be running under nohup. In some scenarios, e.g. when the SSH connection used to start the nohup process times out, Barman's worker processes can receive a SIGHUP and terminate.
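For the first point, the kernel log can be filtered for OOM-killer lines that mention barman or rsync. This is a minimal sketch: the match patterns are an assumption based on common kernel messages such as "Out of memory: Killed process ... (rsync)" and "oom_reaper: reaped process ...", whose wording varies between kernel versions, and find_oom_kills is a hypothetical name.

```shell
#!/bin/sh
# Sketch: read kernel log text on stdin and print OOM-killer lines that
# involve barman or rsync. ASSUMPTION: the patterns below cover the
# message wording of your kernel; adjust them if nothing matches.
find_oom_kills() {
    # First keep OOM-related lines, then only those naming barman/rsync
    grep -iE 'out of memory|oom-kill|oom_reaper' | grep -iE 'barman|rsync'
}
```

On the Barman host this could be run as dmesg -T | find_oom_kills, or journalctl -k | find_oom_kills on systemd machines; no output means no matching OOM events were found in the log that was read.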

A couple of related questions:

  • How are you running the barman backup process? Is it run under cron, or via some other means?
  • What arguments are you using in your barman backup command?

mikewallace1979, May 24 '23 15:05

@mikewallace1979 Hello, we have a .sh script that we run from crontab or from the command line:

#!/bin/bash
/usr/bin/barman cron
/home/barman/deletedfailed.sh
/usr/bin/barman backup mydb -j 10
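One way to find out how barman ended when it runs from this script is to log its exit status. The sketch below is a hypothetical wrapper (run_and_log and describe_exit are not part of Barman); it relies on the standard shell convention that a child killed by signal N reports status 128+N, so a SIGKILL from the OOM killer shows up as 137 and a SIGHUP as 129.

```shell
#!/bin/sh
# Hypothetical helpers for a cron wrapper script; not part of Barman.

# Turn an exit status into a short description.
describe_exit() {
    code=$1
    sig=$((code - 128))
    # Statuses above 128 usually mean "killed by signal (status - 128)";
    # kill -l maps the number to a name, e.g. 9 -> KILL, 1 -> HUP.
    if [ "$code" -gt 128 ] && name=$(kill -l "$sig" 2>/dev/null) && [ -n "$name" ]; then
        echo "terminated by signal $sig ($name)"
    elif [ "$code" -eq 0 ]; then
        echo "exited normally"
    else
        echo "exited with status $code"
    fi
}

# Run the given command, then write a timestamped outcome line to stderr
# and return the command's own exit status.
run_and_log() {
    "$@"
    rc=$?
    echo "$(date '+%Y-%m-%dT%H:%M:%S') $* -> $(describe_exit "$rc")" >&2
    return "$rc"
}
```

With both functions defined in the script, the backup line could become run_and_log /usr/bin/barman backup mydb -j 10 2>> /home/barman/backup-outcome.log (the log path is illustrative). If the wrapper shell is itself killed, for example by the same SIGHUP, no outcome line is written, which is also a clue.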

There are also hourly pg_dump jobs in crontab on the same instance.

  • MaxStartups is set to 100.
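The MaxStartups point from the earlier reply can be checked mechanically against these numbers. MaxStartups is either a single number or start:rate:full (sshd_config(5) documents the default as 10:30:100), and sshd starts refusing some new unauthenticated connections once start of them are pending. This sketch conservatively compares -j with the start value only; maxstartups_start and check_jobs are hypothetical helpers, and other concurrent SSH logins also count against the limit.

```shell
#!/bin/sh
# Hypothetical helpers: compare Barman's parallel jobs (-j) with the
# "start" field of sshd's MaxStartups ("100" or e.g. "10:30:100").

# Extract the "start" value (text before the first colon, if any).
maxstartups_start() {
    echo "${1%%:*}"
}

# Print OK/WARN; return 1 when the job count reaches the start threshold,
# since sshd may begin dropping new unauthenticated connections there.
check_jobs() {
    njobs=$1
    start=$(maxstartups_start "$2")
    if [ "$njobs" -ge "$start" ]; then
        echo "WARN: -j $njobs reaches MaxStartups start value $start"
        return 1
    fi
    echo "OK: -j $njobs within MaxStartups start value $start"
}
```

On the PostgreSQL host, sshd -T prints the effective settings (it usually needs root), so check_jobs 10 "$(sshd -T | awk '/^maxstartups/ {print $2}')" would test the real value. With the values given here (-j 10, MaxStartups 100) the check passes, which by this rule alone makes connection throttling an unlikely cause.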

ahmetmelihbasbug, May 25 '23 07:05

Hi @ahmetmelihbasbug, when you say you cannot take a backup, do you mean that you run barman backup mydb and nothing happens, or that whatever you do, the backup hangs?

gcalacoci, May 26 '23 09:05