
barman backup fails if parallel_jobs is greater than unauthenticated ssh connections

Open eulerto opened this issue 3 years ago • 2 comments

barman backup fails with the following output:

2021-04-15 00:00:01,123 [123456] barman.backup ERROR: Backup failed copying files.
DETAILS: data transfer failure
rsync error:
ssh_exchange_identification: Connection closed by remote host
rsync: connection unexpectedly closed (0 bytes received so far) [Receiver]
rsync error: unexplained error (code 255) at io.c(226) [Receiver=3.1.2]

After investigation, it seems that if parallel_jobs is greater than the maximum number of concurrent unauthenticated connections that sshd accepts, the backup is likely to fail. Instead of suggesting that the OP change MaxStartups in sshd_config, Barman could probably provide a new parameter to throttle the ssh connections. Such a sleep mechanism could probably be injected into _execute_job in class RsyncCopyController.

eulerto avatar Apr 15 '21 20:04 eulerto

Thanks for the report. I didn't know about MaxStartups in sshd_config. I agree that starting a large number of ssh connections simultaneously is a bit unfortunate, but I'm not sure about a new parameter: it feels like asking too much of the user to declare the correct throttling regime to use. I'll think a little more about how to improve the default behaviour. Since the default value for MaxStartups is 10, perhaps Barman should wait briefly after opening every 8 (or even 4) connections.
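
To make the "wait briefly after every N connections" idea concrete, here is a minimal sketch in Python. It is not Barman code: the helper name `throttled`, its parameters, and the default pause are all hypothetical, and where such a helper would hook into RsyncCopyController is left open.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def throttled(
    items: Iterable[T],
    batch_size: int = 4,
    pause: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[T]:
    """Yield items unchanged, sleeping `pause` seconds after every
    `batch_size` items (hypothetical helper, not part of Barman)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for index, item in enumerate(items):
        # Pause before starting a new batch, but never before the first item.
        if index and index % batch_size == 0:
            sleep(pause)
        yield item
```

A caller that starts one ssh/rsync job per item could iterate over `throttled(jobs, batch_size=4)` instead of `jobs`, so that no more than four connections are opened back to back. The `sleep` argument exists only so the pause can be replaced in tests.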

amenonsen avatar Jul 29 '21 05:07 amenonsen

The question is: is it necessary to have a dozen ssh connections for parallel copy? I have a gut feeling that you saturate the network and/or disk with just a few jobs. Barman could possibly check the sshd parameter using /usr/sbin/sshd -T and then adjust the number of parallel jobs accordingly.
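
A rough sketch of that check, assuming the text printed by `sshd -T` has already been captured on the host that runs sshd (the command usually needs root). The function names and the `headroom` margin are hypothetical and not part of Barman.

```python
from typing import Optional


def parse_max_startups(sshd_dump: str) -> Optional[int]:
    """Return the 'start' value of MaxStartups from `sshd -T` output,
    or None if the option is absent. MaxStartups may be a single number
    or start:rate:full; only the first field is used here."""
    for line in sshd_dump.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].lower() == "maxstartups":
            return int(parts[1].split(":")[0])
    return None


def cap_parallel_jobs(
    requested: int, max_startups: Optional[int], headroom: int = 2
) -> int:
    """Limit the requested job count so it stays below MaxStartups,
    leaving `headroom` slots free (an arbitrary safety margin)."""
    if max_startups is None:
        return requested
    return max(1, min(requested, max_startups - headroom))
```

For example, with `maxstartups 10:30:100` in the dump and 16 requested jobs, this sketch would settle on 8. It ignores the rate and full fields, so it is a conservative approximation rather than a model of how sshd actually drops connections.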

eulerto avatar Aug 02 '21 16:08 eulerto