btrbk
feature: send-receive to multiple receive targets at once
Instead of having, e.g., the following in my btrbk.conf:
target send-receive /mnt/dest1
target send-receive /mnt/dest2
it would be good if by specifying
target send-receive /mnt/dest1 /mnt/dest2
btrbk would use btrfs send to send to dest1 and dest2 simultaneously, i.e., do something like this:
btrfs send /tmp/incremental_backup_snapshot | tee >(btrfs receive /mnt/dest1) | btrfs receive /mnt/dest2
(cf. "How do I send btrfs snapshots to multiple destination drives?") This would save time when making mirrored backups on multiple drives.
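For more than two destinations, the same idea can be sketched with named FIFOs instead of nested process substitutions. This is only an illustration and not something btrbk provides: `fan_out` is a hypothetical helper, and each argument is assumed to be a command string that reads the stream on stdin.

```sh
# Hypothetical helper (not part of btrbk): copy stdin to every sink command.
# Each argument is a command string that reads the stream on its stdin.
fan_out() {
    local dir rc i sink pids pid
    [ "$#" -gt 0 ] || return 2
    dir=$(mktemp -d) || return 1
    rc=0 i=0 pids=""
    for sink in "$@"; do
        i=$((i + 1))
        mkfifo "$dir/$i" || { kill $pids 2>/dev/null; rm -rf "$dir"; return 1; }
        # The background job blocks on the FIFO until tee opens it for writing.
        sh -c "$sink" < "$dir/$i" &
        pids="$pids $!"
    done
    # Read the input once and duplicate it into all FIFOs.
    tee "$dir"/* > /dev/null || rc=1
    # Report failure if any sink (or tee itself) failed.
    for pid in $pids; do
        wait "$pid" || rc=1
    done
    rm -rf "$dir"
    return "$rc"
}

# Intended use, with the paths from the example above:
#   btrfs send /tmp/incremental_backup_snapshot |
#       fan_out "btrfs receive /mnt/dest1" "btrfs receive /mnt/dest2"
```

Note that the slowest destination sets the pace for all of them, and a sink that exits early can cut the stream short for the others, so a non-zero return should be treated as "all targets suspect".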
Mmh, this is a nice idea, but it would probably cause some non-trivial changes in btrbk: When doing incremental backups, btrbk looks for the "latest common parent" on both sides. With this proposal, there will be three (or ultimately, N) sides to compare, introducing problems like: What if an optimal parent is present on the first target, but not on the second? Should we use the second-best parent then? Or revert to sending twice?
"What if an optimal parent is present on the first target, but not on the second? Should we use the second-best parent then? Or revert to sending twice?"
I like the idea of having either as options in the config file.
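To make the "second-best parent" option concrete, here is a rough sketch of picking the newest snapshot that exists on the source and on every target. It assumes each side's snapshots are listed in a plain text file, one name per line, with names that sort chronologically (e.g. a timestamp suffix). `latest_common_parent` and the list files are hypothetical, and this only compares names; it is not how btrbk actually resolves parents.

```sh
# Hypothetical helper: print the newest snapshot name present in ALL lists.
# $1 is the source list, the remaining arguments are per-target lists.
latest_common_parent() {
    local dir list
    [ "$#" -gt 0 ] || return 2
    dir=$(mktemp -d) || return 1
    LC_ALL=C sort -u "$1" > "$dir/common" || { rm -rf "$dir"; return 1; }
    shift
    for list in "$@"; do
        LC_ALL=C sort -u "$list" > "$dir/target"
        # comm -12 keeps only the lines present in both sorted inputs.
        LC_ALL=C comm -12 "$dir/common" "$dir/target" > "$dir/next"
        mv "$dir/next" "$dir/common"
    done
    if [ -s "$dir/common" ]; then
        # Lists are sorted, so the last line is the newest common name.
        tail -n 1 "$dir/common"
        rm -rf "$dir"
        return 0
    fi
    # No common parent at all: the caller would have to send in full
    # or fall back to one transfer per target.
    rm -rf "$dir"
    return 1
}
```

A non-zero return corresponds to the "revert to sending twice" case: there is no single parent that works for every target.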
Another potential issue is what to do if one target fails.
A third option might be to use temporary intermediate storage for the btrfs send stream and use that temp file to send to multiple targets, perhaps in parallel. This would reduce load on the live source subvolume, since it would be read only once.
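The temp-file variant could look roughly like this. `spool_and_replay` is a hypothetical helper, not a btrbk feature; it assumes there is enough free space to hold the whole stream, and that each argument is a command string reading the stream on stdin.

```sh
# Hypothetical helper: store stdin in a temp file once, then replay that
# file to every sink command in parallel.
spool_and_replay() {
    local spool rc sink pids pid
    spool=$(mktemp) || return 1
    # Single read of the source stream.
    cat > "$spool" || { rm -f "$spool"; return 1; }
    rc=0 pids=""
    for sink in "$@"; do
        # Each sink gets its own independent read of the spooled file.
        sh -c "$sink" < "$spool" &
        pids="$pids $!"
    done
    for pid in $pids; do
        wait "$pid" || rc=1
    done
    rm -f "$spool"
    return "$rc"
}

# Intended use, with the paths from the example above:
#   btrfs send /tmp/incremental_backup_snapshot |
#       spool_and_replay "btrfs receive /mnt/dest1" "btrfs receive /mnt/dest2"
```

Unlike a tee pipeline, a failing target does not disturb the others here, and a slow target does not hold up the read from the source. The cost is the extra disk space and one additional write and read of the stream.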
Actually I would not surface this feature in the config file format.
Btrbk could optimize transfers on its own by detecting "same volume, different targets (on different drives!), same parents" triples. This would cover 95% of cases; all other cases would need separate transfers.
Hey, I think this could be done by adding an option to the btrbk call that filters by target, and then running several services in parallel. That would provide the same feature without adding the multi-target complexity to the script.
Would it maybe already be enough to send the snapshots to all individually specified targets in parallel instead of one by one? This would significantly improve throughput.
Yes, parallelizing the entire process (and separating into one lock per target) is an interesting idea. However, network congestion from multiple backups over the same slow line, and disk thrashing, need to be taken into account.
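As a rough illustration of "one lock per target", a wrapper script could guard each target's job with its own lock and run the jobs in the background. This is a sketch under assumptions, not btrbk's locking: `run_locked` and `backup_to` are hypothetical names, and the lock is a plain directory created with `mkdir`.

```sh
# Hypothetical helper: run a command while holding a per-target lock.
# Returns 75 without running anything if the lock is already held.
run_locked() {
    local lock rc
    lock=$1
    shift
    # mkdir is atomic: it fails if another job already created the directory.
    mkdir "$lock" 2>/dev/null || return 75
    rc=0
    "$@" || rc=$?
    rmdir "$lock"
    return "$rc"
}

# Intended use (backup_to stands for whatever performs one target's backup):
#   run_locked /tmp/dest1.lock backup_to /mnt/dest1 &
#   run_locked /tmp/dest2.lock backup_to /mnt/dest2 &
#   wait
```

A job that is killed leaves its lock directory behind, so a real setup would need stale-lock handling or a kernel-backed lock such as flock(1). Nothing here limits how many jobs run at once, so the congestion and thrashing concerns still apply.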