btrbk
Should I Prefer Fileserver-initiated Backups from Several Hosts (Instead of Each Host Sending to the Server)?
I have a small LAN with a fileserver that should store backups from each attached host on the LAN. What is the most efficient (performant) way to do this with btrbk?
Each host (laptops, desktops and a few other devices) does hourly local snapshots with btrbk. Once per day, I would like to send backups of each volume on each device to the local fileserver. This has to be done via SSH (as NFS isn't supported by btrfs send|receive, afaik).
The options I'm aware of from the btrbk readme are:
- host-initiated backup to the fileserver from each host
- fileserver-initiated backups from all hosts
My guess is that the second option is preferred. Is that correct?
Assuming I use the second option, do I need to be concerned about it initiating a backup on a host while that host is also performing a local hourly snapshot?
What are the disadvantages of the fileserver-initiated approach?
If one host is offline, will the backup procedure continue on with the other hosts it can reach at that time?
Since deleting snapshots can potentially be a costly operation (in terms of performance), should I split the process into two steps, where one step would pull the backups from each host without any deletions, and a second step would then prune the backups according to configured retention policies?
How many backups (snapshots) can I safely retain for each host volume? I would like to keep as many as possible, but I know there is a threshold at which performance can become a problem.
I mount btrfs volumes on the hosts with these mount options:
autodefrag,noatime,nodiratime,compress=lzo,space_cache=v2
And I have the systemd fstrim.service enabled.
The fileserver is a dedicated backup server, not a general-purpose fileserver. I plan to use most of those same mount options. Do I need the autodefrag option? (Will autodefrag help or hurt performance in this use-case?)
Are there any other recommendations?
> Assuming I use the second option, do I need to be concerned about it initiating a backup on a host while that host is also performing a local hourly snapshot?
By 'backup' I assume you mean a btrbk resume operation, in which case no. Taking a new snapshot is basically instant (just a pointer copy in the filesystem), so it cannot cause I/O contention with a running btrfs send.
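As a rough sketch of what the fileserver side of the second option might look like, assuming each host keeps creating its own hourly snapshots: the hostname `laptop1.lan`, the paths and the key file below are placeholders, not taken from the question, and `snapshot_dir` would have to match whatever the host already uses.

```
# /etc/btrbk/btrbk.conf on the fileserver (hypothetical names and paths)
ssh_identity /etc/btrbk/ssh/id_ed25519

volume ssh://laptop1.lan/mnt/btr_pool
  target /mnt/btr_backup/laptop1
  subvolume home
    snapshot_dir btrbk_snapshots
    # never delete snapshots on the host from the fileserver side
    snapshot_preserve_min all
    # the host takes its own snapshots; only fetch what already exists
    snapshot_create no
```

With `snapshot_create no` the fileserver only transfers existing snapshots, which is what `btrbk resume` does as well. Running `btrbk dryrun` first shows what would be transferred without changing anything.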
> What are the disadvantages of the fileserver-initiated approach?
The only one that springs to mind is that the backup of a host will fail if that host isn't online when the fileserver tries to reach it; this is the main reason I use option 1, host-initiated resumes.
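For comparison, a minimal sketch of the host-initiated variant (option 1) as it might appear on one of the hosts. The name `fileserver.lan`, the paths, the key file and the retention values are all assumptions chosen for illustration.

```
# /etc/btrbk/btrbk.conf on a host (hypothetical names and paths)
ssh_identity /etc/btrbk/ssh/id_ed25519

# local snapshots: keep everything for 2 days, then dailies for 14 days
snapshot_preserve_min 2d
snapshot_preserve 14d

volume /mnt/btr_pool
  snapshot_dir btrbk_snapshots
  subvolume home
    # send backups to the fileserver over SSH
    target ssh://fileserver.lan/mnt/btr_backup/laptop1
```

The hourly job on the host could then call `btrbk snapshot`, and a daily job `btrbk resume`. If the fileserver is unreachable, the local snapshots are still taken and the transfer can be retried on a later run.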
> How many backups (snapshots) can I safely retain for each host volume? I would like to keep as many as possible, but I know there is a threshold at which performance can become a problem.
I started out keeping as many snapshots as possible and quickly realised it was unworkable, not for performance reasons in my case, but because any significant churn in the datasets on a disk causes constant out-of-space problems requiring manual intervention. In the end I chose a staged retention policy limited to 3 monthly snapshots, and do quarterly tape backups in case I need to dig out something older. I suggest you either do something similar (a short staged retention plus an offline archive, or some other equivalent) or at least save yourself the hassle of constant space problems by setting realistic limits.
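A staged policy of this kind could be written roughly as follows. Only the `3m` part reflects the three monthlies mentioned above; the daily and weekly counts are made-up values to show the format.

```
# retention for backups on the target (illustrative values)
target_preserve_min no
# keep dailies for 14 days, weeklies for 8 weeks, monthlies for 3 months
target_preserve 14d 8w 3m
```

Backups that fall outside these windows are removed on the next run, so the space used on the backup disk is bounded by the policy instead of growing indefinitely.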
I sympathise, though, because if one is worried about data integrity, no timespan is long enough to guarantee that you can still recover a good version of a corrupted file by the time you finally notice the corruption. For this reason I use a self-developed data-integrity tool to implement continuous monitoring on my filesystems; you can find it on my GitHub page.