btrbk
Support a global flag file to exclude some snapshots from cleanup
The greatest problem with BTRFS when backing up to an external disk is that if there are no common snapshots on both sides, you have to restart the whole data transfer from scratch.
The backup rotation feature is a MUST, since we need to keep a limited number of snapshots on our computers.
If too many snapshots are taken on the host system, every snapshot it has in common with the external disk is eventually rotated out and lost. How can we prevent this from happening?
A simple solution is to mark the latest snapshot's timestamp by writing it to a global filter file, so that any btrbk instance will skip deleting any snapshot having that timestamp upon backup rotation.
Improvement: If btrbk would check the contents of all files in a filter folder (like /etc/apt/sources.list.d/), then any script/implementation can register and deregister its "DO_NOT_DELETE snapshots" easily.
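To make the filter-folder idea concrete, here is a minimal sketch of what registering and deregistering could look like from a script's point of view. Everything in it is an assumption for illustration: btrbk does not read such a folder, and the directory `/etc/btrbk/keep.d` as well as the function names `register_keep`, `deregister_keep` and `list_keep` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical filter folder; btrbk itself does NOT read this directory.
FILTER_DIR="${FILTER_DIR:-/etc/btrbk/keep.d}"

# register_keep <owner> <timestamp>
# Record that <owner> (e.g. one external disk) still needs <timestamp>.
register_keep() {
    mkdir -p "$FILTER_DIR" && printf '%s\n' "$2" > "$FILTER_DIR/$1"
}

# deregister_keep <owner>
# Drop the owner's claim; other owners' files are left untouched.
deregister_keep() {
    rm -f "$FILTER_DIR/$1"
}

# list_keep
# Print every protected timestamp once, merged across all owners.
list_keep() {
    [ -d "$FILTER_DIR" ] || return 0
    find "$FILTER_DIR" -type f -exec cat {} + | LC_ALL=C sort -u
}
```

Keeping one file per owner means each script only ever touches its own file, so two backup targets can protect the same timestamp without coordinating with each other.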
Had that in mind for a long time, not implemented yet as it's not that simple (I think).
As a workaround, what you can do is to snapshot (or move) the subvolume you want to keep, e.g.:
btrfs subvolume snapshot -r home.20210101 home.20210101.keep_forever
Every subvolume that does not have a proper btrbk filename will be ignored by btrbk.
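As a rough illustration of why the renamed snapshot is skipped, the sketch below checks whether a name ends in a btrbk-style timestamp. The pattern is an approximation that I am assuming here (name, dot, YYYYMMDD, optional Thhmm, optional _N); it is not taken from btrbk's source and does not cover every timestamp format btrbk supports. The function name `looks_like_btrbk_name` is hypothetical.

```shell
#!/usr/bin/env bash
# looks_like_btrbk_name <subvolume-name>
# Returns 0 if the name ends in ".<YYYYMMDD>[T<hhmm>][_<N>]".
# Approximate pattern, assumed for illustration only.
looks_like_btrbk_name() {
    local re='^.+\.[0-9]{8}(T[0-9]{4})?(_[0-9]+)?$'
    [[ "$1" =~ $re ]]
}

# Example: report which names would be treated as snapshots.
for name in home.20210101 home.20210101T1200_1 home.20210101.keep_forever; do
    if looks_like_btrbk_name "$name"; then
        echo "$name: matches the timestamp pattern"
    else
        echo "$name: does not match, so it would be left alone"
    fi
done
```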
For other users, I implemented such a tool, mark-snapshots.sh. The only dependency of this script is btrfs-ls, whose only dependency is btrbk.
My usage is here. As a summary, you may implement the feature by the following approach:
source_snapshots="/path/to/snapshots/dir/on/pc"
target_snapshots="/path/to/the/external/disk/snapshots/dir"
MARK_SNAPSHOTS="path/to/mark-snapshots.sh --suffix .MYDISK1_DONT_DELETE"
# Make the saved snapshots available for btrbk before backup operations
$MARK_SNAPSHOTS "$source_snapshots" --unfreeze
# Perform the actual backup operation, exit if it fails
perform-your-backup.sh || { echo "Something went wrong"; exit 1; }
# Backups were taken successfully: remove the old "saved snapshots" and create new ones.
latest_timestamp=$($MARK_SNAPSHOTS "$target_snapshots" --get-latest-ts)
# Delete the previously saved snapshots
$MARK_SNAPSHOTS "$source_snapshots" --clean
# Save the snapshots that carry $latest_timestamp so that they are kept forever
$MARK_SNAPSHOTS "$source_snapshots" --timestamp "$latest_timestamp" --freeze
DO NOT USE the above approach, because it breaks the parent/child relationship.
How it is broken:
1. You send your snapshots to your external disk.
2. You back up the latest snapshot on your host by btrfs sub snap -r mysub.XXXXXX{,.MY_BACKUP}
3. You guarantee that mysub.XXXXXX.MY_BACKUP will not be deleted by btrbk on backup rotation.
4. Eventually mysub.XXXXXX (on host) will be deleted according to btrbk's retention policy.
5. You restore mysub.XXXXXX.MY_BACKUP as mysub.XXXXXX by btrfs sub snap -r ...
   You shouldn't take the risk of using mv in steps 2 and 5, because mysub.XXXXXX might be deleted by another btrbk instance (run by cron jobs) before you send any incremental backups.
6. Run btrbk (or manually run btrfs send -p ... ... | btrfs receive). You will end up with an "ERROR: cannot find parent subvolume" error, because the Received UUID of the target differs from the UUID of the parent (source).
Sadly, you will lose all common snapshots with this approach.
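The mismatch described above can be checked before attempting an incremental send. The following is a minimal sketch, assuming that `btrfs subvolume show` prints `UUID:` and `Received UUID:` lines in its usual "key: value" layout; the helper names `show_field` and `can_be_parent` are hypothetical, and the check mirrors only the condition stated in this thread (target's Received UUID equals the source's UUID).

```shell
#!/usr/bin/env bash
# show_field <key>
# Read `btrfs subvolume show` output on stdin and print the value of
# exactly one field, e.g. "UUID" or "Received UUID".
show_field() {
    awk -v key="$1" -F':' '
        { name = $1; gsub(/^[ \t]+|[ \t]+$/, "", name) }
        name == key { val = $2; gsub(/[ \t]/, "", val); print val; exit }
    '
}

# can_be_parent <source-snapshot> <target-snapshot>
# Returns 0 if the target's Received UUID equals the source's UUID.
# Needs root and a real btrfs filesystem, so it is only defined here.
can_be_parent() {
    local src_uuid dst_received
    src_uuid=$(btrfs subvolume show "$1" | show_field "UUID") || return 2
    dst_received=$(btrfs subvolume show "$2" | show_field "Received UUID") || return 2
    [ -n "$src_uuid" ] && [ "$src_uuid" = "$dst_received" ]
}
```

A snapshot re-created in step 5 gets a fresh UUID, so a check like `can_be_parent` against the copy on the external disk would be expected to fail for it.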
Solution
btrbk must implement this retention policy feature by itself. There is no way around it.
Workaround: I added an option to the script to change the Received UUID property of the target snapshot, so btrbk (btrfs send|receive) happily detects the incremental stream. Here is my change:
https://github.com/ceremcem/smith-sync/commit/ca64e3122ced9b28e764f62397fb624bea8587a7#diff-2ced0053c635be678fd365a4f4d990a742fdb8c434768d98a1e624a8b22bcba5R161-R165
@digint Given the shortcomings of the workaround, would it be possible to reconsider this feature request? Thanks for this amazing project. It fits my workflow perfectly, except for this one feature I miss (coming from the snapper world).
I'm currently working on some refactoring/consolidation of code in the archive-refactoring branch. Most things already work as expected, e.g. you get an --exclude cmdline argument which excludes a filter argument from every operation (commit 6aad57632d8ceb5b3d170c5176ac560341e6dad1 and others).
This is planned to get merged for btrbk-0.33.0, along with action-cp (which should replace archive in the long term).
If you want to use filenames in a folder to exclude specific timestamps, you can use:
btrbk -c btrbk.conf --progress run $(for i in `ls path/to/exclude`; do echo --exclude '*'.$i; done)
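The one-liner above can also be written with a bash array, which avoids parsing `ls` output and keeps the `*` in each pattern from being expanded by the shell. This is only a sketch of the same idea: the function name `build_exclude_args` and the array `EXCLUDE_ARGS` are hypothetical, and it assumes, as the command above does, that each file name in the folder is a timestamp to exclude.

```shell
#!/usr/bin/env bash
# build_exclude_args <dir>
# Fill the global array EXCLUDE_ARGS with one "--exclude" "*.<name>"
# pair per entry found in <dir> (entry names are treated as timestamps).
build_exclude_args() {
    local dir=$1 f
    EXCLUDE_ARGS=()
    for f in "$dir"/*; do
        [ -e "$f" ] || continue   # empty directory: the glob stayed literal
        EXCLUDE_ARGS+=(--exclude "*.${f##*/}")
    done
}

# Usage, with the same paths as in the command above:
#   build_exclude_args path/to/exclude
#   btrbk -c btrbk.conf --progress run "${EXCLUDE_ARGS[@]}"
```

With an empty folder the array stays empty, so btrbk is simply run without any --exclude arguments.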
This option works great. When will you merge it into the master?
IMPORTANT NOTE: Anyone who writes a custom script to get the latest snapshot timestamp should use the following regex:
list-your-snapshots-with-full-path-somehow | grep -oE -- '[0-9]{8}T[0-9]{4}_?[0-9]?'
If you omit the _?[0-9]? part, you will fail to detect a latest snapshot that was taken within the same minute as the previous snapshot, and you will lose your (probably only) common snapshot.
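A minimal sketch of how that regex might be used to pick the latest timestamp from a list of snapshot paths follows. The function name `latest_timestamp` and the example paths are hypothetical. The sort is done in the C locale so that a timestamp with a `_N` suffix orders after the same timestamp without one.

```shell
#!/usr/bin/env bash
# latest_timestamp
# Read snapshot paths on stdin and print the newest timestamp found,
# including an optional "_N" suffix for same-minute snapshots.
latest_timestamp() {
    grep -oE -- '[0-9]{8}T[0-9]{4}_?[0-9]?' | LC_ALL=C sort -u | tail -n 1
}

# Example with made-up paths: two snapshots share the minute 12:00.
printf '%s\n' \
    /snapshots/home.20201231T2359 \
    /snapshots/home.20210101T1200 \
    /snapshots/home.20210101T1200_1 \
    | latest_timestamp
# prints 20210101T1200_1
```

Without the `_?[0-9]?` part, the same input would yield 20210101T1200, which is the earlier of the two same-minute snapshots.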