btrbk

Support a global flag file to exclude some snapshots from cleanup

Open ceremcem opened this issue 2 years ago • 8 comments

The biggest problem with BTRFS when backing up to an external disk is that if there are no common snapshots on both sides, you have to transfer all data from scratch.

The backup rotation feature is a MUST: we need to keep only a limited number of snapshots on our computers.

If we take many snapshots on our host system, rotation eventually deletes every snapshot it has in common with the external disk. How can we prevent this from happening?

A simple solution is to record the latest snapshot's timestamp in a global filter file, so that every btrbk instance skips deleting any snapshot with that timestamp during backup rotation.

Improvement: if btrbk checked the contents of all files in a filter directory (like /etc/apt/sources.list.d/), any script or tool could easily register and deregister its own "DO_NOT_DELETE" snapshots.
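A minimal sketch of how such a filter directory could be consulted, written as POSIX shell functions. The directory `/etc/btrbk/keep.d`, the one-timestamp-per-line file format and the function names are all hypothetical; btrbk does not read such a directory.

```sh
# Hypothetical filter directory (btrbk does NOT read it). Each file lists one
# timestamp per line; blank lines and lines starting with '#' are ignored.
KEEP_DIR=${KEEP_DIR:-/etc/btrbk/keep.d}

# Print every registered timestamp once, across all files in $KEEP_DIR.
registered_timestamps() {
    local f
    for f in "$KEEP_DIR"/*; do
        # Skip the unexpanded glob of an empty directory and non-regular files.
        [ -f "$f" ] || continue
        grep -vE '^[[:space:]]*(#|$)' "$f"
    done | sort -u
}

# Succeed if the snapshot's timestamp (the part after its last '.')
# appears as a whole line in the registered list.
is_protected() {
    registered_timestamps | grep -qxF -e "${1##*.}"
}
```

A script would register its snapshots by writing a file, e.g. `echo 20211005T0910 > /etc/btrbk/keep.d/mydisk1`, and deregister them by deleting that file; a cleanup routine would then skip any snapshot for which `is_protected` succeeds.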

ceremcem, Oct 05 '21

I've had that in mind for a long time, but haven't implemented it yet because it's not that simple (I think).

As a workaround, you can snapshot (or move) the subvolume you want to keep, e.g.:

btrfs subvolume snapshot -r home.20210101 home.20210101.keep_forever

btrbk ignores every subvolume whose name does not follow the btrbk naming scheme.
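A rough sketch of why the `.keep_forever` copy is left alone: its name no longer fits the `<prefix>.<timestamp>[_N]` pattern. The regex below is an approximation of btrbk's naming scheme for its `short`, `long` and `long-iso` timestamp formats, not btrbk's own matching code, and `is_btrbk_name` is a hypothetical helper.

```sh
# Approximation of btrbk's snapshot naming scheme (an assumption, not btrbk's
# own matching code):
#   <prefix>.YYYYMMDD               timestamp_format short
#   <prefix>.YYYYMMDDThhmm          timestamp_format long
#   <prefix>.YYYYMMDDThhmmss+hhmm   timestamp_format long-iso
# each optionally followed by a _N suffix.
is_btrbk_name() {
    local prefix=$1 name=$2
    # The name must start with "<prefix>."
    case $name in
        "$prefix".*) ;;
        *) return 1 ;;
    esac
    # Everything after "<prefix>." must be a timestamp, optionally with _N.
    printf '%s\n' "${name#"$prefix".}" |
        grep -qE '^[0-9]{8}(T[0-9]{4}([0-9]{2}[+-][0-9]{4})?)?(_[0-9]+)?$'
}
```

For example, `is_btrbk_name home home.20210101` succeeds, while `is_btrbk_name home home.20210101.keep_forever` fails, so such a copy falls outside what btrbk considers its own snapshots.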

digint, Oct 23 '21

For other users: I implemented such a tool, mark-snapshots.sh. Its only dependency is btrfs-ls, whose only dependency is btrbk.

My usage is here. In summary, you can implement the feature with the following approach:

source_snapshots="/path/to/snapshots/dir/on/pc"
target_snapshots="/path/to/the/external/disk/snapshots/dir"

MARK_SNAPSHOTS="path/to/mark-snapshots.sh --suffix .MYDISK1_DONT_DELETE"

# Make the saved snapshots available for btrbk before backup operations
$MARK_SNAPSHOTS "$source_snapshots" --unfreeze

# Perform the actual backup operation, exit if it fails
perform-your-backup.sh || { echo "Something went wrong"; exit 1; }

# Backups were taken successfully: remove the old "saved snapshots" and create new ones.
latest_timestamp=$($MARK_SNAPSHOTS "$target_snapshots" --get-latest-ts)
[ -n "$latest_timestamp" ] || { echo "Could not determine the latest timestamp"; exit 1; }
$MARK_SNAPSHOTS "$source_snapshots" --clean  # Delete the saved snapshots
$MARK_SNAPSHOTS "$source_snapshots" --timestamp "$latest_timestamp" --freeze  # Keep the snapshots with timestamp $latest_timestamp forever

ceremcem, Nov 30 '21

DO NOT USE the above approach, because it breaks the parent/child relationship.

How it is broken:

  1. You send your snapshots to your external disk.
  2. You back up the latest snapshot on your host with btrfs sub snap -r mysub.XXXXXX{,.MY_BACKUP}
  3. You guarantee that mysub.XXXXXX.MY_BACKUP will not be deleted by btrbk on backup rotation.
  4. Eventually, mysub.XXXXXX (on the host) is deleted according to btrbk's retention policy.
  5. You restore mysub.XXXXXX.MY_BACKUP as mysub.XXXXXX with btrfs sub snap -r ...

You shouldn't risk using mv in steps 2 and 5, because mysub.XXXXXX might be deleted by another btrbk instance (e.g. one run by a cron job) before you send any incremental backups.

  6. Run btrbk (or manually run btrfs send -p ... ... | btrfs receive). You will end up with "ERROR: cannot find parent subvolume", because the Received UUID of the target differs from the UUID of the restored parent (source).

Sadly, you will lose all common snapshots with this approach.
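A sketch of the check that fails in step 6, assuming `btrfs subvolume show` prints `UUID:` and `Received UUID:` lines (with `-` when unset). The snapshot restored in step 5 gets a new UUID, so it no longer matches the Received UUID of the copy on the external disk. The helpers `show_field` and `parent_matches` are hypothetical, and the real lookup on the receiving side may also compare transaction IDs, which this sketch ignores.

```sh
# Print the value of one field ("UUID", "Received UUID", ...) from
# `btrfs subvolume show` output read on stdin.
show_field() {
    awk -F':' -v key="$1" '
        { k = $1; sub(/^[ \t]+/, "", k) }
        k == key { v = $2; gsub(/[ \t]/, "", v); print v; exit }'
}

# Succeed if the target snapshot can act as the parent of an incremental send
# from the source snapshot. $1 and $2 are the `btrfs subvolume show` outputs of
# the source parent and of the target. Assumption: the stream names the parent
# by its Received UUID if set ("-" means unset), otherwise by its UUID, and the
# receiver looks that value up among Received UUIDs. Transaction IDs are ignored.
parent_matches() {
    local want got
    want=$(printf '%s\n' "$1" | show_field "Received UUID")
    if [ -z "$want" ] || [ "$want" = "-" ]; then
        want=$(printf '%s\n' "$1" | show_field "UUID")
    fi
    got=$(printf '%s\n' "$2" | show_field "Received UUID")
    [ -n "$want" ] && [ "$got" = "$want" ]
}
```

On a real system this would be run as root, e.g. `parent_matches "$(btrfs subvolume show /path/to/source/mysub.XXXXXX)" "$(btrfs subvolume show /path/to/target/mysub.XXXXXX)"`; for the snapshot restored in step 5 it fails because its UUID is new.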

Solution

btrbk must implement this retention policy feature by itself. There is no way around it.

ceremcem, Dec 15 '21

Workaround: I added an option to the script that changes the Received UUID property of the target snapshot, so that btrbk (btrfs send | btrfs receive) detects the parent of the incremental stream again. Here is my change:

https://github.com/ceremcem/smith-sync/commit/ca64e3122ced9b28e764f62397fb624bea8587a7#diff-2ced0053c635be678fd365a4f4d990a742fdb8c434768d98a1e624a8b22bcba5R161-R165

ceremcem, Mar 27 '22

@digint Given the shortcomings of the workaround, would it be possible to reconsider this feature request? Thanks for this amazing project; it fits my workflow perfectly, except for this one feature I miss (coming from the snapper world).

yuchenshi, Jul 08 '22

I'm currently working on some code refactoring/consolidation in the archive-refactoring branch. Most things already work as expected; e.g. there is now an --exclude command-line argument, which excludes whatever matches the given filter from every operation (commit 6aad57632d8ceb5b3d170c5176ac560341e6dad1 and others).

This is planned to be merged for btrbk-0.33.0, along with action-cp (which should replace archive in the long term).

digint, Jul 30 '22

If you want to use filenames in a folder to exclude specific timestamps, you can use:

btrbk -c btrbk.conf --progress run $(for i in `ls path/to/exclude`; do echo --exclude '*'.$i; done)
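A slightly more defensive variant of the same one-liner, as a sketch: it takes the file names from a glob instead of parsing `ls`, skips an empty directory, and passes each `*.<name>` pattern as a single argument so the shell never glob-expands it against the current directory. `run_with_excludes` is a hypothetical helper name.

```sh
# Run a command with one "--exclude *.<name>" pair appended per file in a
# directory (file names are taken as timestamps).
# Usage: run_with_excludes DIR COMMAND [ARGS...]
run_with_excludes() {
    local dir f
    dir=$1
    shift
    for f in "$dir"/*; do
        # Empty directory: the glob stays unexpanded, so skip it.
        [ -e "$f" ] || continue
        # Quoted, so the '*' reaches the command literally.
        set -- "$@" --exclude "*.${f##*/}"
    done
    "$@"
}
```

Usage, keeping the options after `run` as in the one-liner above: `run_with_excludes path/to/exclude btrbk -c btrbk.conf --progress run`.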

This option works great. When will you merge it into master?

ceremcem, Sep 18 '22

IMPORTANT NOTE: If you write a custom script to get the latest snapshot timestamp, use the following regex:

list-your-snapshots-with-full-path-somehow | grep -oE -- '[0-9]{8}T[0-9]{4}(_[0-9]+)?'

If you omit the (_[0-9]+)? part, you will fail to detect the latest snapshot when it was taken within the same minute as the previous one (btrbk then appends a _N suffix), and you will lose your (probably only) common snapshot.
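A sketch of picking the latest timestamp, assuming the `long` timestamp format (`YYYYMMDDThhmm`, optionally followed by `_N`). It matches the suffix as `(_[0-9]+)?` so that `_10` is kept whole, and sorts the suffix numerically, since a plain text sort would put `_10` before `_2`. `latest_ts` is a hypothetical helper.

```sh
# Print the latest btrbk timestamp found in snapshot names/paths read on stdin.
# Assumes timestamp_format "long": YYYYMMDDThhmm, optionally followed by _N.
latest_ts() {
    # Sort the timestamp part as text, then the _N suffix numerically
    # (no suffix sorts as 0), and keep the last line.
    grep -oE '[0-9]{8}T[0-9]{4}(_[0-9]+)?' |
        LC_ALL=C sort -t_ -k1,1 -k2,2n |
        tail -n 1
}
```

For example, `ls -d /path/to/snapshots/mysub.* | latest_ts` would print the newest timestamp, including its `_N` suffix if it has one.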

ceremcem, Dec 09 '22