
Difficulty backing up localindex

Open richfelker opened this issue 3 years ago • 1 comments

In order to be able to continue using an incremental backup after restoring from it, you need the localindex corresponding to it. This can be achieved by making sure it's included in the backup, but that has 2 problems:

  1. It's a large file (possibly hundreds of MB or even some GB) that's regenerated each time and cannot itself be backed up incrementally, so it adds a lot of storage and bandwidth cost to each backup if it's included, and
  2. The index backed up would be for the previous incremental backup state, not the new one being generated, which is okay if both are kept but could point to blobs that no longer exist if the previous one was pruned already.

The second problem is solvable by keeping backups of indices in a separate backup store (note: they should still be encrypted, so this would mean another bakelite backup store, not just rsync or something), but the first remains.

I think the most elegant solution would be not to back up the index at all (exclude it, either manually in an exclude file, or automatically by matching its inode) and instead add functionality to the restore operation to regenerate the index. A block-only index can be created simply by decrypting the blocks and mapping the sha3 of their decrypted content to the sha3 of the encrypted blob. The inode part of the index can only be recreated when the files are actually restored into a real filesystem and assigned inode numbers. This may be problematic if the restore takes place onto a transport medium that's different from the final filesystem the restored data will live on.
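The block-only regeneration described above can be sketched as follows. This is a minimal illustration, not bakelite's actual code or index format: the function name `rebuild_block_index`, the `decrypt` callback, and the use of SHA3-256 specifically are assumptions, and a real restore would reach the blocks by walking the backup tree rather than taking a flat list of blobs.

```python
import hashlib
from typing import Callable, Dict, Iterable


def rebuild_block_index(
    blobs: Iterable[bytes],
    decrypt: Callable[[bytes], bytes],
) -> Dict[bytes, bytes]:
    """Map hash(decrypted block) -> hash(encrypted blob).

    `decrypt` is a hypothetical stand-in for the real decryption step.
    SHA3-256 is an assumption; the issue only says "sha3".
    """
    index: Dict[bytes, bytes] = {}
    for blob in blobs:
        # The encrypted blob's hash is the name it is stored under.
        blob_hash = hashlib.sha3_256(blob).digest()
        # The plaintext hash is what a later incremental backup looks up
        # to find out whether a block is already in the store.
        plain_hash = hashlib.sha3_256(decrypt(blob)).digest()
        index[plain_hash] = blob_hash
    return index
```

With such a map in place, a later incremental run can hash each block it reads, look it up, and reuse the existing blob instead of uploading it again.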

Many users may be happy with just the block index being restored, as that covers the bulk of the data in a backup consisting mostly of files larger than 4k; without the inode index, new inode records would just be created for everything on the next incremental backup, but all the block data would be reusable. However, we could also dump an intermediate file for regenerating the index, mapping pathnames to inode records in the backup, which could be programmatically converted to an inode-based index once the files are in their final place.
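The conversion from such an intermediate pathname map to an inode-based index might look roughly like this. It is a sketch under assumptions: the `path_records` mapping (relative path to an opaque inode-record identifier), the function name, and keying by `(st_dev, st_ino)` are all hypothetical and not taken from bakelite's real index layout.

```python
import os
from typing import Dict, Tuple


def paths_to_inode_index(
    path_records: Dict[str, bytes],
    root: str,
) -> Dict[Tuple[int, int], bytes]:
    """Convert {relative path: inode record id} into {(dev, ino): record id}.

    Run this only after the restored files are in their final location,
    since inode numbers change whenever the data is copied elsewhere.
    """
    index: Dict[Tuple[int, int], bytes] = {}
    for rel_path, record in path_records.items():
        # lstat, not stat: a symlink has its own inode and record.
        st = os.lstat(os.path.join(root, rel_path))
        index[(st.st_dev, st.st_ino)] = record
    return index
```

Hard links to the same file resolve to the same `(dev, ino)` key, so they collapse into a single entry.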

richfelker avatar Feb 04 '22 17:02 richfelker

I've written and tested a proof of concept for regenerating the localindex as part of the restore operation, and it worked for restoring and continuing incremental backups from a test repository. I think this is an acceptable solution, so I'll try to polish it up and commit it. Current limitations that need to be overcome:

  • No device map, so inodes on devices other than the restore root's won't be trackable for incremental update (but the blocks they use of course will).
  • Missing error checking
  • Missing logic to skip backing up the index file on the backup side
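
For the last item, skipping the index by matching inode could be as simple as the sketch below. The helper name `make_index_filter` is hypothetical, and it assumes the backup walk already has an `os.stat_result` for each file it visits.

```python
import os
from typing import Callable


def make_index_filter(index_path: str) -> Callable[[os.stat_result], bool]:
    """Return a predicate that is true for the index file itself.

    Comparing (st_dev, st_ino) rather than pathnames also catches the
    index when it is reached through a hard link or a different path.
    """
    index_stat = os.stat(index_path)
    index_key = (index_stat.st_dev, index_stat.st_ino)

    def is_index(candidate: os.stat_result) -> bool:
        return (candidate.st_dev, candidate.st_ino) == index_key

    return is_index
```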

richfelker avatar Feb 08 '22 19:02 richfelker