
Incremental snapshots with rsync

Open Pointedstick opened this issue 5 years ago • 6 comments

It would be great if Kup offered incremental snapshots with rsync, merging the advantages of versioned backups and a user-browseable filesystem.

This is something I sorely miss from my macOS days, where Apple's Time Machine, a similar backup service, worked this way. It used hardlinks for deduplication, so each snapshot cost very little space on the backup disk.
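The hardlink-snapshot idea can be sketched in a few lines of Python. This is a minimal illustration, not Kup's or rsync's code (rsync does this natively with its `--link-dest=DIR` option). The function names `make_snapshot` and `_unchanged` are hypothetical, and the sketch assumes rsync's default "quick check" (same size and same modification time) is good enough to decide that a file is unchanged. Unchanged files are hard-linked from the previous snapshot and take no extra space; new or changed files are copied.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Dict, Optional


def _unchanged(src: Path, old: Path) -> bool:
    """rsync-style quick check: same size and same modification time (whole seconds)."""
    if not old.is_file():
        return False
    a, b = src.stat(), old.stat()
    return a.st_size == b.st_size and int(a.st_mtime) == int(b.st_mtime)


def make_snapshot(source: Path, dest: Path, previous: Optional[Path] = None) -> Dict[str, int]:
    """Copy `source` into a new snapshot `dest`, hard-linking files unchanged since `previous`."""
    stats = {"linked": 0, "copied": 0}
    for root, _dirs, files in os.walk(source):
        rel = Path(root).relative_to(source)
        (dest / rel).mkdir(parents=True, exist_ok=True)
        for name in files:
            src_file = Path(root) / name
            new_file = dest / rel / name
            old_file = previous / rel / name if previous is not None else None
            if old_file is not None and _unchanged(src_file, old_file):
                os.link(old_file, new_file)       # no new data: shares the previous copy
                stats["linked"] += 1
            else:
                shutil.copy2(src_file, new_file)  # copy2 keeps mtime for the next quick check
                stats["copied"] += 1
    return stats


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        src = base / "home"
        src.mkdir()
        (src / "notes.txt").write_text("v1")
        (src / "photo.jpg").write_bytes(b"\xff\xd8" * 1000)
        print(make_snapshot(src, base / "snap1"))
        (src / "notes.txt").write_text("version 2")
        print(make_snapshot(src, base / "snap2", previous=base / "snap1"))
```

Every snapshot directory is a complete, browseable copy of the source, yet the second run only stores the changed file. Because files are matched by relative path, a file that is moved or renamed between runs is copied again in full; rsync's `--link-dest` matches files by path in the same way by default.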

See also http://mikerubel.org/computers/rsync_snapshots/

Pointedstick, May 26 '19 23:05

When I first started Kup I wanted something that could work on NTFS, because that is the pragmatic choice for an external disk, and I think it remains so today. At that time I was pretty sure it would not work with hardlinks. But I recently learned that NTFS does support them, which made me start thinking about whether I should drop bup-based versioned backups and replace them with rsync hardlinks. I have not done much research yet, again maybe because what is there now works quite OK. I put many, many hours into making bup backups nice to use in KDE, so it would also feel a bit sad to throw that away. Offering two ways of saving versioned backups seems impossible without adding lots of text explaining to the user why one or the other is the better choice. And honestly, I think bup's space savings from storing only the parts of files that have changed are pretty pointless for most people, unless you are backing up big database files or VM disk images, and those are not exactly Kup's target users. So yeah, with some more research, this could be a possible way forward. Volunteers wanted!
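To make the "only the changed parts" point concrete, here is a minimal Python sketch comparing whole-file storage (what a hardlink snapshot must do when any byte of a file changes) with block-level deduplication. It is a simplification: it splits data into fixed 8 KiB blocks (`BLOCK` and `new_bytes_to_store` are hypothetical names), whereas bup chooses chunk boundaries with a rolling checksum so that an insertion does not shift every later block.

```python
import hashlib
import os

BLOCK = 8192  # hypothetical fixed block size; bup uses content-defined chunks instead


def _blocks(data: bytes, block: int):
    """Yield consecutive fixed-size slices of data (the last one may be shorter)."""
    for i in range(0, len(data), block):
        yield data[i:i + block]


def new_bytes_to_store(old: bytes, new: bytes, block: int = BLOCK) -> int:
    """Bytes a block-deduplicating store must add to keep `new` when `old` is already stored."""
    known = {hashlib.sha256(b).digest() for b in _blocks(old, block)}
    added = 0
    for b in _blocks(new, block):
        digest = hashlib.sha256(b).digest()
        if digest not in known:
            added += len(b)
            known.add(digest)  # identical blocks within `new` are stored only once
    return added


if __name__ == "__main__":
    old = os.urandom(1_000_000)  # stand-in for a large file such as a VM image
    new = bytearray(old)
    new[500_000] ^= 0xFF         # flip a single byte in the middle
    print("whole-file copy:", len(new), "bytes")
    print("block-level dedup:", new_bytes_to_store(old, bytes(new)), "bytes")
```

For a 1 MB file with one byte flipped, a hardlink snapshot stores the whole 1 MB again, while the block-level store adds a single 8 KiB block. That gap matters for large database files and VM images, and much less for typical documents and photos, which tend to be rewritten or replaced as a whole.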

spersson, May 29 '19 08:05

Thanks, that makes perfect sense, and I can totally understand why you might not want to throw away all the work done on bup. Speaking personally, I refuse to use any backup system that stores files in any kind of database, monolithic file, binary blob, etc. I've had too much bad luck with these getting corrupted, or becoming inaccessible because the software needed to open them wasn't available when I needed to restore. So on general principle, I only use backup systems that create browseable folder hierarchies.

Pointedstick, May 29 '19 16:05

Sorry to disturb, I just wanted to say that you can have user-browseable backups with other tools too, including Bup, I think. You just have to mount them using FUSE. In my pull request #61, I implemented a way to list the snapshots and to open each one like a regular filesystem in Dolphin. Bup, Restic and Borg all support this. Moving from these tools to rsync, you would probably lose encryption, compression, strong deduplication and even redundancy (for the tools that implement it). Restic and Borg also support pruning based on rules (I don't know about Bup). Are you really considering removing features instead of adding them?

carlonluca, Jun 17 '19 18:06

You just have to mount using fuse.

If browsing the backup or restoring files on a different machine would require installing Kup (or any other software) there, that kind of defeats the purpose of having a backup that is browseable through the filesystem. That purpose is maximum compatibility, so that you're never left in a situation where you have your backup but can't access it.

Pointedstick, Jun 18 '19 10:06

That does not "defeat" the purpose at all. Browsing is useful whenever you want to restore something or simply look at or work with files inside the backup, just as you do with Time Machine. By definition, a backup is not a way to transfer files or to carry your data with you wherever you go. Binary deduplication, compression and encryption are therefore key features. As for "it's not accessible": you can always access it, you just need the software, just as you need an ext4 driver to mount an ext4 volume that stores your backup.

In any case I see what you mean, but I personally consider deduplication, compression and encryption much more relevant for a backup than easy access from anywhere. I was hoping Kup would improve by adding at least compression, pruning and support for remote backups, instead of going backwards.

carlonluca, Jun 18 '19 18:06

I have done some more research on the topic by testing rsync with hardlinks, reading around and discussing it a bit. I have no conclusion yet, just as I haven't reached one on bup vs. borg/restic. So far, the main issues I can think of with rsync are that it does not detect file moves/renames, and that with plain files on an NTFS partition, some file names that are allowed on Unix are not allowed. (Permissions are also not preserved, but that is minor.) I have not seen much use for the compression and deduplication features, so I don't think they would be missed much; they are mostly useful for database files and for storing files from many computers in one repository, basically server scenarios. People who care about encryption can already get it today with an encrypted filesystem, and that would not change with rsync.
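The NTFS file-name problem can at least be detected before copying. A minimal sketch, assuming the Windows (Win32) naming rules are the ones that matter: the NTFS format itself can store most of these names, but Windows refuses them, and ntfs-3g also rejects them when mounted with its `windows_names` option. The function `windows_name_problems` is hypothetical, and the rule list is not exhaustive (it ignores path-length limits, for example).

```python
from typing import List

# Characters that Windows file names may not contain ('/' cannot appear in a Unix name anyway).
_FORBIDDEN = set('<>:"\\|?*')
# Device names Windows reserves, with or without an extension (e.g. "con.txt").
_RESERVED = {"CON", "PRN", "AUX", "NUL"} | {f"COM{i}" for i in range(1, 10)} | {f"LPT{i}" for i in range(1, 10)}


def windows_name_problems(name: str) -> List[str]:
    """Return human-readable reasons why a Unix file name would be rejected on Windows."""
    problems = []
    bad = sorted({c for c in name if c in _FORBIDDEN or ord(c) < 32})
    if bad:
        problems.append(f"forbidden characters: {bad!r}")
    if name.endswith((" ", ".")):
        problems.append("trailing space or dot")
    stem = name.split(".", 1)[0].upper()
    if stem in _RESERVED:
        problems.append(f"reserved device name: {stem}")
    return problems


if __name__ == "__main__":
    for n in ["report.txt", "what?.txt", "notes.", "con.log", "12:30 meeting.md"]:
        print(n, windows_name_problems(n))
```

A backup tool could run such a check over the source tree and warn about affected files up front, rather than failing partway through a copy.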

There was no perfect solution when I started Kup, but at that time bup seemed best. Now I have seen some of the problems with bup: mostly how difficult it is to contribute changes to that project, but also that deduplication doesn't give me much benefit except for handling file moves/renames. There is still no perfect solution. I could list the pros and cons of all three contenders (rsync hardlinks, bup, borg/restic), but over the last day I've mostly been thinking about whether there is a way to have the cake and eat it too. I do like the motto "powerful when needed", but even then I think it is important to be careful before adding configuration options. For example, Kup used to have a compression-level setting (1-9). I removed it when I realized that 99% of the files I was backing up were already compressed and bup's compression did close to nothing. Back to which backend to use: perhaps it's possible to keep the bup backend but make versioned backups use rsync and hardlinks by default, then add a "use bup" checkbox to the advanced settings? At least that would give a smooth upgrade path (existing users and their backups can be kept), it would be a small effort, and it would not complicate the settings too badly. Still, I don't like having to fill the UI with an explanation of the pros and cons of rsync vs. bup. Perhaps just leave that out and say "don't select this option unless you know what bup is"; it's on the advanced settings page, after all.
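The observation about the compression level is easy to reproduce. This minimal sketch uses Python's `zlib` (DEFLATE, the compression that git packfiles, and therefore bup's storage, are assumed here to rely on) and random bytes as a stand-in for already-compressed media such as JPEG, MP4 or zip files, whose contents look close to random. The helper `compression_ratio` is hypothetical.

```python
import os
import zlib


def compression_ratio(data: bytes, level: int = 9) -> float:
    """Compressed size divided by original size; values near 1.0 mean no savings."""
    return len(zlib.compress(data, level)) / len(data)


if __name__ == "__main__":
    text = b"Kup backs up your files. " * 4000  # 100,000 bytes of repetitive text
    media = os.urandom(100_000)                 # stand-in for already-compressed files
    for level in (1, 9):
        print(f"level {level}: text {compression_ratio(text, level):.3f}, "
              f"media {compression_ratio(media, level):.3f}")
```

Raising the level only helps data that is compressible in the first place; for already-compressed data every level gives a ratio of about 1.0, which is consistent with dropping the setting.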

Difficult decisions! Thank you all for good discussions! To be continued.

spersson, Jun 20 '19 05:06