bupstash
The shopping list for the perfect backup solution ;)
Hello,
While deciding whether I should keep using borg, or switch to restic, I happened to read about kopia, then found bupstash. While all those projects seem pretty interesting, it would be nice to have a full comparison of those 4 (usually you get borg vs restic comparisons, no kopia).
IMHO, here's a list of what I would personally require from a backup solution. Some of the concepts are hard to compare, but at least, the following list would give a rough idea of what the different backup tools are capable of.
- Reliability
- Hard to tell if a piece of software is reliable (no benchmarks can be done easily), but of course it's good to know if the backup software keeps redundant copies of the indexes, has a checksumming system, allows restoring when the repo has bad/missing blocks...
- Language safety (yes, talking about rust here)
- Restore speed
- This is the major problem for all backups. How fast can you gain access to your backup data (can be benchmarked)
- Can files be restored easily (using regexes ?)
- Can the backup perhaps be mounted using FUSE or something alike ?
- Backup speed
- How fast is my data backed up ? When using a lot of small files ? When using big files, ie disk images? (can be benchmarked)
- Can we use exclusion lists with regexes so backup sizes shrink ?
- Dedup / compression efficiency
- What's the deduplication & compression ratio (can be benchmarked)
- This is especially useful when backups are done over a low bandwidth WAN link
- Any newer compression algorithms used, ie zstd ?
- Can some files be excluded from compression since they already are compressed ? (I maintain some lists here https://github.com/deajan/linuxscripts/tree/master/burp/incexc)
- platform support
- Are there pre-built binaries ? Yay, getting to work quick is good, especially when you need to restore somewhere
- Is Windows supported ? As first class citizen ?
- How about snapshot support ?
- On unixes, snapshots are handled outside of the backup tool of course, but there should be a way to strip the snapshot path from the backups, ie `/mysnapshot/etc/passwd` becomes `/etc/passwd` in my backup
- On windows, there should be VSS support, where the backup tool could make a win32 call to make a snapshot (even WMI call) and then backup the files while stripping the UNC path, ie `\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy2\Users` becomes `C:\Users`
- WAN support
- What remote backends are supported ?
- What's the speed overhead when backing up remotely ? (can be benchmarked)
- Can the remote protocol pass UTM firewalls ? (ie HTTPS/443)
- Security
- Are encryption algorithms secure ? Ie 256-bit, AES-GCM or ChaCha20-Poly1305 based ?
- What happens when the client server that does the push backups gets compromised ?
- In case of a push backup, can the client delete backups ? (is there a way to disable this ?)
- In case of a push backup, does the client hold an encryption key locally that can be used to decrypt the backups ? This could be a major issue if the client server gets compromised
- Is there a pull backup possibility instead of push so the backup server holds all security tokens instead of the client ?
- Compressed and encrypted data can be guessed (ie CRIME/BREACH type attacks), is there a salt on encrypted data so this won't happen ? If so, does the salt prevent deduplication ?
- Are script hooks before and after backups supported ? (in order to call snapshots operations, mount remote storage, call rclone or whatever)
The list above gives the idea of a "perfect" backup solution (at least for me ;)), and would be a lot of research. But as far as I read about all the solutions, bupstash could be the one. Script hooks could easily do the snapshotting stuff as long as the snapshot prefix path can be removed in the filenames while backing up. Windows support would be a major feature, as only restic does this so far (but restic lacks compression and has other drawbacks).
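To make the snapshot prefix idea concrete, here is a minimal sketch of the path rewriting, assuming plain string handling. The function name `remap_snapshot_path` and its parameters are hypothetical and not part of any of the tools discussed; a real tool would do this internally while walking the snapshot.

```python
def remap_snapshot_path(path: str, snapshot_root: str, live_root: str, sep: str = "/") -> str:
    """Replace the snapshot_root prefix of path with live_root.

    Hypothetical helper for illustration only. The match is done on whole
    path components, so "/mysnapshot2/etc" is not treated as being inside
    "/mysnapshot".
    """
    root = snapshot_root.rstrip(sep)
    if path == root:
        # The snapshot root itself maps to the live root.
        return live_root
    if not path.startswith(root + sep):
        raise ValueError(f"{path!r} is not inside {snapshot_root!r}")
    remainder = path[len(root) + len(sep):]
    return live_root.rstrip(sep) + sep + remainder


# Unix example from the list above.
print(remap_snapshot_path("/mysnapshot/etc/passwd", "/mysnapshot", "/"))

# Windows VSS example from the list above (backslash as separator).
print(remap_snapshot_path(
    r"\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy2\Users",
    r"\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy2",
    "C:\\",
    sep="\\",
))
```

Doing the rewrite when the path is recorded (rather than at restore time) means the backup looks the same whether or not a snapshot was used, which keeps deduplication and later restores independent of the snapshot name.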
I don't have time to answer currently, but the answer for most of these is we do it (and we do it well), or we plan to do it. When exactly they will all happen is another question.
I can definitely do a more detailed breakdown later and update this comment.
Thanks for the quick answer. I do propose the following table that could be updated as time goes.
Perhaps you could update some of the design related questions for bupstash ;)
Last update: 02 Jan 2022
| Backup software | Version |
|---|---|
| Borg | 1.16 |
| Restic | 0.12 |
| Kopia | 0.8.4 |
| Bupstash | 0.9.1 |
Comparison table
| Goal | Functionality | Borg | Restic | Kopia | Bupstash |
|---|---|---|---|---|---|
| Reliability | Redundant index copies | ? | ? | Yes | ? |
| Reliability | Continue restore on bad blocks | ? | ? | ? | ? |
| Reliability | Data checksumming | Yes (CRC & HMAC) | ? | ? | ? |
| Reliability | Language memory safety | No (python) | No (go) | No (go) | Yes (rust) |
| Restoring Data | Restore speed (data set 1) | ? | ? | ? | ? |
| Restoring Data | Restore speed (data set 2) | ? | ? | ? | ? |
| Restoring Data | File includes / excludes based on regexes | ? | ? | ? | ? |
| Restoring Data | Backup mounting as filesystem | ? | Yes | ? | ? |
| Backup Data | Backup speed (data set 1) | ? | ? | ? | ? |
| Backup Data | Backup speed (data set 2) | ? | ? | ? | ? |
| Backup Data | File includes / excludes based on regexes | ? | ? | ? | ? |
| Dedup & compression efficiency | Is data compressed | Yes | No | Yes | Yes |
| Dedup & compression efficiency | Uses newer compression algorithms (ie zstd) | Yes | No | Yes | Yes (3) |
| Dedup & compression efficiency | Can files be excluded from compression | ? | No | ? | ? |
| Dedup & compression efficiency | Is data deduplicated | Yes | Yes | Yes | Yes |
| Dedup & compression efficiency | Dedup ratio (data set 1) | ? | ? | ? | ? |
| Dedup & compression efficiency | Dedup ratio (data set 2) | ? | ? | ? | ? |
| Platform support | Unix Prebuilt binaries | No | Yes | Yes | No |
| Platform support | Windows support | Yes (WSL) | Yes | Yes | No |
| Platform support | Windows first class support (PE32 binary) | No | Yes | Yes | No |
| Platform support | Unix snapshot support where snapshot path prefix is removed | ? | ? | ? | ? |
| Platform support | Windows VSS snapshot support where snapshot path prefix is removed | No | Yes | No | No |
| WAN Support | Can backups be sent to a remote destination without keeping a local copy | Yes (SSH) | Yes (HTTPS) | Yes (HTTPS) | Yes (SSH) |
| WAN Support | Time overhead sending data remotely (data set 1) | ? | ? | ? | ? |
| WAN Support | Time overhead sending data remotely (data set 2) | ? | ? | ? | ? |
| WAN Support | What other remote backends are supported ? | rclone | (1) | (2) | ? |
| WAN Support | Can the protocol pass UTM firewall appliances with layer 7 filter | Yes | Yes | ? | ? |
| Security | Are encryption protocols secure (AES-256-GCM / ChaCha20-Poly1305) ? | Yes | ? | ? | Yes |
| Security | Can encrypted / compressed data be guessed (CRIME/BREACH style attacks)? | ? | ? | ? | ? |
| Security | Can a compromised client delete backups? | No (append mode) | ? | ? | No |
| Security | Can a compromised client restore encrypted data? | Yes | ? | ? | No |
| Security | Are pull backup scenarios possible? | Yes | No | ? | ? |
| Misc | Does the backup software support pre/post execution hooks? | ? | ? | ? | ? |
| Misc | Does the backup software provide an API ? | Yes (JSON cmd) | Yes (REST API) | ? | ? |
| Misc | Can a backup repo be mounted as file system ? | No | Yes | Yes | No |
(1) S3/Wasabi/B2/SFTP/Aliyun/Swift/Azure/Google (2) Google/S3/B2/SFTP/rclone* (3) Next version after bupstash 0.13.0 will support zstd compression
- data set 1 and data set 2 need to be downloadable file sets. data set 1 should be lots of small files (e.g. kernel sources ?). data set 2 should be a couple of bigger files (virtual images, various iso images ?)
Some notes:
I would not consider zstd that much newer than lz4 (2011 vs 2015). Actually it seems like a mistake to use this as a metric, because deduplication plays a far bigger role in repository size than compression for many workloads. Bupstash uses lz4 because it is faster and has pure rust implementations. Since deduplication matters more than compression for its intended use case, the faster speed of lz4 seemed more important.
Another thing to consider for reliability is lines of code in the write path, which is related to the likelihood of bugs per line of code.
Another note is that python and go are memory safe too, and most of these tools use C libraries that are not memory safe but are well fuzzed, so it is not an accurate comparison currently.
I also wonder about the relevance of CRIME/BREACH because attackers do not control which data is compressed in the same way they might for web servers, though I admit the possibility. Bupstash uses key specific HMACs for deduplication and session keys that are not reused, which may mitigate these issues.
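As a rough illustration of what key specific HMACs for deduplication means, here is a minimal sketch. It uses HMAC-SHA256 from the Python standard library as a stand-in; this is an assumption for illustration and not a description of bupstash's actual construction. The names `chunk_address` and `ChunkStore` are hypothetical, and chunks are stored unencrypted here purely for brevity.

```python
import hashlib
import hmac


def chunk_address(key: bytes, chunk: bytes) -> str:
    """Keyed address of a chunk: HMAC-SHA256 under a per-key secret (stand-in)."""
    return hmac.new(key, chunk, hashlib.sha256).hexdigest()


class ChunkStore:
    """Toy content-addressed store that deduplicates by keyed address."""

    def __init__(self, key: bytes) -> None:
        self.key = key
        self.chunks: dict[str, bytes] = {}

    def put(self, chunk: bytes) -> str:
        address = chunk_address(self.key, chunk)
        # Identical chunks under the same key map to the same address,
        # so the second copy is not stored again.
        self.chunks.setdefault(address, chunk)
        return address


store = ChunkStore(key=b"A" * 32)
first = store.put(b"same data")
second = store.put(b"same data")
print(first == second, len(store.chunks))  # deduplicated under one key

# Under a different key the same chunk gets an unrelated address, so
# someone without the key cannot confirm a guess by hashing it.
print(chunk_address(b"B" * 32, b"same data") == first)
```

The trade-off is that deduplication only works between backups made with the same key, which is the price of addresses that do not reveal content to someone who only sees the repository.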
w.r.t. performance, this post might be useful - https://acha.ninja/blog/encrypted_backup_shootout/ , though bupstash is probably faster now than when I posted this.
Here is my take on it, including some benchmarks and taking a slightly different set of requirements: https://masysma.lima-city.de/37/backup_tests_borg_bupstash_kopia.xhtml
I did not test restic, though. Feel free to use my scripts for doing benchmarks on it using your own data sets :)
@deajan: From your table, Borg supports rclone and pull-schemes. I did not know about that (but used an older version).
Oh, nice.
Thanks for sharing!
@m7a I suspect the NFS issues have been resolved in the latest release. w.r.t. the large number of files, this will become less of an issue as some alternative storage plugins become ready. This is a really great write up.
@m7a Indeed, very good writeup. Side question, is there a specific reason you didn't test restic along the other solutions ?
@deajan Thanks :)
I had already tested restic in a practical scenario (backup about 60 GiB of rarely changing data once per day for some weeks) and observed the following issues:
- Deleting old backups was extremely slow (as in taking hours). AFAICT this is by design and a showstopper for my use cases.
- I was using an old version and this version was unable to restore its own backups. I cannot recall a link to the bug right now, but IIRC it was a known issue that was resolved in a new version. Despite some warnings being printed upon backup creation and restore, it never clearly indicated that it had in fact not restored (or created) the backup correctly. Some files were missing from the restore and other files were incorrectly restored with size zero.
I found both of these behaviours to be highly irritating and hence opted not to include restic in my comparison. Given that this experience is highly “anecdotal” (I have not tried to reproduce it), I did not include these findings in the backup comparison.
@andrewchambers -- I saw your comments on zstd at https://github.com/andrewchambers/bupstash/issues/174#issuecomment-853704207 in this thread.
In my testing, a backup using lz4 is consistently 33-35% larger than zstd, while the time difference is around 12-13%. (Space difference checked with borg+lz4, borg+zstd, and bupstash; time difference is between the first two only.)
Space is much more important to me than speed, especially when the differences are as above, so I am asking if you ever intend to make zstd at least an option. I'm OK if the answer is "no", but need to know.
Edited to add: just to be clear, in my use case the dedup is similar for both borg and bupstash. So, while I get that dedup will have a bigger impact than compression in the general case, in my case the end result is still a repo that is 35% larger.
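For readers converting between the two ways of stating this, here is a small worked sketch of the arithmetic. The helper name is hypothetical, and the 100 GiB repository size is an assumed example figure, not a measurement from this thread.

```python
def percent_larger_to_percent_smaller(percent_larger: float) -> float:
    """If A is `percent_larger` % larger than B, return how many % smaller B is than A."""
    return 100.0 * (1.0 - 1.0 / (1.0 + percent_larger / 100.0))


# "lz4 is 33-35% larger than zstd" restated as "zstd is N% smaller than lz4".
for larger in (33.0, 35.0):
    smaller = percent_larger_to_percent_smaller(larger)
    print(f"lz4 {larger:.0f}% larger  ->  zstd {smaller:.1f}% smaller")

# Assumed example: a repository that takes 100 GiB with zstd.
zstd_gib = 100.0
print(f"same repo with lz4: {zstd_gib * 1.33:.0f} to {zstd_gib * 1.35:.0f} GiB")
```

So a repo that is 35% larger with lz4 is the same thing as one that is roughly a quarter smaller with zstd.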
I think we could add support for zstd as some sort of repository config option or client option.
zstd and lz4 both have configurable compression levels that affect compression speed.
I have created https://github.com/andrewchambers/bupstash/issues/258 to track this.
thank you!
Good news, the newest release zstd is now nearly as fast as lz4 while compressing much better - I am happy to upgrade.
Shopping list (and benchmark) is now updated at https://github.com/deajan/backup-bench