backup-bench
Duplicacy
Hi @deajan, Awesome work!
I took a look at your script and I have a few suggestions:
1- For the backup and restore commands, please use the -threads option with 8 threads for your setup. It will significantly increase speed.
Increase -threads from 8 until you saturate the network link or see speed decrease.
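For reference, the flag is passed directly to the backup and restore commands. This is a hedged sketch: the revision number and thread count are placeholders to adjust for your setup.

```shell
# Back up with 8 parallel upload threads; -stats prints a summary at the end.
duplicacy backup -stats -threads 8

# Restore revision 1 with the same parallelism; raise -threads until the
# network link saturates or throughput starts to drop.
duplicacy restore -r 1 -threads 8
```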
2- During init, please play with the chunk size:
-chunk-size, -c
With homogeneous data, you should see smaller backups and better deduplication. See Chunk size details.
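A sketch of what that looks like at init time; the snapshot id and storage URL below are placeholders, and my understanding is that -c sets the average chunk size (with min/max derived from it unless overridden):

```shell
# Hypothetical init with a 1 MiB average chunk size instead of the default.
# Smaller chunks usually mean better deduplication on homogeneous data, at
# the cost of more chunk files on the storage.
duplicacy init -c 1M my-snapshot-id sftp://user@backup.example.com/backups
```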
3-Some clarifications for your shopping list on Duplicacy:
1- Redundant index copies: Duplicacy doesn't use indexes (or a db).
2- Continue restore on bad blocks in repository: yes, and Erasure Coding.
3- Data checksumming: yes.
4- Backup mounting as filesystem: no (there is a FUSE implementation PR, but it is not likely short term).
5- File includes / excludes based on regexes: yes.
6- Automatically excludes CACHEDIR.TAG(3) directories: no.
7- Is metadata encrypted too?: yes.
8- Can encrypted / compressed data be guessed (CRIME/BREACH style attacks)?: no.
9- Can a compromised client delete backups?: no (with a public key and an immutable target; requires target setup).
10- Can a compromised client restore encrypted data?: no (with a public key).
11- Does the backup software support pre/post execution hooks?: yes, see Pre Command and Post Command Scripts.
12- Does the backup software provide a crypto benchmark?: there is a Benchmark command.
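To illustrate points 11 and 12: as I understand it, hooks are plain scripts dropped into the repository's .duplicacy/scripts/ directory, named after the command they wrap; the benchmark is a built-in subcommand. A sketch (the hook body here is just an example):

```shell
# Create a pre-backup hook that runs before every backup of this repository.
mkdir -p .duplicacy/scripts
cat > .duplicacy/scripts/pre-backup <<'EOF'
#!/bin/sh
echo "starting backup at $(date)"
EOF
chmod +x .duplicacy/scripts/pre-backup

# Built-in benchmark: measures disk, crypto, and storage throughput.
duplicacy benchmark
```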
Important:
13- Duplicacy is serverless: less cost, less maintenance, less attack surface. This also means that Duplicacy will always be a bit slower, since it has to list the storage before it uploads a particular chunk.
14- Duplicacy works with a ton of storage backends: infinitely scalable and more secure.
15- No indexes or databases.
16- You should test partial restore.
17- Test data should be a little bit more diverse, but I guess this is difficult.
Hope this helps a bit. Feel free to join the Forum.
Keep up the good work.
I've updated the comparison table with your remarks.
13- Duplicacy is serverless: Less cost, less maintenance, less attack surface. 14- Duplicacy works with a ton of storage backends: Infinitely scalable and more secure.
Does duplicacy have a preferred self hosted backend ?
15-No indexes or databases.
I'm a bit puzzled. Since there are data chunks, there needs to be, somewhere, a description of what they are linked to... something like an index...?
For now, I've added the -threads option for the next test round.
If I go the chunk size route, I'll have to do this for all backup solutions.
Hi,
Indeed, the lack of an index or db is one of the most amazing design features of Duplicacy. Let me quote from the Lock-free deduplication algorithm:
"What is novel about lock-free deduplication is the absence of a centralized indexing database for tracking all existing chunks and for determining which chunks are not needed any more. Instead, to check if a chunk has already been uploaded before, one can just perform a file lookup via the file storage API using the file name derived from the hash of the chunk. This effectively turns a cloud storage offering only a very limited set of basic file operations into a powerful modern backup backend capable of both block-level and file-level deduplication. More importantly, the absence of a centralized indexing database means that there is no need to implement a distributed locking mechanism on top of the file storage."