backup-bench
speed
... depends on a lot of things and might be hard to compare.
just a few insights (from borg development):
- borg 1.x does some information gathering based primarily on the filename, assuming that same filename means same file.
- borg 2.0 will play a bit more on the safe side considering race conditions due to changing file systems, so it opens the file to get a file descriptor (fd) and then does the information gathering using the fd. the fd will always refer to the same fs object.
- borg >= 1.2 checks if a file has changed while it was backed up.
- these are a few reasons why more recent borg versions got a bit slower than older ones, especially on NFS, because open() and stat() are slow there.
So, sometimes speed == quick & dirty and slower == better / safer.
The less you do, the faster you get. The question is then if you still do enough / all that is needed.
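To illustrate the filename vs. file descriptor point above, here is a minimal Python sketch (not borg's actual code; the function names and the demo are made up for illustration). It assumes a POSIX file system where a file can be replaced by rename while it is still open.

```python
import os
import tempfile


def stat_by_name(path):
    # Path-based: the name is resolved again on every call, so two calls
    # may end up describing two different files.
    return os.stat(path)


def stat_by_fd(path):
    # fd-based: open once, then query the open descriptor. The fd keeps
    # referring to the same fs object even if the name gets reused.
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.fstat(fd)
    finally:
        os.close(fd)


def demo_rename_race():
    # Hypothetical demo: replace a file by rename while it is held open.
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "data")
        other = os.path.join(d, "other")
        with open(target, "w") as f:
            f.write("old")
        with open(other, "w") as f:
            f.write("new content")
        fd = os.open(target, os.O_RDONLY)
        try:
            os.replace(other, target)      # the name now points elsewhere
            by_fd = os.fstat(fd)           # still the originally opened file
            by_name = os.stat(target)      # the replacement file
            return by_fd, by_name
        finally:
            os.close(fd)
```

The fd-based variant costs an extra open() and close() per file, which is the kind of overhead that adds up on NFS.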
I can definitely add the file descriptor part to the README section.
I don't get the part where borg >= 1.2 checks if a file has changed while it was backed up. Does it compare the backed-up file to its last state while doing backups? What if the file changes continuously?
> So, sometimes speed == quick & dirty and slower == better / safer.
This statement is something I can live by, except for parked files like qcow files with external snapshots, which will never change while being backed up (actual thing I do with borg as of today).
The "changed while backup" check only detects that there might be a problem, it does not avoid it (like a snapshot would).
In some cases, it might not be an issue (e.g. a log file growing by a line at the end), but in other cases it can warn the user of an issue (e.g. if you back up some sort of database and the file changes internally while you back it up, the file as read by borg could then be internally inconsistent).
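As a rough idea of how such a detection can work in general, here is a small Python sketch: stat the open file before and after reading it and compare. This is not borg's implementation; the function names and the choice of compared fields (size, mtime, ctime) are assumptions for illustration.

```python
import os


def fingerprint(fd):
    # Metadata snapshot of the open file. Which fields to compare is an
    # assumption made for this sketch.
    st = os.fstat(fd)
    return (st.st_size, st.st_mtime_ns, st.st_ctime_ns)


def read_with_change_check(path, chunk_size=1 << 20):
    # Read the whole file and report whether its metadata changed meanwhile.
    fd = os.open(path, os.O_RDONLY)
    try:
        before = fingerprint(fd)
        chunks = []
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            chunks.append(chunk)
        after = fingerprint(fd)
    finally:
        os.close(fd)
    # changed == True only means "the data might be inconsistent".
    # It detects the situation, it does not prevent it.
    return b"".join(chunks), before != after
```

A file that changes continuously would simply be flagged on every run; only a snapshot or a freeze of the writer gives a consistent view.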
Thanks for the clarification. This makes me think of pre-freeze and post-thaw scripts for databases ;)
I'll add a "backup coherence" entry in the table which I can link to this discussion.
Just a side question: when using the borg CLI, will there be a specific exit code in those cases, or must the output be parsed to find out whether a file changed while being backed up?
Currently there are only a few exit codes and also it is hard to map warnings to exit codes (because there can be multiple different warnings), so one currently needs to read the log output.
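A wrapper script that reads the log output could look roughly like the sketch below. The function name `run_and_scan` is hypothetical, and the marker text is deliberately a parameter: the exact wording of the warning is not given here and has to be taken from the real log output of the borg version in use.

```python
import subprocess


def run_and_scan(cmd, marker):
    # Run a command, merge stderr into stdout, and collect every output
    # line that contains `marker`.
    # `marker` is an assumption: use the warning text your version prints.
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    hits = [line for line in proc.stdout.splitlines() if marker in line]
    return proc.returncode, hits
```

The caller can then treat a non-empty list of hits as "at least one file changed during backup", independent of the exit code.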
I added a new benchmark with qemu disk images (see the latest README.md). I noticed that borg performs quite well for that use case, whereas backing up the Linux kernel source files is not that great in terms of speed. Is that explained by your statements above about open() and stat()?
Could be, because if you have a lot of small files, the per-file overhead has a much bigger effect than for few big files.
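A back-of-the-envelope model of that effect, as a Python sketch: total time is a fixed cost per file plus the time to move the bytes. The function name and all numbers are invented for illustration, they are not measurements.

```python
def estimated_seconds(n_files, total_bytes, per_file_overhead_s, bytes_per_s):
    # Fixed per-file cost (open, stat, close, ...) plus raw data transfer.
    return n_files * per_file_overhead_s + total_bytes / bytes_per_s


# Same amount of data, very different file counts (made-up values).
TOTAL_BYTES = 1_000_000_000
many_small = estimated_seconds(70_000, TOTAL_BYTES, 0.001, 200_000_000)
one_big = estimated_seconds(1, TOTAL_BYTES, 0.001, 200_000_000)
print(f"many small files: {many_small:.1f} s, one big file: {one_big:.1f} s")
```

With these assumed values the per-file term dominates for the many-small-files case, while it is negligible for a single disk image.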