speed

Open ThomasWaldmann opened this issue 1 year ago • 6 comments

... depends on a lot of things and might be hard to compare.

just a few insights (from borg development):

  • borg 1.x does some information gathering based primarily on the filename, assuming that same filename means same file.
  • borg 2.0 will play a bit more on the safe side considering race conditions due to changing file systems, so it opens the file to get a file descriptor (fd) and then does the information gathering using the fd. the fd will always refer to the same fs object.
  • borg >= 1.2 checks if a file has changed while it was backed up.
  • these are a few reasons why more recent borg versions got a bit slower than older ones, especially on NFS, because open() and stat() are slow there.
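To illustrate the difference between name-based and fd-based information gathering, here is a minimal Python sketch. It is not borg's actual code. The function name `demo_rename_race` and the file names are made up for the demo, and it assumes a POSIX system where a file can be replaced while it is still open.

```python
import os
import tempfile


def demo_rename_race(directory):
    """Show that an fd keeps pointing at the same fs object after a rename.

    Hypothetical demo helper, not part of borg.
    """
    path = os.path.join(directory, "data.txt")
    with open(path, "w") as f:
        f.write("old")

    # Open once; from here on, the fd identifies the file system object.
    fd = os.open(path, os.O_RDONLY)
    try:
        ino_at_open = os.fstat(fd).st_ino

        # Simulate a concurrent writer that atomically replaces the file.
        tmp = os.path.join(directory, "data.txt.new")
        with open(tmp, "w") as f:
            f.write("new contents")
        os.replace(tmp, path)

        ino_by_fd = os.fstat(fd).st_ino    # still the originally opened object
        ino_by_name = os.stat(path).st_ino  # whatever the name points at now
        return ino_at_open, ino_by_fd, ino_by_name
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        at_open, by_fd, by_name = demo_rename_race(d)
        print("same object via fd:", at_open == by_fd)
        print("same object via name:", by_fd == by_name)
```

The price for the safer variant is the extra open() per file, which is exactly the kind of call that gets expensive on NFS.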

So, sometimes speed == quick & dirty and slower == better / safer.

The less you do, the faster you get. The question is then if you still do enough / all that is needed.

ThomasWaldmann avatar Sep 06 '22 22:09 ThomasWaldmann

I can definitely add the file descriptor part to the README section.

I don't get the part where borg >= 1.2 checks if a file has changed while it was backed up. Does it compare the backed up file to its last state while doing backups? What if the file continuously changes?

So, sometimes speed == quick & dirty and slower == better / safer.

This statement is something I can live by, except for parked files like qcow files with external snapshots, which will never change while being backed up (actual thing I do with borg as of today).

deajan avatar Sep 07 '22 17:09 deajan

The "changed while backup" only detects that there might be a problem, it does not avoid it (like a snapshot).

In some cases, it might not be an issue (e.g. a log file growing a line at the end), but in other cases it might warn the user of an issue (e.g. if you back up some sort of database and the file changes internally while you back it up, the file as read by borg could then be internally inconsistent).
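As a rough sketch of what "detect, but not avoid" means: take the file metadata before and after reading and compare. This is not borg's implementation. The function `read_and_check`, the compared fields (mtime and size), and the `simulate_writer` hook are assumptions chosen for illustration.

```python
import os
import tempfile


def read_and_check(path, simulate_writer=None, chunk_size=1 << 20):
    """Read a file and report whether its metadata changed meanwhile.

    simulate_writer is a hypothetical hook, called after the read, that
    stands in for another process modifying the file during the backup.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        before = os.fstat(fd)
        chunks = []
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            chunks.append(chunk)
        if simulate_writer is not None:
            simulate_writer(path)
        after = os.fstat(fd)
    finally:
        os.close(fd)

    # Detection only: the data already read is kept as it is, the caller
    # merely learns that it might be inconsistent.
    changed = (before.st_mtime_ns, before.st_size) != (after.st_mtime_ns, after.st_size)
    return b"".join(chunks), changed


def append_line(path):
    # Stand-in for a log writer adding a line at the end of the file.
    with open(path, "ab") as f:
        f.write(b"one more line\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "app.log")
        with open(p, "wb") as f:
            f.write(b"first line\n")
        print(read_and_check(p)[1])
        print(read_and_check(p, simulate_writer=append_line)[1])
```

A file that changes continuously would simply be flagged on every run; only a snapshot (or freezing the application) gives a consistent view.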

ThomasWaldmann avatar Sep 07 '22 20:09 ThomasWaldmann

Thanks for the clarification. This makes me think of pre-freeze and post-thaw scripts for databases ;)

I'll add a "backup coherence" entry in the table which I can link to this discussion.

Just a side question: when using the borg CLI, will there be a specific exit code in those cases, or must the output be parsed to find out whether a file changed while being backed up?

deajan avatar Sep 08 '22 19:09 deajan

Currently there are only a few exit codes and also it is hard to map warnings to exit codes (because there can be multiple different warnings), so one currently needs to read the log output.
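For a wrapper script, that could look roughly like the sketch below. Borg 1.x documents return code 0 for success, 1 for warning and 2 for error. The warning wording in `CHANGED_MARKER` is an assumption and should be checked against the log output of the borg version in use. The helper names and the sample log lines are made up for illustration.

```python
# Assumed wording of the warning; verify against your borg version.
CHANGED_MARKER = "file changed while we backed it up"

# Return codes as documented for borg 1.x.
EXIT_MEANING = {0: "success", 1: "warning", 2: "error"}


def describe_exit(returncode):
    """Map a borg return code to a coarse status string."""
    return EXIT_MEANING.get(returncode, "other")


def changed_files(log_text):
    """Collect paths from log lines that carry the changed-file warning."""
    paths = []
    for line in log_text.splitlines():
        if CHANGED_MARKER in line:
            before_marker = line.partition(CHANGED_MARKER)[0]
            paths.append(before_marker.rstrip(": ").strip())
    return paths


if __name__ == "__main__":
    # Hypothetical log excerpt, not real borg output.
    sample_log = (
        "/var/lib/db/data.ibd: file changed while we backed it up\n"
        "/etc/shadow: some other warning\n"
    )
    print(describe_exit(1), changed_files(sample_log))
```

The return code alone only says that some warning occurred; the log scan is what tells a changed file apart from other warnings.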

ThomasWaldmann avatar Sep 08 '22 22:09 ThomasWaldmann

I added a new benchmark with qemu disk images (see the latest README.md file). I noticed that borg performs quite well for that use case, whereas backing up the Linux kernel source files is not that great in terms of speed. Is that explained by your above statements about open() and stat()?

deajan avatar Oct 02 '22 21:10 deajan

Could be, because if you have a lot of small files, the per-file overhead has a much bigger effect than for few big files.
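A back-of-the-envelope model makes that visible: total time is roughly the number of files times a fixed per-file cost (open, stat, bookkeeping) plus the data volume divided by throughput. The function name and all numbers below are made-up assumptions for illustration, not measurements of borg.

```python
def estimated_seconds(n_files, total_bytes, per_file_overhead_s, throughput_bytes_per_s):
    """Very rough cost model: fixed per-file cost plus streaming cost."""
    return n_files * per_file_overhead_s + total_bytes / throughput_bytes_per_s


if __name__ == "__main__":
    # Hypothetical values: 1 ms overhead per file, 200 MB/s throughput,
    # 1 GB of data in both cases.
    overhead = 0.001
    throughput = 200e6
    total = 1e9

    few_big = estimated_seconds(2, total, overhead, throughput)
    many_small = estimated_seconds(70_000, total, overhead, throughput)
    print(f"few big files:    {few_big:.1f} s")
    print(f"many small files: {many_small:.1f} s")
```

With the same amount of data, the per-file term dominates for the many-small-files case, and it grows further when open() and stat() are slow, as on NFS.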

ThomasWaldmann avatar Oct 02 '22 22:10 ThomasWaldmann