borg create: try to speed up unchanged file processing
Early borg versions worked path-based when dealing with source files (stat, open, read, xattrs, acls, fsflags).
That was problematic due to race conditions, so borg 1.2 changed it to use open() to get a file descriptor and then work with the fd wherever possible. That way we can be sure to always deal with the same fs object, independently of its path.
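A minimal, Unix-only illustration of the race (plain Python, not borg code): once a file is open, the fd keeps referring to the same fs object even if a concurrent writer replaces the file under the same path, whereas a fresh path-based stat() already sees the new file. The function name `demo_fd_pins_object` is hypothetical.

```python
import os
import tempfile


def demo_fd_pins_object() -> tuple[int, int, int, bytes]:
    """Show that an open fd keeps naming the same fs object after its path is replaced."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "data")
        with open(path, "wb") as f:
            f.write(b"original")
        fd = os.open(path, os.O_RDONLY)
        try:
            ino_at_open = os.fstat(fd).st_ino
            # Simulate a concurrent writer atomically replacing the file
            # (write a new file, then rename it over the old path).
            tmp = os.path.join(d, "data.new")
            with open(tmp, "wb") as f:
                f.write(b"replacement")
            os.replace(tmp, path)
            ino_via_path = os.stat(path).st_ino  # path now names the new file
            ino_via_fd = os.fstat(fd).st_ino     # fd still names the old file
            data_via_fd = os.read(fd, 64)        # and still reads the old content
            return ino_at_open, ino_via_path, ino_via_fd, data_via_fd
        finally:
            os.close(fd)
```

Mixing path-based and fd-based calls therefore needs a consistency check (e.g. comparing st_dev/st_ino), while doing everything via the fd avoids the question entirely.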
But fs API calls, and especially open(), can be rather slow on some filesystems, e.g. network filesystems.
So, for an unchanged file (files cache hit), borg currently does:
- st1 = stat(path)
- fd = open(path)
- st2 = fstat(fd)
- check st1 against st2
- here it notices (by checking against the files cache contents) that the file is unchanged, so it does not read the file's content but reuses the chunk ids from the files cache.
- reads xattrs, except when --noxattrs is given
- reads acls, except when --noacls is given
- reads fsflags (bsdflags), except when --noflags is given
- creates a new archive item
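The steps above can be sketched roughly as follows. This is a simplified, hypothetical illustration, not borg's actual code: `CacheEntry`, `process_unchanged_file`, `read_xattrs_fd` and the item dict layout are made up for this sketch, and the files cache is modeled as a plain dict keyed by path. ACL and fsflags reading is omitted because the Python stdlib has no portable API for it; xattrs are read via the fd with `os.listxattr`/`os.getxattr`, which accept an fd on Linux.

```python
import errno
import os
from typing import NamedTuple, Optional


class CacheEntry(NamedTuple):
    # Hypothetical, simplified files cache entry (borg's real one differs).
    inode: int
    size: int
    mtime_ns: int
    chunk_ids: list


def read_xattrs_fd(fd: int) -> dict:
    """Read all xattrs through the fd (Linux only; {} elsewhere)."""
    if not hasattr(os, "listxattr"):
        return {}
    try:
        names = os.listxattr(fd)
    except OSError as e:
        if e.errno == errno.ENOTSUP:  # fs without xattr support
            return {}
        raise
    return {name: os.getxattr(fd, name) for name in names}


def process_unchanged_file(path: str, files_cache: dict,
                           noxattrs: bool = False) -> Optional[dict]:
    st1 = os.stat(path, follow_symlinks=False)                      # stat(path)
    fd = os.open(path, os.O_RDONLY | getattr(os, "O_NOFOLLOW", 0))  # open(path)
    try:
        st2 = os.fstat(fd)                                          # fstat(fd)
        # Check st1 against st2: path and fd must name the same fs object.
        if (st1.st_dev, st1.st_ino) != (st2.st_dev, st2.st_ino):
            raise RuntimeError(f"{path!r} was replaced during processing")
        entry = files_cache.get(path)
        if entry is None or (entry.inode, entry.size, entry.mtime_ns) != (
                st2.st_ino, st2.st_size, st2.st_mtime_ns):
            return None  # cache miss: the caller would read and chunk the content
        # Cache hit: do not read the content, reuse the cached chunk ids.
        xattrs = {} if noxattrs else read_xattrs_fd(fd)
        # ACLs and fsflags would also be read via the fd here (omitted).
        return {"path": path, "mode": st2.st_mode, "size": st2.st_size,
                "mtime_ns": st2.st_mtime_ns, "chunks": list(entry.chunk_ids),
                "xattrs": xattrs}
    finally:
        os.close(fd)
```

Even on a cache hit, this path costs an open() and an fstat() per file, which is exactly the overhead in question on slow filesystems.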
Review the code to see whether it can be modified for the unchanged-file case so that the open() and fstat() calls are not needed, without re-introducing issues like races.
Related:
- #7374
- #6019
- #4498 (also lots of details / discussion about why borg 1.2+ works fd-based, plus benchmarks)
With default borg options, it cannot be sped up while retaining its consistency properties, because even with a files cache hit (== no need to read the file's content), borg still needs to read:
- xattrs
- acls
- fsflags (bsdflags)
- stat
We can only be sure they all refer to the same fs object if we open the file and do all these syscalls based on the fd.
So, I guess the only mode in which it could be accelerated is with --noxattrs --noacls --noflags. Then we could just use the path-based stat values and the cached chunk ids from the files cache hit, and skip opening the file, because no further fd-based file operations would be needed.
See the patch here: https://github.com/borgbackup/borg/issues/4498#issuecomment-1221432167
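A hypothetical sketch of such a fast path, assuming a simplified files cache modeled as a dict keyed by path (`CacheEntry` and `try_fast_path` are made-up names; this is not the linked patch, and the cache comparison is simplified compared to borg's configurable --files-cache check). Only regular files with a cache hit take the fast path; everything else returns None so the caller can fall back to the normal fd-based processing.

```python
import os
import stat
from typing import NamedTuple, Optional


class CacheEntry(NamedTuple):
    # Hypothetical, simplified files cache entry (borg's real one differs).
    inode: int
    size: int
    mtime_ns: int
    chunk_ids: list


def try_fast_path(path: str, files_cache: dict, noxattrs: bool,
                  noacls: bool, noflags: bool) -> Optional[dict]:
    # Only applicable if no further fd-based metadata reads are needed.
    if not (noxattrs and noacls and noflags):
        return None
    st = os.stat(path, follow_symlinks=False)  # the only syscall on this path
    if not stat.S_ISREG(st.st_mode):
        return None
    entry = files_cache.get(path)
    if entry is None or (entry.inode, entry.size, entry.mtime_ns) != (
            st.st_ino, st.st_size, st.st_mtime_ns):
        return None  # miss: fall back to open()/fstat()/read via the fd
    # Hit: build the item from this single stat result plus cached chunk ids,
    # without ever calling open().
    return {"path": path, "mode": st.st_mode, "size": st.st_size,
            "mtime_ns": st.st_mtime_ns, "chunks": list(entry.chunk_ids)}
```

Since the item is built from a single stat() result plus cached chunk ids, there is no second metadata source that could refer to a different fs object.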