1.1-maint vs master performance difference
In #3955 the question came up, why borg create with master branch code is slower than with 1.1-maint branch code. The benchmarks I did there were archiving lots of relatively small files.
I did a source-code-based analysis, checking the "hot" code path that is taken when backing up known, unchanged, regular files (like when doing a 2nd backup right after a first one, without many changes).
Found this difference in process_file (simplified pseudo code of what is new in master):

```
fd = open(path)
st_fd = fstat(fd)
stat_update_check(st_path, st_fd)
```
These changes were done when switching borg master branch to work based on a FD (file descriptor) to avoid race conditions and potential security issues. See #4043.
For this, master code does it like this:
- `st_path = stat(path)` - before dispatching to file-type handler
- dispatch based on st_path (e.g. to process_file, process_fifo, process_symlink, ...)
- `fd = open(path)` - get an FD, so we can do other operations based on this FD. new in master (1.1-maint did not open unchanged files and acquired all metadata based on the path)
- stat_update_check: determining `st_fd = fstat(fd)` to check for a race condition ("did we dispatch to the correct file-type handler?") by comparing st_fd to st_path. this call / this check is new in master
- reading file contents (based on open fd, but not in the "hot" case, here we know contents have not changed)
- reading bsdflags (based on open fd on linux, based on st_fd on others)
- reading xattrs (based on open fd)
- reading ACLs (based on open fd)
For the case of a changed file, there is also an additional fstat(fd) call after reading the file's content, to check whether the file changed while we read it.
Related: #4497.
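The fd-based hot path described above can be sketched in plain Python. This is a minimal sketch, not borg's actual code; the check is simplified here to comparing inode number and file type:

```python
import os
import stat
import tempfile

# Minimal sketch (not borg's actual code) of the fd-based hot path: stat by
# name, dispatch, open, then re-stat via the fd and verify we are still
# looking at the same filesystem object.
def stat_update_check(st_path, st_fd):
    if (st_path.st_ino != st_fd.st_ino
            or stat.S_IFMT(st_path.st_mode) != stat.S_IFMT(st_fd.st_mode)):
        raise OSError('file changed between stat(path) and open() (race condition)')
    return st_fd

# demo on a throwaway file
with tempfile.NamedTemporaryFile() as f:
    st_path = os.lstat(f.name)                         # name-based stat, used for dispatching
    fd = os.open(f.name, os.O_RDONLY | os.O_NOFOLLOW)  # get an FD for all further operations
    try:
        st_fd = stat_update_check(st_path, os.fstat(fd))  # fd-based re-check
    finally:
        os.close(fd)
```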
Used this script (based on an early script posted here) to check performance with many (20000) small (0 byte) files.
I ran the script overnight many times (when the system load is at its minimum), and while the timing varies somewhat from run to run, I can conclude that:
- 1.2.0a6 is a bit faster than 1.1.9 with default setting and cold cache
- 1.2.0a6 is similar to 1.1.9 with default setting and hot cache
- 1.2.0a6 is a bit faster than 1.1.9 with nobsdflags, first backup and cold cache
- 1.2.0a6 is about +100% slower than 1.1.9 with nobsdflags, second backup and cold cache
- 1.2.0a6 is about +50% slower than 1.1.9 with nobsdflags, and hot cache
- 1.1.9 is about +100% slower and 1.2.0a6 about +200% slower than tar
Example run:
Repo: remote SSHFS
Source: NFS on rotating disks
Testing with 20000 empty files
1.1.9 default cold cache, first backup: 1:11.61
1.1.9 default cold cache, second backup: 0:56.24
1.1.9 default hot cache, first backup: 0:45.28
1.1.9 default hot cache, second backup: 0:45.19
1.1.9 nobsdflags cold cache, first backup: 0:59.90
1.1.9 nobsdflags cold cache, second backup: 0:32.38
1.1.9 nobsdflags hot cache, first backup: 0:30.31
1.1.9 nobsdflags hot cache, second backup: 0:30.90
1.2.0a6 default cold cache, first backup: 0:56.54
1.2.0a6 default cold cache, second backup: 0:45.92
1.2.0a6 default hot cache, first backup: 0:42.75
1.2.0a6 default hot cache, second backup: 0:43.93
1.2.0a6 nobsdflags cold cache, first backup: 0:55.23
1.2.0a6 nobsdflags cold cache, second backup: 0:56.11
1.2.0a6 nobsdflags hot cache, first backup: 0:44.63
1.2.0a6 nobsdflags hot cache, second backup: 0:43.70
tar + gzip cold cache, for comparison: 0:12.90
@FabioPedretti Thanks for doing measurements.
Nice to see that in some cases it got faster.
For the case (usage with --nobsdflags) where it got slower:
- for 1.1, nobsdflags on linux avoids an `open()` call which is needed to read the bsdflags for unchanged files (we do not read unchanged files' contents, but we read metadata: acls, xattrs, bsdflags)
- for 1.2, nobsdflags does not avoid the `open()` call, because we always need to open the file to work based on a file descriptor (to avoid race conditions)
So it seems that a cold cache open() on NFS is rather slow (maybe not totally unexpected).
Empty files might be the totally worst case here, because there is no content data transfer needing time, so the timing is totally dominated by metadata processing and network delays. It's also the worst case for borg's trick for unchanged files which usually saves lots of time by not having to read the file's content - for 0 bytes, there is nothing to save here.
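The trick referred to here (the files cache) can be illustrated with a minimal sketch. The function names match the calls in borg's source, but the logic is simplified; borg's real cache stores more fields and also compares mtime:

```python
# Illustrative sketch (not borg's real data structures) of the "skip unchanged
# files" trick: remember (inode, size, ctime) per path and reuse the stored
# chunk list when nothing changed, so the file's content is never re-read.
files_cache = {}  # path -> (st_ino, st_size, st_ctime_ns, chunk_ids)

def file_known_and_unchanged(path, st):
    entry = files_cache.get(path)
    if entry is not None and entry[:3] == (st.st_ino, st.st_size, st.st_ctime_ns):
        return True, entry[3]   # known and unchanged: reuse the stored chunk list
    return False, None          # unknown or changed: contents must be read

def memorize_file(path, st, chunk_ids):
    files_cache[path] = (st.st_ino, st.st_size, st.st_ctime_ns, chunk_ids)
```

For a 0-byte file the saved work is essentially nil, which is why this benchmark is a worst case for the trick.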
I also suspect that you have quite a bit of measurement errors due to other influences (I had the same problem even in my local fs measurements), see for example these cases:
1.2.0a6 default cold cache, second backup: 0:45.92
1.2.0a6 nobsdflags cold cache, second backup: 0:56.11 <-- !
1.2.0a6 default hot cache, first backup: 0:42.75
1.2.0a6 nobsdflags hot cache, first backup: 0:44.63
Don't think there can be a reason why borg could be slower when not getting/processing bsdflags.
Somehow embarrassing how fast tar/gz is in this case. :)
Maybe comparing a tar and a borg strace would be interesting (does your tar do the open()? does it read xattrs, acls, bsdflags?).
Of course borg can never win against tar in this zero-bytes files benchmark (there is no content to deduplicate, nothing to avoid reading, but borg still has all the overhead trying to do that anyway).
> Empty files might be the totally worst case here... metadata processing...
Yes, I used them because I think they better isolate the problem with "huge amount of files, with very few changes between runs" that I and others experienced. I suppose most of the time of my backups is due to this issue.
> I also suspect that you have quite a bit of measurement errors due to other influences...
Yes, sorry for that, I ran it on a production virtual system, also used by others. Indeed the runs are not perfectly comparable.
> does your tar do the open()?
Using "tar (GNU tar) 1.29" from Debian stretch. While borg binaries are the ones downloaded from github.
> Maybe comparing a tar and a borg strace would be interesting
Here are the straces for a 10 files "0-byte" run.
it only straced the main process (and there is nothing interesting in there).
you need to use strace -f ... so it also follows child processes.
New straces, now the empty test files have a "testfile-" prefix to make them better searchable.
BTW:
```
$ grep -c testfile- *
strace-borg-1.1.9 default cold cache, first backup: .txt:50
strace-borg-1.1.9 default cold cache, second backup: .txt:41
strace-borg-1.1.9 default hot cache, first backup: .txt:41
strace-borg-1.1.9 default hot cache, second backup: .txt:41
strace-borg-1.1.9 nobsdflags cold cache, first backup: .txt:40
strace-borg-1.1.9 nobsdflags cold cache, second backup: .txt:31
strace-borg-1.1.9 nobsdflags hot cache, first backup: .txt:31
strace-borg-1.1.9 nobsdflags hot cache, second backup: .txt:31
strace-borg-1.2.0a6 default cold cache, first backup: .txt:20
strace-borg-1.2.0a6 default cold cache, second backup: .txt:20
strace-borg-1.2.0a6 default hot cache, first backup: .txt:20
strace-borg-1.2.0a6 default hot cache, second backup: .txt:20
strace-borg-1.2.0a6 nobsdflags cold cache, first backup: .txt:20
strace-borg-1.2.0a6 nobsdflags cold cache, second backup: .txt:20
strace-borg-1.2.0a6 nobsdflags hot cache, first backup: .txt:20
strace-borg-1.2.0a6 nobsdflags hot cache, second backup: .txt:20
strace-tar.txt:23
```
that's because 1.1.x works filename-based (giving each call the path/filename) and 1.2 works fd-based (stats/opens once with the name, then uses the fd).
tar: doesn't do much here:
- I only see it `stat`-ing the files.
- It doesn't open the file (likely due to 0 bytes).
- It does not try to get xattrs, acls or bsdflags.
borg 1.2:
```
# name-based stat relative to dir fd to determine file type for dispatching to correct handler
newfstatat(6, "testfile-2", {st_mode=S_IFREG|0644, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
# name-based open relative to dir fd to get an FD for the file to avoid race conditions
openat(6, "testfile-2", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_NOFOLLOW|O_NOATIME|O_CLOEXEC) = 7
# everything below here is file-FD-based:
# stat again to detect filetype having changed (race) and whether we are in correct handler
fstat(7, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
# vvvvvvvvvvvvvvvv only for new/changed files:
# try reading contents
read(7, "", 8388608) = 0
# avoid spoiling the cache with data we'll not need again
fadvise64(7, 0, 0, POSIX_FADV_DONTNEED) = 0
# stat again to see if file has changed while we were backing it up
fstat(7, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
# ^^^^^^^^^^^^^^^^ only for new/changed files
# get bsdflags
ioctl(7, FS_IOC_GETFLAGS, 0x7fff59a5e03c) = -1
# get xattrs
flistxattr(7, "", 4096) = 0
# get acls
fgetxattr(7, "system.posix_acl_access", NULL, 0) = -1 EOPNOTSUPP (Operation not supported)
# done
close(7) = 0
```
(as expected)
Added new straces with small, non-empty, all different files.
So, to sum up:
- borg 1.2 is behaving as expected and I don't see any superfluous calls for reading the input files, AFAIK everything there has a reason why it is there.
- the tar run maybe was not even trying to backup all metadata (maybe one needs to give some more options, seems saving everything is not the default). would be interesting if tar does the "dontneed" fadvise if the file length is > 0. if it does not, it'll spoil the fs cache when doing the backup.
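The fadvise pattern seen in the borg strace above can be sketched in Python. This is a simplified sketch, not borg's actual code; `os.posix_fadvise` is available on Linux and most Unixes:

```python
import os

# Simplified sketch (not borg's actual code): read a file's contents and then
# tell the kernel to drop them from the page cache, so a full backup run does
# not evict more useful cached data.
def read_without_caching(path, bufsize=8 * 1024 * 1024):
    fd = os.open(path, os.O_RDONLY)
    try:
        while os.read(fd, bufsize):
            pass
        # same call as the fadvise64(..., POSIX_FADV_DONTNEED) in the strace above
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
```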
tar for non-empty file:
```
# name-based stat relative to cwd
newfstatat(AT_FDCWD, "testfile-2", {st_mode=S_IFREG|0644, st_size=2, ...}, AT_SYMLINK_NOFOLLOW) = 0
# name-based open relative to cwd
openat(AT_FDCWD, "testfile-2", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_NOFOLLOW|O_CLOEXEC) = 3
# vvvvvvvvvvvvvvvv FD-based from here
# stat again
fstat(3, {st_mode=S_IFREG|0644, st_size=2, ...}) = 0
# read contents
read(3, "2\n", 2) = 2
# stat again
fstat(3, {st_mode=S_IFREG|0644, st_size=2, ...}) = 0
# done
close(3) = 0
```
So, somehow similar to what borg does, just no xattrs/acls/bsdflags and no fadvise.
tar has options for acl and xattrs (but it's not clear from the manpage which is the default). Maybe adding options to disable them also in borg (or having them disabled by default and needing an option to force enable them like tar)?
And, as a micro-optimization, also avoiding some calls when the file is empty?
Empty files: guess it is not worth it micro-optimizing for them. There might be some, but except in benchmarks, you shouldn't have lots of them (lots in the sense of a reasonable percentage of all backup effort).
Switching off acls/xattrs: as seen in #3955 that might give 10% speed advantage for borg 1.2 and relatively small files.
Guess this could be done and might be worth it for some cases, if one has really tons of small files (like maildir or bigger source code repos) and is quite sure one does not have or does not need to backup acls / xattrs.
Here are updated results, with the addition of borg 1.0.13 and restic 0.9.5 (not sure restic is directly comparable to borg, see updated script, comments welcome). As with previous posts, a warning that results are variable between runs and not very reliable.
Some notes:
- 1.1.9 (with `--nobsdflags`) is still a bit slower than 1.0.13. Hopefully e886df4c should fix that (I'll retest when 1.1.10 gets released).
- on 1.2.0a6 `--nobsdflags` makes no noticeable difference
- 1.2.0a6 is still a lot slower than 1.1.9 (both with `--nobsdflags`), and the gap will likely widen with 1.1.10 having e886df4c (already in 1.2.0a6)
- for some reason "nobsdflags cold cache, second backup" is much faster than "first backup" on 1.1.9, while it takes the same time on 1.2.0a6
- `--noacls` and `--noxattrs` (#3955) will hopefully make 1.2.0a6 faster, but still not comparable with 1.1 when not using these options (and likely slower again if these also get backported to 1.1)
- maybe in the future consider an option to disable the FD-based approach for whoever needs max performance (and the same 1.1 behavior) in 1.2, if that is found to be responsible for the slowdown?
Repo: remote on SSHFS
Source: rotating disks on NFS
Testing with 20000 *empty* files
restic 0.9.5 cold cache, first backup: 0:49.28
restic 0.9.5 cold cache, second backup: 0:49.38
restic 0.9.5 hot cache, first backup: 0:19.52
restic 0.9.5 hot cache, second backup: 0:32.92
borg 1.0.13 default cold cache, first backup: 1:05.86
borg 1.0.13 default cold cache, second backup: 0:30.77
borg 1.0.13 default hot cache, first backup: 0:29.89
borg 1.0.13 default hot cache, second backup: 0:30.17
borg 1.1.9 default cold cache, first backup: 1:24.81
borg 1.1.9 default cold cache, second backup: 1:04.32
borg 1.1.9 default hot cache, first backup: 0:50.33
borg 1.1.9 default hot cache, second backup: 0:50.87
borg 1.1.9 nobsdflags cold cache, first backup: 1:06.92
borg 1.1.9 nobsdflags cold cache, second backup: 0:34.77
borg 1.1.9 nobsdflags hot cache, first backup: 0:33.96
borg 1.1.9 nobsdflags hot cache, second backup: 0:34.16
borg 1.2.0a6 default cold cache, first backup: 1:05.78
borg 1.2.0a6 default cold cache, second backup: 1:04.89
borg 1.2.0a6 default hot cache, first backup: 0:49.65
borg 1.2.0a6 default hot cache, second backup: 0:49.51
borg 1.2.0a6 nobsdflags cold cache, first backup: 1:06.35
borg 1.2.0a6 nobsdflags cold cache, second backup: 1:06.58
borg 1.2.0a6 nobsdflags hot cache, first backup: 0:49.24
borg 1.2.0a6 nobsdflags hot cache, second backup: 0:49.98
tar cold cache, for comparison: 0:16.95
Thanks for all the efforts working on this. ❤
Guess I rather don't want to introduce a switch to choose between fd-based and name-based, it would just make the already quite complex code even more complex.
@ThomasWaldmann As this issue is still open - do you intend to add these test results to the docs?
@fantasya-pbem in this case, the "documentation" tag did not mean to add this stuff to the sphinx-based docs, but rather that this github issue serves as documentation in case somebody wonders about 1.1 vs 1.2 performance differences.
Here is an updated test result (as always note the test may not be 100% reliable since it was done on a VM) with 20000 small and all different files. Some notes:
- all tested borg 1.1 versions perform mostly the same when on the same mode (default or nobsdflags), no noticeable improvement from 1.1.9 to 1.1.10 which included e886df4
- on 1.1 nobsdflags is faster than default
- all tested borg 1.2 versions perform mostly the same (and nobsdflags/noflags gives no improvement)
- borg 1.0 is still the fastest, 1.1 (with nobsdflags) just a bit slower, 1.2 a lot slower
- restic 0.11.0 is a lot faster than borg and previous restic versions; it seems this specific 0.11.0 commit improved performance a lot on network FS: https://github.com/restic/restic/pull/2970
If useful I can do more tests (also with different conditions).
Test results:
Repo: remote on SSHFS
Source: rotating disks on NFS
Testing with 20000 *small, all different* files
restic 0.9.5 cold cache, first backup: 0:27.27
restic 0.9.5 cold cache, second backup: 0:38.45
restic 0.9.5 hot cache, first backup: 0:15.96
restic 0.9.5 hot cache, second backup: 0:19.84
restic 0.9.6 cold cache, first backup: 0:29.98
restic 0.9.6 cold cache, second backup: 0:37.93
restic 0.9.6 hot cache, first backup: 0:14.47
restic 0.9.6 hot cache, second backup: 0:18.97
restic 0.10.0 cold cache, first backup: 0:28.79
restic 0.10.0 cold cache, second backup: 0:39.26
restic 0.10.0 hot cache, first backup: 0:13.73
restic 0.10.0 hot cache, second backup: 0:18.82
restic 0.11.0 cold cache, first backup: 0:28.86
restic 0.11.0 cold cache, second backup: 0:11.76
restic 0.11.0 hot cache, first backup: 0:26.02
restic 0.11.0 hot cache, second backup: 0:08.08
borg 1.0.13 default cold cache, first backup: 0:56.51
borg 1.0.13 default cold cache, second backup: 0:18.99
borg 1.0.13 default hot cache, first backup: 0:48.79
borg 1.0.13 default hot cache, second backup: 0:18.39
borg 1.1.9 default cold cache, first backup: 1:06.83
borg 1.1.9 default cold cache, second backup: 0:38.51
borg 1.1.9 default hot cache, first backup: 1:04.94
borg 1.1.9 default hot cache, second backup: 0:32.86
borg 1.1.9 nobsdflags cold cache, first backup: 0:59.77
borg 1.1.9 nobsdflags cold cache, second backup: 0:21.24
borg 1.1.9 nobsdflags hot cache, first backup: 0:50.86
borg 1.1.9 nobsdflags hot cache, second backup: 0:21.80
borg 1.1.10 default cold cache, first backup: 1:10.08
borg 1.1.10 default cold cache, second backup: 0:38.85
borg 1.1.10 default hot cache, first backup: 1:03.18
borg 1.1.10 default hot cache, second backup: 0:29.52
borg 1.1.10 nobsdflags cold cache, first backup: 0:58.14
borg 1.1.10 nobsdflags cold cache, second backup: 0:22.35
borg 1.1.10 nobsdflags hot cache, first backup: 0:50.67
borg 1.1.10 nobsdflags hot cache, second backup: 0:20.66
borg 1.1.11 default cold cache, first backup: 1:05.54
borg 1.1.11 default cold cache, second backup: 0:38.44
borg 1.1.11 default hot cache, first backup: 0:59.26
borg 1.1.11 default hot cache, second backup: 0:31.69
borg 1.1.11 nobsdflags cold cache, first backup: 1:00.45
borg 1.1.11 nobsdflags cold cache, second backup: 0:21.61
borg 1.1.11 nobsdflags hot cache, first backup: 0:50.71
borg 1.1.11 nobsdflags hot cache, second backup: 0:21.01
borg 1.1.13 default cold cache, first backup: 1:07.93
borg 1.1.13 default cold cache, second backup: 0:37.82
borg 1.1.13 default hot cache, first backup: 0:59.95
borg 1.1.13 default hot cache, second backup: 0:31.47
borg 1.1.13 nobsdflags cold cache, first backup: 1:03.51
borg 1.1.13 nobsdflags cold cache, second backup: 0:21.08
borg 1.1.13 nobsdflags hot cache, first backup: 0:58.69
borg 1.1.13 nobsdflags hot cache, second backup: 0:21.64
borg 1.1.14 default cold cache, first backup: 1:07.82
borg 1.1.14 default cold cache, second backup: 0:39.19
borg 1.1.14 default hot cache, first backup: 1:00.16
borg 1.1.14 default hot cache, second backup: 0:31.45
borg 1.1.14 nobsdflags cold cache, first backup: 0:57.37
borg 1.1.14 nobsdflags cold cache, second backup: 0:20.60
borg 1.1.14 nobsdflags hot cache, first backup: 0:50.74
borg 1.1.14 nobsdflags hot cache, second backup: 0:21.32
borg 1.1.15 default cold cache, first backup: 1:07.21
borg 1.1.15 default cold cache, second backup: 0:37.66
borg 1.1.15 default hot cache, first backup: 1:00.46
borg 1.1.15 default hot cache, second backup: 0:29.24
borg 1.1.15 nobsdflags cold cache, first backup: 0:59.19
borg 1.1.15 nobsdflags cold cache, second backup: 0:20.70
borg 1.1.15 nobsdflags hot cache, first backup: 0:49.08
borg 1.1.15 nobsdflags hot cache, second backup: 0:20.23
borg 1.2.0a6 default cold cache, first backup: 0:56.24
borg 1.2.0a6 default cold cache, second backup: 0:39.06
borg 1.2.0a6 default hot cache, first backup: 0:49.79
borg 1.2.0a6 default hot cache, second backup: 0:28.95
borg 1.2.0a6 nobsdflags cold cache, first backup: 0:58.44
borg 1.2.0a6 nobsdflags cold cache, second backup: 0:38.26
borg 1.2.0a6 nobsdflags hot cache, first backup: 0:51.34
borg 1.2.0a6 nobsdflags hot cache, second backup: 0:29.42
borg 1.2.0a7 default cold cache, first backup: 0:57.39
borg 1.2.0a7 default cold cache, second backup: 0:37.59
borg 1.2.0a7 default hot cache, first backup: 0:48.49
borg 1.2.0a7 default hot cache, second backup: 0:28.90
borg 1.2.0a7 nobsdflags cold cache, first backup: 0:58.29
borg 1.2.0a7 nobsdflags cold cache, second backup: 0:38.58
borg 1.2.0a7 nobsdflags hot cache, first backup: 0:46.29
borg 1.2.0a7 nobsdflags hot cache, second backup: 0:29.32
borg 1.2.0a8 default cold cache, first backup: 1:00.76
borg 1.2.0a8 default cold cache, second backup: 0:38.96
borg 1.2.0a8 default hot cache, first backup: 0:49.74
borg 1.2.0a8 default hot cache, second backup: 0:29.44
borg 1.2.0a8 noflags cold cache, first backup: 1:01.61
borg 1.2.0a8 noflags cold cache, second backup: 0:37.91
borg 1.2.0a8 noflags hot cache, first backup: 0:51.08
borg 1.2.0a8 noflags hot cache, second backup: 0:28.88
borg 1.2.0a9 default cold cache, first backup: 0:59.19
borg 1.2.0a9 default cold cache, second backup: 0:38.39
borg 1.2.0a9 default hot cache, first backup: 0:51.49
borg 1.2.0a9 default hot cache, second backup: 0:29.55
borg 1.2.0a9 noflags cold cache, first backup: 0:58.78
borg 1.2.0a9 noflags cold cache, second backup: 0:39.10
borg 1.2.0a9 noflags hot cache, first backup: 0:50.09
borg 1.2.0a9 noflags hot cache, second backup: 0:30.42
borg 1.2.0b1 default cold cache, first backup: 0:58.26
borg 1.2.0b1 default cold cache, second backup: 0:38.61
borg 1.2.0b1 default hot cache, first backup: 0:51.02
borg 1.2.0b1 default hot cache, second backup: 0:30.33
borg 1.2.0b1 noflags cold cache, first backup: 0:57.21
borg 1.2.0b1 noflags cold cache, second backup: 0:38.06
borg 1.2.0b1 noflags hot cache, first backup: 0:53.81
borg 1.2.0b1 noflags hot cache, second backup: 0:30.19
tar cold cache, for comparison: 0:35.49
This is the relevant code in 1.2 / master branch (I removed some parts to make it simpler):
```python
def process_file(self, *, path, parent_fd, name, st, ...):
    with self.create_helper(path, st, None) as (item, status, hardlinked, hardlink_master):  # no status yet
        # we already have st here, given by caller when dispatching by fs item type.
        # in the next line, borg opens the file (which obviously has a speed penalty on network fs):
        with OsOpen(path=path, parent_fd=parent_fd, name=name, flags=flags, noatime=True) as fd:
            # here borg determines st again, based on the fd! it also checks if st is like expected.
            # if mode or inode changed, the update check will throw an exception due to a fs race condition.
            st = stat_update_check(st, os.fstat(fd))
            # as borg needs to recreate the metadata stream, item gets updated with infos from st here:
            item.update(self.metadata_collector.stat_simple_attrs(st))
            ...
            # now we check if the file is known and unchanged, note that we use the correct-for-sure st we got via fd:
            known, chunks = cache.file_known_and_unchanged(path, st)
            ...
            # only chunkify the file if needed
            if chunks is not None:
                item.chunks = chunks
            else:
                # we get here if we have a new or modified file, we need to chunk it!
                with backup_io('read'):
                    self.process_file_chunks(item, cache, self.stats, self.show_progress,
                                             backup_io_iter(self.chunker.chunkify(None, fd)))
                # at the end, we determine st2 to see if the file changed while we read it:
                with backup_io('fstat2'):
                    st2 = os.fstat(fd)
                changed_while_backup = st.st_ctime_ns != st2.st_ctime_ns
                cache.memorize_file(path_hash, st, [c.id for c in item.chunks])
            # finally, we update the item for the metadata stream with xattrs etc. - this is also based on the fd!
            item.update(self.metadata_collector.stat_ext_attrs(st, path, fd=fd))
```
Considering that we chose the safe way (less race conditions in the fs) using the fd for borg 1.2+, I do not see a safe way to not open the file.
Even if it is 0 bytes or unchanged/known, we need the fd to get the for-sure-correct st and also other metadata like xattrs etc. - we do not have that information in the files cache (and I also guess we rather do not want it there, because it might significantly increase the memory usage of the files cache, which can already be rather high).
Could the open be avoided in that case just for the 1.1 branch? Thanks.
1.1 works very differently from master/1.2:
- 1.1: mostly path based
- 1.2: mostly fd based, using path only if there is no fd-based api
but, 1.1 and 1.2 also share some common requirements, like having to get the metadata for the item from the filesystem.
but afaics, 1.1 is doing this path-based, not opening the file unless it really has to read/chunk it (except for getting bsdflags, so you need --nobsdflags for 1.1 to avoid the open).
OK, thanks. I would like borg to be more competitive with many unchanged files (which is often the most common scenario), and it looks like file syscalls are much heavier on network file systems. Do you think #3955 may still be of help here? See also the #3039 hack which improved performance a lot. I can do more tests if needed.
well, long term (1.2+) we'll always have the open() call, so if that is expensive on some filesystems even if not reading content, we can not do much about it. guess correctness and as little race conditions as possible are more important here.
once we have the fd (from open() call), we use it for all the metadata-getting operations even if the file contents did not change.
you could do a benchmark how much not doing the xattrs / acl / bsdflags getting ops would save in case of the already open fd.
you can just search for the stat_ext_attrs function and return an empty dict there to simulate that.
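The suggested simulation amounts to stubbing the method out. With a stand-in class (the real `MetadataCollector` lives in borg's source; the method body below is hypothetical), it would look like:

```python
# Stand-in class to illustrate the suggested experiment; in borg you would
# patch the real MetadataCollector.stat_ext_attrs the same way. The method
# body here is a placeholder; the real one reads xattrs, ACLs and bsdflags.
class MetadataCollector:
    def stat_ext_attrs(self, st, path, fd=None):
        return {'xattrs': {}, 'acl_access': b''}

# simulate "collect no extended metadata" to measure what those calls cost:
MetadataCollector.stat_ext_attrs = lambda self, st, path, fd=None: {}
```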
I run again the comparison on a test config, with 20.000 unchanged (on the second run), small, all different, files:
- With the source on a local ext4 fs 1.1.17 and 1.2.0 have similar performance. 1.1 with nobsdflags or 1.2 with noflags also don't change a lot.
- With the source on NFS 1.1.17 and 1.2.0 have similar performance with default config. 1.2 with noflags again doesn't change a lot. But 1.1 nobsdflags is 2x faster.
So 1.2.0 (with or without noflags) is still 2x slower than 1.1.17 nobsdflags with the source on NFS.
(1.0.13 is still even a bit faster than 1.1.17 nobsdflags)
Yeah, that's likely because the fd = os.open(filename) + os.stat(fd) is 2x slower on NFS than os.stat(filename).
But if we did it the old way, we would just reintroduce the races we just got rid of...
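The raw syscall overhead difference can be measured with a hypothetical micro-benchmark like this (on a local fs the gap is small; on NFS every extra syscall can mean a network round trip, so it is usually much larger):

```python
import os
import tempfile
import timeit

# Hypothetical micro-benchmark: compare the 1.1-style name-based stat with
# the 1.2-style open+fstat+close on a throwaway file.
fd0, path = tempfile.mkstemp()
os.close(fd0)

def name_based():
    os.stat(path)           # one syscall

def fd_based():
    fd = os.open(path, os.O_RDONLY)
    os.fstat(fd)
    os.close(fd)            # three syscalls

n = 10_000
t_name = timeit.timeit(name_based, number=n)
t_fd = timeit.timeit(fd_based, number=n)
print(f'stat(path): {t_name:.3f}s  open+fstat+close: {t_fd:.3f}s')
os.unlink(path)
```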
> Yeah, that's likely because the `fd = os.open(filename) + os.stat(fd)` is 2x slower on NFS than `os.stat(filename)`. But if we did it the old way, we would just reintroduce the races we just got rid of...
Do you think something like https://github.com/restic/restic/pull/2970 would be applicable for borg? Note the table with the performance improvement.