borg
borg copied to clipboard
When a exclude-norecurse pattern ends with a trailing slash, Borg still recurses one level below the excluded path
Have you checked borgbackup docs, FAQ, and open Github issues?
Yes. #3209 might be related.
Is this a BUG / ISSUE report or a QUESTION?
ISSUE
System information. For client/server mode post info for both machines.
Your borg version (borg -V).
1.2.2
Operating system (distribution) and version.
Arch Linux
Hardware / network configuration, and filesystems used.
n.a.
How much data is handled by borg?
n.a.
Full borg commandline that lead to the problem (leave away excludes and passwords)
see below
Describe the problem you're observing.
When a exclude-norecurse pattern ends with a trailing slash, Borg still recurses one level below the excluded path. The exclude still works correct, but Borg should skip recursion.
# normal exclude works as expected
(venv) c0d3@z3r0:[~]# rm -rf repo; borg init -e none repo; venv/bin/borg -p create --pattern '- re:^test/' --list repo:: /test
x /test/Downloadx 0 N test
x /test/Downloadx/owh
x /test/blub
d /test
# without trailing slash: works as expected
(venv) c0d3@z3r0:[~]# rm -rf repo; borg init -e none repo; venv/bin/borg -p create --pattern '! re:^test' --list repo::test /test
x /test
# trailing slash, regex: Borg recurses (exclude still working correct, though)
(venv) c0d3@z3r0:[~]# rm -rf repo; borg init -e none repo; venv/bin/borg -p create --pattern '! re:^test/' --list repo::test /test
x /test/Downloadx 0 N test
x /test/blub
d /test
# same for sh:, just for comparison
(venv) c0d3@z3r0:[~]# rm -rf repo; borg init -e none repo; venv/bin/borg -p create --pattern '! sh:test' --list repo::test /test
x /test
(venv) c0d3@z3r0:[~]# rm -rf repo; borg init -e none repo; venv/bin/borg -p create --pattern '! sh:test/' --list repo::test /test
x /test/Downloadx 0 N test
x /test/blub
d /test
The problem might be that you did not really exclude /test
, because the trailing slash in the exclude-norecurse pattern did not match.
Not sure about the best behaviour here though. The main idea behind exclude-norecurse (!
) was to not recurse into excluded dirs searching for stuff that might be included again (like possible with the normal exclude pattern -
).
Reasons for this can be:
- efficiency (like "i am sure there is nothing to back up in there")
- strange effects happening when recursing into there (automounter mountpoints)
- special filesystems with no user data (like /proc /sys ...)
The problem might be that you did not really exclude
/test
, because the trailing slash in the exclude-norecurse pattern did not match.
Probably yes, but why?
From the patterns man page:
A directory exclusion pattern can end either with or without a slash ('/').
If it ends with a slash, such as some/path/, the directory will be
included but not its content. If it does not end with a slash, such as
some/path, both the directory and content will be excluded.
I don't see that happening here.
Btw. it seems odd to me (but might be right) that Borg first recurses, then checks the directory itself. I would expect it to be the other way round.
Not sure about the best behaviour here though. The main idea behind exclude-norecurse (
!
) was to not recurse into excluded dirs searching for stuff that might be included again (like possible with the normal exclude pattern-
).Reasons for this can be:
- efficiency (like "i am sure there is nothing to back up in there")
- strange effects happening when recursing into there (automounter mountpoints)
- special filesystems with no user data (like /proc /sys ...)
Sure, I got that, but it doesn't seem to work as my example shows.
I think the patterns man page is explaining this correctly. A pattern ending in a slash does not match the directory, so if you have "some/path/" as an exclusion pattern, the directory "some/path" is not excluded. But that pattern matches things in that directory, so they are excluded. I think it is good that borg is consistent about this, with the trailing slash matching the contents of a directory but not the directory itself. So, as far as I can see, all of the examples you posted are consistent with each other.
But that pattern matches things in that directory, so they are excluded.
Oh yeah, when reading it like this it indeed makes sense to me!
However, Borg still goes down the tree at least one level. I haven't checked the code, yet, but maybe it's possible to let Borg feed directories with trailing slashes into the matcher to really prevent recursion completely. Not sure, though, if that would lead to other side-effects
I had a look at the code in archiver.py
(search for scandir) and it looks like a cosmetic problem to me that is not easy to fix.
It does the scandir because the pattern did not match yet on the directory (because of the trailing slash not being matched), then it recurses, but then the pattern matches and thus it lists all contained files as x
.
When the feature is meant to prevent recursion to avoid problems due to bad efficiency, strange effects or weird behaviour with sepcial filesystems, then IMO it's not only cosmetic problem.
Did anything change which maybe made it easier to fix this, or are we stuck in that state?
I tried to fix this in 1.2-maint branch, but it's complicated and does not work without changing how the patterns are prepared (see ._prepare
methods) - which would likely change/break how borg users's include/exclude lists work and thus can't be introduced within 1.2.x.
Maybe for borg 2:
diff --git a/src/borg/archiver.py b/src/borg/archiver.py
index e3c48ab6..a5aade91 100644
--- a/src/borg/archiver.py
+++ b/src/borg/archiver.py
@@ -811,6 +811,11 @@ def _rec_walk(self, *, path, parent_fd, name, fso, cache, matcher,
else:
status = '-'
if recurse:
+ path_with_trailing_slash = path + os.sep
+ if matcher.match(path_with_trailing_slash) and not matcher.recurse_dir:
+ # a pattern matched "directory/" and it is a norecurse pattern, do not scan dir - see #7220.
+ self.print_file_status('x', path_with_trailing_slash)
+ return
with backup_io('scandir'):
entries = helpers.scandir_inorder(path=path, fd=child_fd)
for dirent in entries:
diff --git a/src/borg/testsuite/archiver.py b/src/borg/testsuite/archiver.py
index 8cc930b6..e52f1b06 100644
--- a/src/borg/testsuite/archiver.py
+++ b/src/borg/testsuite/archiver.py
@@ -1300,6 +1300,23 @@ def test_create_pattern_exclude_folder_no_recurse(self):
self.assert_not_in('input/x/a', output)
self.assert_in('A input/y/foo_y', output)
+ def test_create_pattern_exclude_folder_contents_no_recurse(self):
+ """test special case when "directory/" pattern does not exlucde directory, but all inside (norecurse)"""
+ self.patterns_file_path2 = os.path.join(self.tmpdir, 'patterns2')
+ with open(self.patterns_file_path2, 'wb') as fd:
+ fd.write(b'! input/x/\n')
+
+ self.cmd('init', '--encryption=repokey', self.repository_location)
+ self.create_regular_file('x/do_not_touch', size=1024)
+ self.create_regular_file('y/normal', size=1024)
+ output = self.cmd('create', '-v', '--list',
+ '--patterns-from=' + self.patterns_file_path2,
+ self.repository_location + '::test', 'input')
+ self.assert_not_in('input/x/do_not_touch', output)
+ self.assert_in('d input/x', output)
+ self.assert_in('x input/x/', output)
+ self.assert_in('A input/y/normal', output)
+
def test_create_pattern_intermediate_folders_first(self):
"""test that intermediate folders appear first when patterns exclude a parent folder but include a child"""
self.patterns_file_path2 = os.path.join(self.tmpdir, 'patterns2')
See also #7634, which would likely need to get fixed/implemented first.
Workaround: do not have a trailing slash in the pattern definition.
This will work around this issue with the only inconvenience that it won't back up the directory (like e.g. /proc
).
Maybe for borg 2:
diff --git a/src/borg/archiver.py b/src/borg/archiver.py index e3c48ab6..a5aade91 100644 --- a/src/borg/archiver.py +++ b/src/borg/archiver.py @@ -811,6 +811,11 @@ def _rec_walk(self, *, path, parent_fd, name, fso, cache, matcher, else: status = '-' if recurse: + path_with_trailing_slash = path + os.sep + if matcher.match(path_with_trailing_slash) and not matcher.recurse_dir: + # a pattern matched "directory/" and it is a norecurse pattern, do not scan dir - see #7220. + self.print_file_status('x', path_with_trailing_slash) + return with backup_io('scandir'): entries = helpers.scandir_inorder(path=path, fd=child_fd) for dirent in entries:
Hmm, looking at archiver.py (especially def _rec_walk) I haven't yet fully understood the code, so I might be wrong. Couldn't the check already happen ealier, e.g. here where the recursion check is done? https://github.com/borgbackup/borg/blob/1.2-maint/src/borg/archiver.py#L746-L762
@c0d3z3r0 No, because the goal is to have the directory fs item backed up, but not anything inside the directory being read / backed up.
Yeah that was clear, but not the code change :) After staring a while at the code I got it, thank you :+1:
I tested that patch above on 1.2.2. It doesn't seem to work like expected. Debugging shows that matcher.recurse_dir=True and the matcher doesn't match:
# borg create --list --pattern='! re:proc/' /tmp/testbkp::xx$(date +%s) proc/
> /usr/lib/python3.10/site-packages/borg/archiver.py(817)_rec_walk()
816 import ipdb; ipdb.set_trace()
--> 817 path_with_trailing_slash = path + os.sep
818 if matcher.match(path_with_trailing_slash) and not matcher.recurse_dir:
ipdb> n
> /usr/lib/python3.10/site-packages/borg/archiver.py(818)_rec_walk()
817 path_with_trailing_slash = path + os.sep
--> 818 if matcher.match(path_with_trailing_slash) and not matcher.recurse_dir:
819 # a pattern matched "directory/" and it is a norecurse pattern, do not scan dir - see #7220.
ipdb> p matcher.match(path_with_trailing_slash)
False
ipdb> p matcher.recurse_dir
True
ipdb> c
x proc/11753
x proc/1
...
x proc/asound
d proc
Yes, as I already wrote, it is because of the _prepare
method modifies the pattern in a specific way. Also the matching is on a slightly modified path.