Patterns that match only directories

Open Wonderer0 opened this issue 8 years ago • 10 comments

In rsync it's possible to recursively include all sub-directories below a directory (e.g. /var/log/), but exclude all files, using a pair of filters of the form:

+ /var/log/**/
- /var/log/**

Unfortunately nothing was included when I tried this using Borg (borg-linux64_1.1.0b4) as follows:

borg create --patterns-from $BB/patterns.txt $BB/test::excl-logs_1 /var/log

where $BB is my home for Borg Backup repos and the patterns file.

At first I thought that this was due to issue #2314, and that it doesn't work because Borg doesn't yet have a rule prefix for "exclude, but recurse into, searching for includes". However, it works fine in rsync even though rsync doesn't support that type of rule either (as far as I know), and if I understand rsync correctly "-" means "exclude, do not recurse into". The real problem seems to be the way that Borg handles patterns ending with "/".

In rsync "if the pattern ends with a / then it will only match a directory, not a file, link, or device". The first pattern above matches all the subdirectories because they end with "/". Everything else falls through and is excluded by the second pattern. This is consistent with the way expansion works in bash (when globstar is set): /var/log/**/ expands to /var/log/ followed by all its subdirectories (in depth first order) but nothing else.
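As an aside, the directory-only convention described above can be tried out without rsync or bash. The following is a minimal sketch using Python's standard glob module, which documents the same rule for recursive patterns: when "**" is followed by a path separator, files do not match. The helper name directories_under is made up for this illustration, and nothing here reflects how Borg itself matches patterns.

```python
import glob
import os


def directories_under(root: str) -> list[str]:
    """Return root and every directory below it, each with a trailing separator."""
    # Joining with "" leaves a trailing separator, so the pattern is "<root>/**/".
    # glob.escape() keeps special characters in root from being read as wildcards.
    pattern = os.path.join(glob.escape(root), "**", "")
    # With recursive=True, "**" followed by a separator matches directories only.
    return sorted(glob.glob(pattern, recursive=True))
```

Calling directories_under("/var/log") should list /var/log/ and its subdirectories but none of the log files. Note that glob skips names starting with a dot by default, which rsync does not.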

However, in Borg "if a given pattern ends in a path separator, a '*' is appended before matching is attempted.". This means that the first pattern above becomes /var/log/**/*, which matches the contents of all the subdirectories but none of the subdirectories themselves, so they are excluded by the second pattern. I assume the contents aren't included because the directories that contain them weren't. This pattern matching can be confirmed in bash, where ls -ld /var/log/**/* doesn't list any files or subdirectories immediately below /var/log/, only the contents of the subdirectories.

If Borg didn't append the extra "*" to patterns ending with "/", and its pattern matching worked in the same way as rsync's, then that would have at least two advantages:

  • It would be possible to include subdirectories but not files as described above.
  • The new "exclude, but recurse into, searching for includes" rule probably wouldn't be needed because rule-sets could be re-written to work the rsync way.

Wonderer0 avatar Apr 07 '17 00:04 Wonderer0

Can someone please let me know if there's any way to get Borg to match directories and not files, as in the example given at the top? Regular expression rules such as re:^/var/log/.*/$ don't match either.

Wonderer0 avatar Apr 10 '17 19:04 Wonderer0

I'm having the same problem. It appears that both files and directories match fm: and sh: patterns that end with a slash because os.path.sep is always appended to the path before matching:

https://github.com/borgbackup/borg/blob/ac4666d7f45003f9d6c152c42b72c0d297a38876/src/borg/patterns.py#L269

So /var/log/syslog will match /var/log/**/ and be included.

For pp:, pf:, and re: patterns, os.path.sep is never added, so + re:^var/log/.*/$ will not match anything.
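To make the two cases concrete, here is a minimal sketch that models the behaviour described in this comment. It is not Borg's code: the function names are hypothetical, the paths are written without a leading slash to mirror the re: example above, and the only assumptions are the ones stated here (a separator appended to every path for fnmatch-style matching, and no separator added for regular expressions).

```python
import fnmatch
import re


def matches_with_appended_sep(pattern: str, path: str) -> bool:
    """Hypothetical model: append '/' to every path, then apply an fnmatch pattern."""
    regex = re.compile(fnmatch.translate(pattern))
    # Files and directories both get the separator, so they become indistinguishable.
    return regex.match(path + "/") is not None


def matches_plain_regex(pattern: str, path: str) -> bool:
    """Hypothetical model: search the path as given, with no separator added."""
    return re.search(pattern, path) is not None


# A file and a directory both match the pattern that ends in a slash.
file_matches = matches_with_appended_sep("var/log/**/", "var/log/syslog")
dir_matches = matches_with_appended_sep("var/log/**/", "var/log/apt")

# A regex that requires a trailing slash matches neither of them.
regex_matches_dir = matches_plain_regex(r"^var/log/.*/$", "var/log/apt")
```

Under these assumptions file_matches and dir_matches are both true while regex_matches_dir is false, which is the asymmetry reported above.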

This seems to contradict the documentation:

https://github.com/borgbackup/borg/blob/ac4666d7f45003f9d6c152c42b72c0d297a38876/docs/usage/help.rst.inc#L23-L26

Is this a bug in the docs, or a bug in the code, or am I misunderstanding one or both?

kevinoid avatar May 13 '22 16:05 kevinoid

@fantasya-pbem did you have a look at this already?

ThomasWaldmann avatar Jun 05 '22 15:06 ThomasWaldmann

https://github.com/borgbackup/borg/blob/ac4666d7f45003f9d6c152c42b72c0d297a38876/docs/usage/help.rst.inc#L23-L26

Is this a bug in the docs, or a bug in the code, or am I misunderstanding one or both?

This is not a bug. It states that an "exclusion pattern" can have a trailing slash. Default style for exclusion is "fm:". This style does not support the "/**/" syntax from shell style patterns. Conclusion: Rsync-compatible patterns are not possible.

You can emulate the desired filtering using --paths-from-stdin and piping the output of find <dir> -type d to Borg. See the last example in the docs of borg create for another example of how this works.
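If the directory list needs more filtering than find offers conveniently, it can also be produced by a short script. The following is a rough sketch of a Python stand-in for find <dir> -type d; the function names are invented for this example, and it assumes one path per line is what the receiving command expects.

```python
import os
import sys
from typing import Iterator, TextIO


def iter_directories(root: str) -> Iterator[str]:
    """Yield root and every real directory below it (symlinks are not followed)."""
    # os.walk only descends into real directories by default, so each dirpath
    # it reports is either root itself or an actual subdirectory.
    for dirpath, _dirnames, _filenames in os.walk(root):
        yield dirpath


def write_directories(root: str, out: TextIO = sys.stdout) -> None:
    """Write one directory path per line, ready to be piped to another command."""
    for path in iter_directories(root):
        out.write(path + "\n")
```

A script that calls write_directories("/var/log") could then have its output piped into borg create --paths-from-stdin in place of the find command. Directory names containing newlines would break the one-path-per-line format.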

fantasya-pbem avatar Jun 06 '22 06:06 fantasya-pbem

This is not a bug. It states that an "exclusion pattern" can have a trailing slash. Default style for exclusion is "fm:". This style does not support the "/**/" syntax from shell style patterns. Conclusion: Rsync-compatible patterns are not possible.

In my last post I incorrectly wrote "pm:" - fixed to "fm:".

A bit more explanation:

In "fm:" exclusion, if the pattern ends in a slash, a '*' is appended. This results in the directory being added and everything therein being excluded. In contrast, "sh:" patterns support the '/**/' syntax, but do not seem to work like "fm:" regarding the trailing slash.

Maybe someone can confirm this by reviewing the Python code and propose an enhancement to shell-style patterns that adds the trailing slash behaviour of "fm:".

fantasya-pbem avatar Jun 06 '22 06:06 fantasya-pbem

IIRC, when borg is doing the recursion and feeding names into the matcher, a directory name looks like path/to/directory and files on the same level look like path/to/file (note that neither of them ends in a slash).
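For illustration only: if names look identical for files and directories, a pattern cannot tell them apart from the string alone. The sketch below shows what a directory-only match would need, namely the entry type in addition to the name. It is entirely hypothetical and not how Borg behaves; name_for_matching and is_directory_match are invented names, and the regex is the one quoted earlier in this thread.

```python
import re


def name_for_matching(path: str, is_dir: bool) -> str:
    """Hypothetical: mark directories with a trailing '/' before matching."""
    return path + "/" if is_dir else path


# Regex from earlier in the thread; it only matches names that end in "/".
DIRS_BELOW_VAR_LOG = re.compile(r"^var/log/.*/$")


def is_directory_match(path: str, is_dir: bool) -> bool:
    """Return True only for directories below var/log."""
    return DIRS_BELOW_VAR_LOG.search(name_for_matching(path, is_dir)) is not None
```

With this convention, is_directory_match("var/log/apt", True) is true, while a file such as "var/log/syslog" passed with is_dir=False does not match.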

ThomasWaldmann avatar Jun 06 '22 20:06 ThomasWaldmann

You can emulate the desired filtering using --paths-from-stdin and piping the output of find <dir> -type d to Borg.

Can --paths-from-stdin be combined with other patterns, for example when creating a backup of /?
e.g. use - /var/log/**/* in --patterns-from to exclude everything under /var/log/, but then include all its subdirectories by piping find /var/log -type d into Borg as suggested. Would the inclusion of the individual paths take priority over the exclusion pattern as the paths are more specific?

Wonderer0 avatar Jun 09 '22 00:06 Wonderer0

From the borg create docs regarding --paths-from-stdin: "Will not recurse into directories." So, in your example, the exclude pattern is not needed, as Borg is already only processing the directories from find <dir> -type d. This also means that the find command has to implement every include/exclude itself.

fantasya-pbem avatar Jun 09 '22 08:06 fantasya-pbem

From borg create docs regarding --paths-from-stdin: "Will not recurse into directories."

Yes, I read that, and also how --pattern, --exclude, --patterns-from and --exclude-from are all combined, but couldn't see how they are combined with --paths-from-stdin. I hoped that if --patterns-from included the rest of the required directories and files, apart from /var/log/**, then piping the /var/log/**/ directories into --paths-from-stdin would cause them to be included as well.

Wonderer0 avatar Jun 09 '22 18:06 Wonderer0

I think these options can be combined, but I am not sure what the effect will be. Someone with knowledge of Python has to check the code I guess...

fantasya-pbem avatar Jun 10 '22 09:06 fantasya-pbem