add stat-based include/exclude
there are tickets about doing file size based exclusion: #902, jborg/attic#330
file size is a stat result attribute, so this is a special case of a stat-based rule.
also in stat result:
- timestamps: atime, ctime, mtime
- type and mode
- uid / gid
so we could add a mechanism to define inclusion / exclusion rules not only based on the file's path/name (as we already have), but also based on comparing stat attributes to given values.
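A minimal sketch of such a stat-based rule, as a predicate over `os.stat()` results (the helper name and keyword arguments are made up for illustration, this is not borg code):

```python
import os

def stat_matches(path, max_size=None, min_mtime=None, uid=None):
    """Return True if the file passes all given stat-based filters.

    Hypothetical helper: each keyword mirrors one stat attribute
    (st_size, st_mtime, st_uid); None means "don't filter on this".
    """
    st = os.stat(path, follow_symlinks=False)
    if max_size is not None and st.st_size > max_size:
        return False
    if min_mtime is not None and st.st_mtime < min_mtime:
        return False
    if uid is not None and st.st_uid != uid:
        return False
    return True
```

With `max_size=100 * 1024**2` this would express the "exclude files over 100M" use case from the tickets above.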
If somebody finds this while searching for a solution/workaround for borg 1.0/1.1:
- you can exclude "known-big" files by a name-based pattern, like `*.iso` (or their directory, like `.../Virtualbox VMs/*`)
- you can use the unix `find` tool to create a list for borg's `--exclude-from` option

Temporarily excluding big files is especially useful for initial backup(s), which might take a while.
Note: the first implementation could just limit the scope to size-based include/exclude (but when writing the code, do it in a way that e.g. timestamp-based can be easily done also).
> you can use the unix `find` tool to create a list for borg's `--exclude-from` option
Beware of race conditions, though (i.e. large files appearing after you've generated the list).
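For reference, generating such an exclude list could also be scripted directly in Python instead of via `find` (a sketch; the function name and the walk-based approach are just illustrative):

```python
import os

def write_exclude_list(root, threshold, outfile):
    """Walk `root` and write the paths of all files larger than
    `threshold` bytes, one per line, usable with --exclude-from."""
    with open(outfile, "w") as out:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    if os.path.getsize(path) > threshold:
                        out.write(path + "\n")
                except OSError:
                    pass  # file vanished between listing and stat
```

The same race condition applies: files created or grown after the list was written won't be excluded.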
I have a proposal for how this could be implemented. Rather than a global CLI flag like exclude-by-size, it could be added as a special borg-patterns prefix that is applied at the individual pattern level. This would make it quite flexible -- you could apply the rule to certain files/directories only, use it with include patterns, etc.
There's already logic in place to handle prefixes (for R and P) so adding another one should be simple and backwards compatible. I propose calling it F for "filter". The prefix would be followed by a filter-type specifier, any arguments needed for the filter, and finally the pattern to apply the filter to.
So, to exclude files over 100M from Downloads folders, you would write:
`F size > 100M -/Users/*/Downloads`
To exclude files over 1G everywhere, you could add this to the command line:
borg ... --pattern='F size > 1G -**'
For other stat filters, just replace size with mtime, mode, etc.
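To make the proposal concrete, here is a rough sketch of how such an `F` prefix could be parsed (illustrative only; none of this is existing borg code, and the size-suffix handling is an assumption):

```python
import operator

# comparison operators allowed in a filter expression
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
       "<=": operator.le, "==": operator.eq, "!=": operator.ne}

# size suffixes, assumed here to be powers of 1024
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}

def parse_size(text):
    """Parse '100M' -> 104857600; bare numbers are bytes."""
    if text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)

def parse_filter_pattern(line):
    """Split 'F size > 100M -/Users/*/Downloads' into
    (attribute, compare-function, value, remaining pattern)."""
    prefix, attr, op, value, pattern = line.split(None, 4)
    assert prefix == "F"
    if attr == "size":
        value = parse_size(value)
    return attr, OPS[op], value, pattern
```

Other filter types (`mtime`, `mode`, ...) would just need their own value parsers.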
I have more thoughts, including how to combine multiple filters together, but wanted to put this out there first. What do you think of the proposal? (If it's well received I may take a stab at implementing it.)
In the end, I guess this will need boolean expressions, with the operators `and`, `or`, and `not`.
And the terms in these expressions would be stuff like:
- `size < 100M`
- `mtime < 1d`
- `user == joe`
See `man find` for what people might want to select (not sure all of it makes sense for backups) and how `find` expresses it.
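One way to model such boolean expressions internally is as composable predicates over a stat result (a sketch of the idea, not a syntax proposal; the helper names are made up):

```python
import operator

def term(attr, op, value):
    """One term like `size < 100M`: a predicate over an os.stat_result,
    comparing the st_<attr> field to `value` with `op`."""
    return lambda st: op(getattr(st, "st_" + attr), value)

def all_of(*preds):
    """Boolean `and` over predicates."""
    return lambda st: all(p(st) for p in preds)

def any_of(*preds):
    """Boolean `or` over predicates."""
    return lambda st: any(p(st) for p in preds)

def negate(pred):
    """Boolean `not`."""
    return lambda st: not pred(st)
```

A rule like `size < 100M and uid == 1000` would then be `all_of(term("size", operator.lt, 100 * 1024**2), term("uid", operator.eq, 1000))`; a `user == joe` term would additionally need a uid lookup (e.g. via `pwd.getpwnam`).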
As a borg backup archive is usually expected to be a full archive containing all the files in the input data set, I guess the first step is to look at which filters actually make sense.
One obvious case is being in a hurry and wanting to make a quick first backup, ignoring huge files (like having important little documents next to less important `*.iso` files).
Other use cases?
I think for an initial version of this, we could keep it really simple and not worry about boolean operators. I imagine use cases for them would be relatively rare. By the nature of how patterns combine, `or` can already be achieved by just writing two separate rules.
And multiple include rules (`+`) with negated conditions can be used to approximate `and`. For example, to exclude `user == joe && size > 1M`, you could write:
```
F user != joe + **
F size <= 1M + **
- **
```
(It's not quite the same as an actual and operator when other rules are involved, since borg stops processing rules once a single match is made, but it's probably Good Enough for now.)
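To illustrate the caveat, here is a toy first-match-wins matcher in the spirit of borg's rule processing (heavily simplified: real borg matches shell-style path patterns, while this only checks the conditions):

```python
def first_match(rules, file_info):
    """Process (predicate, include) rules in order; like borg, stop at
    the first rule whose predicate matches and return its include flag.
    Unmatched files default to included (True)."""
    for predicate, include in rules:
        if predicate(file_info):
            return include
    return True

# Rules mimicking the pattern file above: exclude files that are both
# joe's AND over 1M, include everything else.
rules = [
    (lambda f: f["user"] != "joe", True),    # F user != joe + **
    (lambda f: f["size"] <= 1024**2, True),  # F size <= 1M + **
    (lambda f: True, False),                 # - **
]
```

Only a file that falls through both include rules (joe's, and over 1M) reaches the final exclude rule.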
Other use cases?
The main ones that come to mind are:
- Excluding really large files in general. Protect against accidentally adding a multi-GB VM image, for example, when you know the files you actually care about backing up will be much smaller.
- My downloads directory tends to accumulate a few random things that would be nice to back up, but I want to exclude large files.
Another use case that occurred to me: filtering output from borg list. You may want to check a particular archive (or iterate over all archives) and find files matching certain criteria. Examples:
- Looking for files modified on a particular day
- Searching all past archives for files over a certain size, to see what's taking up space in the repo
Although I guess this use case can already be accomplished by using `borg mount` and `find`.
Yeah. Also this is a bit different to implement (one has to look at archived metadata vs. at stat() metadata from fs).
it is now (master branch, later borg 1.2) possible to feed `find` output (paths) into borg instead of using borg's builtin recursion.
so you can do all matching/selecting that is possible via `find`.
> it is now (master branch, later borg 1.2) possible to feed `find` output (paths) into borg instead of using borg's builtin recursion. so you can do all matching/selecting that is possible via `find`.
Could you please give more details on this or some link to this function? I was searching changelog for "find" keyword without success.
He's referring to the unix find command.
Of course, I know that. But how can I use it to filter files and directories to back up? The only solution I can think of is to put the output of my specific `find` command into a file, prefix each line with a specific pattern selector (see `borg help patterns`), preferably `pf:`, and load that file as a pattern file using the `--patterns-from` or `--exclude-from` arguments.
Will there be a more elegant solution?
`borg create --paths-from-stdin` or `borg create --paths-from-command`
See there: https://borgbackup.readthedocs.io/en/1.2.0b3/usage/create.html
related: #4972
#8895 changed borg a bit: it now reads the simple stat attrs as well as xattrs and ACLs early, before processing file content.
It now has some hardcoded handling for the standard Linux and macOS "no backup" xattrs, and the NODUMP bsdflag is also handled there.
Instead of hardcoding it, there could be either a CLI interface or some other sort of include/exclude "rule".