backintime
backintime copied to clipboard
Can't include specific files from excluded wildcard match (rsync: ordering of --exclude/--include) - Workaround exist
i'm trying to exclude most of my dotfiles and include a few specifically but it's not working.
i added /home/xin/.* to my excludes and to my includes:
- 📁
/home/xin/.ssh - 📁
/home/xin/.fonts - 📄
/home/xin/.bashrc - 📄
/home/xin/.gitconfig - 📄
/home/xin/.gitignore
the folders are getting backed up but not the single files.
i checked the logs and the rsync command generated by BiT is putting the includes after the excludes. i copy/pasted it, moved the includes up, and ran it manually and it worked as expected.
Original rsync call from BIT logs (causing the problem)
from backintime log (line breaks added for clarity):
[I] rsync --recursive --times --devices --specials --hard-links --human-readable -s --links --perms --executability --group --owner --info=progress2 --no-inc-recursive --delete --delete-excluded -v -i --out-format=BACKINTIME: %i %n%L --link-dest=../../20230405-165753-308/backup --chmod=Du+wx
--exclude=/media/xin/bigbox-6-A/xin
--exclude=/home/xin/.local/share/backintime
--exclude=.local/share/backintime/mnt
--include=/home/xin/
--include=/home/
--include=/home/xin/.fonts/
--include=/home/xin/.ssh/
--exclude=.gvfs
--exclude=.cache/*
--exclude=.thumbnails*
--exclude=.local/share/[Tt]rash*
--exclude=*.backup*
--exclude=*~
--exclude=.dropbox*
--exclude=/proc/*
--exclude=/sys/*
--exclude=/dev/*
--exclude=/run/*
--exclude=/etc/mtab
--exclude=/var/cache/apt/archives/*.deb
--exclude=lost+found/*
--exclude=/tmp/*
--exclude=/var/tmp/*
--exclude=/var/backups/*
--exclude=.Private
--exclude=/home/xin/.*
--include=/home/xin/**
--include=/home/xin/.fonts/**
--include=/home/xin/.ssh/**
--include=/home/xin/.gitignore
--include=/home/xin/.gitconfig
--include=/home/xin/.bashrc
--exclude=*
/ /media/xin/bigbox-6-A/xin/backintime/offish/xin/1/new_snapshot/backup
Expected rsync call modified order of include/exclude (solving the problem)
modified that works as expected:
rsync --recursive --times --devices --specials --hard-links --human-readable -s --links --perms --executability --group --owner --info=progress2 --no-inc-recursive --delete --delete-excluded -v -i --out-format=BACKINTIME: %i %n%L --link-dest=../../20230405-165753-308/backup --chmod=Du+wx
--exclude=/media/xin/bigbox-6-A/xin
--exclude=/home/xin/.local/share/backintime
--exclude=.local/share/backintime/mnt
--include=/home/xin/
--include=/home/
--include=/home/xin/.fonts/
--include=/home/xin/.ssh/
--include=/home/xin/.gitignore
--include=/home/xin/.gitconfig
--include=/home/xin/.bashrc
--exclude=.gvfs
--exclude=.cache/*
--exclude=.thumbnails*
--exclude=.local/share/[Tt]rash*
--exclude=*.backup*
--exclude=*~
--exclude=.dropbox*
--exclude=/proc/*
--exclude=/sys/*
--exclude=/dev/*
--exclude=/run/*
--exclude=/etc/mtab
--exclude=/var/cache/apt/archives/*.deb
--exclude=lost+found/*
--exclude=/tmp/*
--exclude=/var/tmp/*
--exclude=/var/backups/*
--exclude=.Private
--exclude=/home/xin/.*
--include=/home/xin/**
--include=/home/xin/.fonts/**
--include=/home/xin/.ssh/**
--exclude=*
/ /media/xin/bigbox-6-A/xin/backintime/offish/xin/1/new_snapshot/backup
Please have a look at #1419
There is no concrete solution for your problems but further resources to read. The pattern matching rules of rsync are not easy.
EDIT: Took some time to understand your problem. We have to take a closer look into the related rsync rules. I assume that there is a solution for your case without modifying BITs behavior in ordering include/excludes.
EDIT2: This pattern matching with rsync always gives me a headache.
Hi, thanks for the response :-)
i found this article helpful too: https://zingbretsen.com/blog/rsync-include-exclude that's how i figured out how to edit the command and have it work when running manually.
i went down a bit of a rabbit hole last night, ran a bunch of rsync commands, and i'm pretty sure i have it figured out. sorry this a bit of an info dump.
i believe it is a simple internal fix and is half the solution to #561
Expected Behavior:
- included files should always be included in backup
- included folders should always be included in backup
- children of included folders should be included unless excluded
- excluded files should be excluded unless explicitly included
- excluded folders should be excluded unless explicitly included
- children of excluded folders should be excluded unless explicitly included
rsync: include & exclude
- once something is included ~~it is in unless excluded~~ it cannot be excluded. (edit)
- it's children will be included unless excluded, hence the universal exclude at the end. (edit)
- once something is excluded it cannot be included.
- for
rsyncto include something, it needs to "see" it. so all of the parent directories need to be included as well.
The order of two or more include clauses doesn't seem to matter. Similarly for two or more exclude clauses. But the relative order of include and exclude is really important.
backintime
- includes are input and stored as files or folders and are handled differently by the
rsyncInclude()method insnapshots.pyrsyncInclude()outputs two sets for the first and second include chunks (see below)
- excludes are input by pattern, file, or folder. all are stored as strings and have no special handling by the
rsyncExclude()method.
The rsyncSuffix() method builds the include/exclude part of the rsync command run for snapshots. This is grouped into 5 alternating chunks:
- exclude backup location and BiT config files
- include1: currently: folders and parent folders (should include files)
- exclude the exclude list
- include2: currently: files and folder children,
/path/to/folder/**(should only be folder children) - universal exclude,
--exclude=*
Because BiT isn't adding the included files until after the exclude rules, they cannot be added. They need to be added to the first include chunk.
This can easily be done by changing line 2012 from:
items2.add('--include={}'.format(folder))
to:
items1.add('--include={}'.format(folder))
Thanks a lot for analyzing and breaking this down. This will help a lot.
I'm not well into rsync behavior. And I'm always scared that modifying this will influence BITs default behavior and break someones other backups.
Because of that I'm very pedantic about a fix. I would like to dive deep into that rabit hole and understand all the details of how rsync behavior. That is how I can be "sure" about no one else is harmed by such a fix. It will take time I don't have currently. There are three other maintainers in the team but their time is also limited.
But I add the high priority label and the next-release-milestone to that Issue.
Thank you. Understandable. There's never enough time.
Workaround
i found a workaround for now:
add the include parameters into "additional options" under the "Expert Options" tab in settings.
this is inserted in the rsyncPrefix (defined in common/tools.py) so it is inserted before any other inlcude or exclude arguments.
make sure it doesn't match your backup location, ~/.local/share/backintime, or .local/share/backintime/mnt as that will probably cause problems. those are probably excluded first for a reason.
rsyncPrefix is used by snapshots.py for restore, backupConfig, and takeSnapshot and by sshtools.py for checkRemoteCommands. there doesn't seem to be any problems for restore and backupConfig. haven't tested the ssh stuff because i don't have that as part of my set up.
rsyncSuffix is only used by takeSnapshot.
I have recently spent a little time on this subject of resolving the logic of the Include and Exclude configuration. I have what I think is a correct solution with a more easily followed logic, and I'm writing some description and explanation before I push it to a branch of my public repo for consideration. It solves this issue #1420 and #561 and meets all of the expectations identified by @insubstudios above.
I very much agree with the concern about quietly sabotaging someone's backup by subtly changing the logic of how their configuration is applied. As one thing to help with that, the branch (not yet pushed) includes unit tests where we can easily add more test cases of whatever combinations of includes, excludes, and subject files and try them out with both the current and proposed strategy.
I've pushed the branch that I mentioned above: https://github.com/DerekVeit/backintime/tree/path_selections
With the exception of not yet trying to handle EncFS, it is working (I believe), but I don't consider it PR-ready. Given the significance of the change, it needs to be looked at more critically than usual, especially since I'm a new contributor. Some parts of it are maybe not for inclusion, e.g. a somewhat ad hoc log function for the testing code. So I'm just hoping for some review and feedback.
The main change is at Snapshots.rsyncSuffix to change how the --include= and --exclude= options get added to the rsync call. The previous/existing code adds them in a certain order that only works most of the time. The new strategy sorts the paths of the included and excluded paths in such a way that more-specific path specifications always have priorty.
By following a strategy that more-specific specifications should prevail, I believe this should not only fix #561 and #1420 but also be more reliably correct in general.
There are integration-level tests (test_rsync_selections.py) that check various cases, including for #561 and #1420. They call the Snapshots.backup method. For now, the revised rsyncSuffix still uses the original strategy unless a SELECTIONS_MODE attribute of the Config object exists and is set to "sorted". The tests use this to test cases with both strategies, original and sorted.
To make it easy to test various cases of existing file structures and include/exclude configurations, there is a test/filetree.py module for describing file structures with a string, e.g.:
"""
var/
log/
dmesg
syslog
"""
And the test cases are specified in a text format that uses those strings, e.g. in test/selection_cases. This is to make it easy to add more cases and also to make it very clear what each case is testing.
:exclude-name
includes
/var
excludes
log
files_tree
var/
lib/
foo
log
log/
dmesg
syslog
expected_tree
var/
lib/
foo
The tests use pytest for parametrize and fixtures, and there are fixtures defined in test/conftest.py for the Config object and the Snapshots object.
For some verbose logging from the test functions there is a simple log function in test/logging.py.
The new code does not yet include handling the EncFS feature. This might be as simple as adding encode = self.config.ENCODE and applying that in the right places as in rsyncInclude and rsyncExclude, but I haven't looked at that closely enough to see what I need for testing it, so I have left it alone so far. In this respect, the new code is definitely not ready. (I know the consideration of removing EncFS is only a maybe-sometime idea for now.)
I'm interested in BiT maintainers' thoughts, questions, and opinions.
SELECTION_MODE is probably only for temporary development testing purposes and can be eliminated, and rsyncSuffix might be simplified. The note I put in the docstring might be addressed or removed.
Although pytest is the default test runner for Back In Time, I know the existing tests are based on unittest. I like pytest, but consistency might be better. I could revise these tests to use generic.TestCase instead. I would then want to add the ddt library to provide the parameterizing.
The pathSelections method is more sprawling than I would like.
The log function is not essential to anything. I find it useful to have easy, verbose logging from my tests, especially while working on them, but it can be removed.
Hello Derek, thank you very much for your investment and contribution. Don't be shy. 😄 I would suggest that you open a PR, add your explanation in its comment field and mark it as "Draft". As a PR it is easier for us to organize. As an Issue it would drown between all the other issues.