backup: Add options --set-path and --set-path-from
What does this PR change? What problem does it solve?
Adds an option --set-path to backup which allows to manually set the path(s) saved in the snapshot and used for finding the parent snapshot. Also the option --set-path-from is added to read the paths from a file.
Both options are useful e.g. if the files to backup are selected by an external tool.
As set-path functionally replaces --stdin-filename, the latter is now marked as deprecated.
Was the change discussed in an issue or in the forum before?
closes #2714
closes #3198
allows users to use an easy workaround for #1514 by using --files-from-raw in combination with fd (or similar find tools) and --set-path
maybe also closes #2246
closes #1376
closes #2092
Checklist
- [x] I have read the Contribution Guidelines
- [x] I have enabled maintainer edits for this PR
- [ ] I have added tests for all changes in this PR
- [x] I have added documentation for the changes (in the manual)
- [x] There's a new file in
changelog/unreleased/that describes the changes for our users (template here) - [x] I have run
gofmton the code in all commits - [x] All commit messages are formatted in the same style as the other commits in the repo
- [x] I'm done, this Pull Request is ready for review
There has already been some discussion on such an option in #2714.
There has already been some discussion on such an option in #2714.
Thanks for the hint! I added this to in the description and adapted the changelog file.
I added some hints in the docu and realized that an extra options --set-paths-from would be also quite handy. This is now also added.
Also added some checks for the new flags in combination with stdin, analog to the checks of --files-from*.
About --stdin, we might think of making --stdin-filename an alias of --set-path and mark it deprecated. Then, for --stdin the only check we need is that only one path is given.
Should I add this to this PR?
Thanks a lot for this ! I was playing with this PR today and although --set-path works as expected, I'm see some inconsistencies compared to the normal backup command. This was always like that, but I don't know if this is expected also after this PR. I'll detail here what I'm doing:
- I do a normal initial backup:
restic_dev -r local:/tmp/test backup /opt
- I add a new file
/opt/this_is_a_testand I add this path in alist.txtfile which I source to restic using--files-from. I've also pass--set-path /opt. (Tried with and without the--parentflag but the result is the same)
restic_dev -r local:/tmp/test backup --files-from /list.txt --set-path /opt --parent 5976257d
- The backup ends correctly, and in the snapshots view I see both snapshots with the same path, so this is OK.
ID Time Host Tags Paths
---------------------------------------------------------------------
5976257d 2021-01-15 16:29:35 cbox-restic.cern.ch /opt
e5fa3044 2021-01-15 16:38:56 cbox-restic.cern.ch /opt
---------------------------------------------------------------------
- Now, if I mount the repo and traverse to snapshots/latests I see only the file added in the last snapshot:
[16:39][root@cbox-restic (qa:box/restic/dev) restic]# cd snapshots/latest/
[16:39][root@cbox-restic (qa:box/restic/dev) latest]# ls
opt
[16:39][root@cbox-restic (qa:box/restic/dev) latest]# cd opt/
[16:39][root@cbox-restic (qa:box/restic/dev) opt]# ls -la
total 1
drwxr-xr-x. 2 root root 0 Jan 15 16:31 .
dr-xr-xr-x. 2 root root 0 Jan 15 16:38 ..
-rw-r--r--. 1 root root 13 Jan 15 16:31 this_is_a_test
- If I try restore the snapshot, only that file is recovered.
- If instead of doing the combination of
--files-from,--set-pathand--parentI trigger a normal backup after adding thethis_is_a_testfile, the contents ofsnapshots/latestsI see the full /opt view with the recently added file.
I think that we should expect the same final state either doing it with normal backup or using the combination of --files-from and --set-path, what do you think?
Roberto.
@robvalca Thanks for your feedback!
Is in your /list.txt only the line /opt/this_is_a_test or are there also the other files?
As you write "This was always like this", I guess so. In this case, yes then the new snapshot should only contain this one file.
Note that in this case, there is no sense in using a parent as there are no "old" files which should be in the new snapshot.
Hi @aawsome , thanks for the quick reply! I only have that single path in the txt. I thought that there would be a merge with the files on the list and the previous snapshot of the path specified by --set-path. In any case, with this PR the internal view of the snapshots makes more sense. I'm looking forward to have this merged. Thanks!
I very much look forward to seeing this merged!
What would be extra useful would be the ability to also change the path of existing snapshots so as not to have to do a full backup of everything after this option is in use.
My use case is backing up a series of backups created with dirvish (for various reasons we can't use restic to directly back up the hosts), so they all end up in /var/backups/dirvish/
Another use case not described above is when using filesystem snapshots (eg, with btrfs or zfs) as the source for the restic backup but wanting the restic backup path to clearly represent the actual data source. Using the filesystem snapshot makes it much more likely to have a consistent backup of things like database files.
I also envision using the --set-path feature to write a utility that can take a list of local filesystem snapshots and ensure that they are all replicated in restic (including correct creation timestamps and parent relationship) as that would allow simple interfacing of an existing local backup system with its creation and expiry schedules and the more flexible off-site restic backup.
@amuckart I agree that changing the path would be a useful feature. But maybe that could be left for another PR (so this one doesn't get held up)?
I would also like to leave the scope of this PR as it is. Can we open a new issue to discuss this changing of trees in the repo? I think it's not just changing the tree names (which should be pretty simple) but the syntax how to specify which path maps to which tree. I can imagine that there are many pitfalls which should be discussed first.
:+1: for this proposal.
My use-case is for a selective backup that I cannot perform with an exclusion-list :
- >100.000 files to exclude and growing, even with optimization done in 0.11.0 and 0.12.0, scanning file takes 2h30
- no general glob / regexp pattern applicable to reduce exclusion list
With an inclusion list (files-from) :
- I don't detect myself the incorrect parent lookup behavior ; but I encounter a x2 time for my first snapshot with files-from
- I detect that snapshot list would be an issue, with the growing paths attribute
I read the patch and it seems sane, but I surely does not have enough experience on restic codebase to give an informed opinion.
Is there anything I can do to help ? (I'm building and trying this patch on a working repository ; I'll give some feedback on this).
Working with my repository:
- parent snapshot is correctly detected; cpu and io seems to confirm that useless re-read are not done
- backup elapsed time is consistent with timings on a similar tree that does not need include/exclude tricks
- snapshots list display the expected paths
- backup restore give me back the expected files
using filesystem snapshots (eg, with btrfs or zfs) as the source for the restic backup but wanting the restic backup path to clearly represent the actual data source
@hamishcoleman This is exactly what I was trying to do yesterday and realized it didn't exist. Anxiously awaiting the approval of this PR. I'll probably build 0.12.0 from source with this patch applied just so I can use this functionality!
This is going to make backing up ZFS snapshots clean.
@lpulley I have worked around it for the moment by making my backup script take a temporary snap of the real snap but using a single stable name - then destroying the temporary snap after restic has backed things up - which is working well.
Is this just awaiting review?
Is this just awaiting review?
From my side: yes
I wanted to use this option to alter the snapshot paths and use the same paths between platforms (win/unix), but this patch keeps transforming the paths into absolute.
restic init --repo repo
restic backup --repo repo --set-path /custom-data data
restic snapshots --repo repo
ID Time Host Tags Paths
---------------------------------------------------------------------------------------
e69de59e 2021-06-03 18:16:43 ????????? C:\Program Files\Git\custom-data
---------------------------------------------------------------------------------------
1 snapshots
You can bypass this feature specifying the driver letter (ex: X:\custom-path), but this workaround is not cross-platform.
Hello,
What's preventing this pull request from being merged ?
Thanks
I wanted to use this option to alter the snapshot paths and use the same paths between platforms (win/unix), but this patch keeps transforming the paths into absolute.
You can bypass this feature specifying the driver letter (ex: X:\custom-path), but this workaround is not cross-platform.
You are right. In the snapshots, only absolute paths are saved. The code that turns a path into an absolute one is platform-dependent and therefore prevents you from using this PR to simulate on Windows that the snapshot was created under Linux.
The problem is, if we omit this, then the users always have to specify absolute paths with this new option. Should I change the behavior? Opinions?
I have a similar situation where the data was backed up via ssh so far and now I want to use sshfs so the path changes but the data is all the same. I think it'd be nice if the path was exactly as specified in the --set-path option.
It seems that to skip the path conversion with --set-path is not sufficient because the conversion is used at FindLatestSnapshot too, and that function is used in the next commands: backup, copy, dump, find, forget, ls, recover, restore, snapshots, stats and tag.
Implementing --set-path in all those commands would not be correct.
Maybe a new option (--raw-paths) cover all this needs so... new PR?
https://github.com/juanrgm/restic/compare/backup-set-path...backup-raw-paths
@juanrgm It does seem like there are already a few issues in the backlog about how paths are handled. And while I would love to see restic support for relative paths, I completely agree it's out of scope for this PR.
In my opinion, --set-path should resolve a path the same way the rest of restic does so. And it sounds like that's already what you're doing here. Am I reading that correctly?
@juanrgm It does seem like there are already a few issues in the backlog about how paths are handled. And while I would love to see restic support for relative paths, I completely agree it's out of scope for this PR.
I will create formally a PR for this issue when I have time.
In my opinion,
--set-pathshould resolve a path the same way the rest of restic does so. And it sounds like that's already what you're doing here. Am I reading that correctly?
Yep, @aawsome added --set-path with the same behaviour than rest of restic.
It seems like this PR should be ready, then, doesn't it?
I'm not familiar with the process here at restic. Does someone need to be notified for a review or anything like that?
@wren There is no further action needed, this PR is in the queue along with the others. It needs review and consideration, which will happen with time.
For anyone wanting to try out this PR's feature, it's very simple to check out the code in this PR's branch and build restic by go run build.go. Give it a spin if you want :)
How does this address #2092?
@aawsome I'm thinking it would be nice to use the same form of the noun path in both arguments, e.g. --set-path and --set-path-from, considering that (according to your first sentence) both of them take one or more path(s).
For example, there's not really anything suggesting that just because one feeds path(s) from a file it's going to be multiple paths - might as well be one single path generated by some external software :) Also, it would be less confusing and more coherent.
What do you think, does that make sense?
@aawsome I'm thinking it would be nice to use the same form of the noun
pathin both arguments, e.g.--set-pathand--set-path-from, considering that (according to your first sentence) both of them take one or more path(s).
Fully agree! Thanks for the suggestion :+1: . I'll change it immediately.