restic icon indicating copy to clipboard operation
restic copied to clipboard

backup: Add options --set-path and --set-path-from

Open aawsome opened this issue 5 years ago • 37 comments

What does this PR change? What problem does it solve?

Adds an option --set-path to backup which allows to manually set the path(s) saved in the snapshot and used for finding the parent snapshot. Also the option --set-path-from is added to read the paths from a file.

Both options are useful e.g. if the files to backup are selected by an external tool.

As set-path functionally replaces --stdin-filename, the latter is now marked as deprecated.

Was the change discussed in an issue or in the forum before?

closes #2714 closes #3198 allows users to use an easy workaround for #1514 by using --files-from-raw in combination with fd (or similar find tools) and --set-path maybe also closes #2246 closes #1376 closes #2092

Checklist

  • [x] I have read the Contribution Guidelines
  • [x] I have enabled maintainer edits for this PR
  • [ ] I have added tests for all changes in this PR
  • [x] I have added documentation for the changes (in the manual)
  • [x] There's a new file in changelog/unreleased/ that describes the changes for our users (template here)
  • [x] I have run gofmt on the code in all commits
  • [x] All commit messages are formatted in the same style as the other commits in the repo
  • [x] I'm done, this Pull Request is ready for review

aawsome avatar Dec 29 '20 19:12 aawsome

There has already been some discussion on such an option in #2714.

MichaelEischer avatar Dec 29 '20 21:12 MichaelEischer

There has already been some discussion on such an option in #2714.

Thanks for the hint! I added this to in the description and adapted the changelog file.

aawsome avatar Dec 29 '20 22:12 aawsome

I added some hints in the docu and realized that an extra options --set-paths-from would be also quite handy. This is now also added.

aawsome avatar Jan 01 '21 12:01 aawsome

Also added some checks for the new flags in combination with stdin, analog to the checks of --files-from*.

aawsome avatar Jan 01 '21 12:01 aawsome

About --stdin, we might think of making --stdin-filename an alias of --set-path and mark it deprecated. Then, for --stdin the only check we need is that only one path is given. Should I add this to this PR?

aawsome avatar Jan 01 '21 14:01 aawsome

Thanks a lot for this ! I was playing with this PR today and although --set-path works as expected, I'm see some inconsistencies compared to the normal backup command. This was always like that, but I don't know if this is expected also after this PR. I'll detail here what I'm doing:

  1. I do a normal initial backup:
restic_dev -r local:/tmp/test backup /opt
  1. I add a new file /opt/this_is_a_test and I add this path in a list.txt file which I source to restic using --files-from. I've also pass --set-path /opt. (Tried with and without the --parent flag but the result is the same)
restic_dev -r local:/tmp/test backup --files-from /list.txt  --set-path /opt --parent 5976257d
  1. The backup ends correctly, and in the snapshots view I see both snapshots with the same path, so this is OK.
ID        Time                 Host                 Tags        Paths
---------------------------------------------------------------------
5976257d  2021-01-15 16:29:35  cbox-restic.cern.ch              /opt
e5fa3044  2021-01-15 16:38:56  cbox-restic.cern.ch              /opt
---------------------------------------------------------------------
  1. Now, if I mount the repo and traverse to snapshots/latests I see only the file added in the last snapshot:
[16:39][root@cbox-restic (qa:box/restic/dev) restic]# cd snapshots/latest/
[16:39][root@cbox-restic (qa:box/restic/dev) latest]# ls
opt
[16:39][root@cbox-restic (qa:box/restic/dev) latest]# cd opt/
[16:39][root@cbox-restic (qa:box/restic/dev) opt]# ls -la
total 1
drwxr-xr-x. 2 root root  0 Jan 15 16:31 .
dr-xr-xr-x. 2 root root  0 Jan 15 16:38 ..
-rw-r--r--. 1 root root 13 Jan 15 16:31 this_is_a_test
  1. If I try restore the snapshot, only that file is recovered.
  2. If instead of doing the combination of --files-from,--set-path and --parent I trigger a normal backup after adding the this_is_a_test file, the contents of snapshots/latests I see the full /opt view with the recently added file.

I think that we should expect the same final state either doing it with normal backup or using the combination of --files-from and --set-path, what do you think?

Roberto.

robvalca avatar Jan 15 '21 16:01 robvalca

@robvalca Thanks for your feedback! Is in your /list.txt only the line /opt/this_is_a_test or are there also the other files? As you write "This was always like this", I guess so. In this case, yes then the new snapshot should only contain this one file. Note that in this case, there is no sense in using a parent as there are no "old" files which should be in the new snapshot.

aawsome avatar Jan 15 '21 16:01 aawsome

Hi @aawsome , thanks for the quick reply! I only have that single path in the txt. I thought that there would be a merge with the files on the list and the previous snapshot of the path specified by --set-path. In any case, with this PR the internal view of the snapshots makes more sense. I'm looking forward to have this merged. Thanks!

robvalca avatar Jan 15 '21 16:01 robvalca

I very much look forward to seeing this merged!

What would be extra useful would be the ability to also change the path of existing snapshots so as not to have to do a full backup of everything after this option is in use.

My use case is backing up a series of backups created with dirvish (for various reasons we can't use restic to directly back up the hosts), so they all end up in /var/backups/dirvish///tree. I'm making the path consistent between backups by bind mounting the .../tree directory but this feature would completely remove the need for that complexity and allow me to directly back up the directory without having to bind mount it, as well as making restoration on to the server a whole lot easier, but then the path would be inconsistent with all of the existing months of backups I've got in the restic repository.

amuckart avatar Feb 24 '21 05:02 amuckart

Another use case not described above is when using filesystem snapshots (eg, with btrfs or zfs) as the source for the restic backup but wanting the restic backup path to clearly represent the actual data source. Using the filesystem snapshot makes it much more likely to have a consistent backup of things like database files.

I also envision using the --set-path feature to write a utility that can take a list of local filesystem snapshots and ensure that they are all replicated in restic (including correct creation timestamps and parent relationship) as that would allow simple interfacing of an existing local backup system with its creation and expiry schedules and the more flexible off-site restic backup.

hamishcoleman avatar Feb 25 '21 19:02 hamishcoleman

@amuckart I agree that changing the path would be a useful feature. But maybe that could be left for another PR (so this one doesn't get held up)?

wren avatar Mar 02 '21 00:03 wren

I would also like to leave the scope of this PR as it is. Can we open a new issue to discuss this changing of trees in the repo? I think it's not just changing the tree names (which should be pretty simple) but the syntax how to specify which path maps to which tree. I can imagine that there are many pitfalls which should be discussed first.

aawsome avatar Mar 04 '21 09:03 aawsome

:+1: for this proposal.

My use-case is for a selective backup that I cannot perform with an exclusion-list :

  • >100.000 files to exclude and growing, even with optimization done in 0.11.0 and 0.12.0, scanning file takes 2h30
  • no general glob / regexp pattern applicable to reduce exclusion list

With an inclusion list (files-from) :

  • I don't detect myself the incorrect parent lookup behavior ; but I encounter a x2 time for my first snapshot with files-from
  • I detect that snapshot list would be an issue, with the growing paths attribute

I read the patch and it seems sane, but I surely does not have enough experience on restic codebase to give an informed opinion.

Is there anything I can do to help ? (I'm building and trying this patch on a working repository ; I'll give some feedback on this).

lalmeras avatar Mar 24 '21 11:03 lalmeras

Working with my repository:

  • parent snapshot is correctly detected; cpu and io seems to confirm that useless re-read are not done
  • backup elapsed time is consistent with timings on a similar tree that does not need include/exclude tricks
  • snapshots list display the expected paths
  • backup restore give me back the expected files

lalmeras avatar Mar 24 '21 23:03 lalmeras

using filesystem snapshots (eg, with btrfs or zfs) as the source for the restic backup but wanting the restic backup path to clearly represent the actual data source

@hamishcoleman This is exactly what I was trying to do yesterday and realized it didn't exist. Anxiously awaiting the approval of this PR. I'll probably build 0.12.0 from source with this patch applied just so I can use this functionality!

This is going to make backing up ZFS snapshots clean.

lpulley avatar Mar 25 '21 19:03 lpulley

@lpulley I have worked around it for the moment by making my backup script take a temporary snap of the real snap but using a single stable name - then destroying the temporary snap after restic has backed things up - which is working well.

hamishcoleman avatar Mar 25 '21 20:03 hamishcoleman

Is this just awaiting review?

lpulley avatar Apr 27 '21 04:04 lpulley

Is this just awaiting review?

From my side: yes

aawsome avatar Apr 28 '21 04:04 aawsome

I wanted to use this option to alter the snapshot paths and use the same paths between platforms (win/unix), but this patch keeps transforming the paths into absolute.

restic init --repo repo
restic backup --repo repo --set-path /custom-data data
restic snapshots --repo repo
ID        Time                 Host        Tags        Paths
---------------------------------------------------------------------------------------
e69de59e  2021-06-03 18:16:43  ?????????               C:\Program Files\Git\custom-data
---------------------------------------------------------------------------------------
1 snapshots

You can bypass this feature specifying the driver letter (ex: X:\custom-path), but this workaround is not cross-platform.

juanrgm avatar Jun 03 '21 16:06 juanrgm

Hello,

What's preventing this pull request from being merged ?

Thanks

sydney-d avatar Jun 10 '21 07:06 sydney-d

I wanted to use this option to alter the snapshot paths and use the same paths between platforms (win/unix), but this patch keeps transforming the paths into absolute.

You can bypass this feature specifying the driver letter (ex: X:\custom-path), but this workaround is not cross-platform.

You are right. In the snapshots, only absolute paths are saved. The code that turns a path into an absolute one is platform-dependent and therefore prevents you from using this PR to simulate on Windows that the snapshot was created under Linux.

The problem is, if we omit this, then the users always have to specify absolute paths with this new option. Should I change the behavior? Opinions?

aawsome avatar Jun 10 '21 11:06 aawsome

I have a similar situation where the data was backed up via ssh so far and now I want to use sshfs so the path changes but the data is all the same. I think it'd be nice if the path was exactly as specified in the --set-path option.

nicnab avatar Jun 10 '21 11:06 nicnab

It seems that to skip the path conversion with --set-path is not sufficient because the conversion is used at FindLatestSnapshot too, and that function is used in the next commands: backup, copy, dump, find, forget, ls, recover, restore, snapshots, stats and tag.

Implementing --set-path in all those commands would not be correct.

Maybe a new option (--raw-paths) cover all this needs so... new PR?

https://github.com/juanrgm/restic/compare/backup-set-path...backup-raw-paths

juanrgm avatar Jun 25 '21 12:06 juanrgm

@juanrgm It does seem like there are already a few issues in the backlog about how paths are handled. And while I would love to see restic support for relative paths, I completely agree it's out of scope for this PR.

In my opinion, --set-path should resolve a path the same way the rest of restic does so. And it sounds like that's already what you're doing here. Am I reading that correctly?

wren avatar Jun 29 '21 01:06 wren

@juanrgm It does seem like there are already a few issues in the backlog about how paths are handled. And while I would love to see restic support for relative paths, I completely agree it's out of scope for this PR.

I will create formally a PR for this issue when I have time.

In my opinion, --set-path should resolve a path the same way the rest of restic does so. And it sounds like that's already what you're doing here. Am I reading that correctly?

Yep, @aawsome added --set-path with the same behaviour than rest of restic.

juanrgm avatar Jun 29 '21 13:06 juanrgm

It seems like this PR should be ready, then, doesn't it?

I'm not familiar with the process here at restic. Does someone need to be notified for a review or anything like that?

wren avatar Jul 02 '21 03:07 wren

@wren There is no further action needed, this PR is in the queue along with the others. It needs review and consideration, which will happen with time.

For anyone wanting to try out this PR's feature, it's very simple to check out the code in this PR's branch and build restic by go run build.go. Give it a spin if you want :)

rawtaz avatar Aug 29 '21 19:08 rawtaz

How does this address #2092?

haslersn avatar Sep 26 '21 11:09 haslersn

@aawsome I'm thinking it would be nice to use the same form of the noun path in both arguments, e.g. --set-path and --set-path-from, considering that (according to your first sentence) both of them take one or more path(s).

For example, there's not really anything suggesting that just because one feeds path(s) from a file it's going to be multiple paths - might as well be one single path generated by some external software :) Also, it would be less confusing and more coherent.

What do you think, does that make sense?

rawtaz avatar Jan 02 '22 22:01 rawtaz

@aawsome I'm thinking it would be nice to use the same form of the noun path in both arguments, e.g. --set-path and --set-path-from, considering that (according to your first sentence) both of them take one or more path(s).

Fully agree! Thanks for the suggestion :+1: . I'll change it immediately.

aawsome avatar Jan 16 '22 06:01 aawsome