
Multiple relative paths for `borg create`

Open finefoot opened this issue 5 years ago • 14 comments

Docs at https://borgbackup.readthedocs.io/en/stable/usage/create.html say:

Paths are added to the archive as they are given, that means if relative paths are desired, the command has to be run from the correct directory.

And further down, there's an example:

# Backing up relative paths by moving into the correct directory first
$ cd /home/user/Documents
# The root directory of the archive will be "projectA"
$ borg create /path/to/repo::daily-projectA-{now:%Y-%m-%d} projectA

Now, if there is a second folder projectB in the same directory, that wouldn't be a problem. But is it possible to add two relative paths projectA and projectB if they're in different directories like /home/userA/projectA and /home/userB/projectB?

  • I can't be in both /home/userA and /home/userB at the same time, so I can't just add both projectA and projectB.
  • Also, I don't think it's a possibility to first cd to projectA, run borg create, cd to projectB and borg create there into the same archive again, right?
  • I might be able to use symlinks? I will have to look further into that.

finefoot avatar Jul 15 '19 23:07 finefoot

No, that is not possible (well, maybe you could use bind-mounts or something to artificially create the structure you want). Symlinks won't help, as they are archived as symlinks (not followed).

But this might not be desirable anyway, as it is confusing. What you could do is go up the fs tree until you reach a common parent dir, or go directly to / and work relative to there (and add excludes for what you do not want).
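
For illustration only, here is a small Python sketch of the "common parent dir" idea: it computes the directory one would `cd` into and the relative paths one would then pass to `borg create`. The helper name `relative_to_common_parent` is made up for this example and is not part of borg.

```python
import posixpath


def relative_to_common_parent(paths):
    """Return (parent, relative_paths) for a list of absolute POSIX paths."""
    # Deepest directory that is an ancestor of every given path.
    parent = posixpath.commonpath(paths)
    # Each path expressed relative to that common parent.
    return parent, [posixpath.relpath(p, parent) for p in paths]


parent, rel_paths = relative_to_common_parent(
    ["/home/userA/projectA", "/home/userB/projectB"]
)
print(parent)     # /home
print(rel_paths)  # ['userA/projectA', 'userB/projectB']
```

With these values, running `borg create` from `/home` with `userA/projectA userB/projectB` as paths keeps both projects in one archive, at the cost of the `userA/` and `userB/` path components.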

ThomasWaldmann avatar Jul 17 '19 15:07 ThomasWaldmann

But this might not be desirable anyway, as it is confusing.

Why? :thinking: Am I overlooking something? Why would it be necessary for the backup to reflect the original file system directory structure if it's only empty parent folders leading up to projectB?

maybe you could use bind-mounts

That worked perfectly. Thanks! :+1:

finefoot avatar Jul 18 '19 18:07 finefoot

Well, it is helpful if the backup archive has helpful paths. If you just pick a lot of stuff from misc locations and put it into an archive (without some leading path), you might get into trouble at restore time, trying to remember what was where.

ThomasWaldmann avatar Jul 20 '19 19:07 ThomasWaldmann

I think it would help in order to make long-term backup archives more system-independent and more robust to changes.

With TAR archives, we have the -C option to change directories right before backing up a folder, so you can do: tar czvf /tmp/test.tgz -C /home/userA projectA -C /home/userB projectB

And then restore using: tar xzvf /tmp/test.tgz -C /home/userA projectA -C /home/userB projectB

If I later change the locations of these projects, I can simply update my backup/restore script and still restore old archives, because they didn't include any leading paths to begin with. Path structure is usually extremely important for projects, but not the leading path where that project was located at the time.
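
As a rough analogue of the `tar -C` behaviour described above, Python's standard `tarfile` module can store each project under its bare directory name via the `arcname` argument. This is only a sketch: the temporary directories stand in for `/home/userA/projectA` and `/home/userB/projectB`, and the `README` files are placeholders.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Build two stand-in project directories under different parents.
    sources = []
    for user, project in [("userA", "projectA"), ("userB", "projectB")]:
        project_dir = os.path.join(tmp, user, project)
        os.makedirs(project_dir)
        with open(os.path.join(project_dir, "README"), "w") as f:
            f.write(project)
        sources.append(project_dir)

    archive = os.path.join(tmp, "test.tgz")
    with tarfile.open(archive, "w:gz") as tar:
        for src in sources:
            # arcname drops the leading path, much like `-C parent project`.
            tar.add(src, arcname=os.path.basename(src))

    with tarfile.open(archive, "r:gz") as tar:
        names = sorted(tar.getnames())

print(names)
# ['projectA', 'projectA/README', 'projectB', 'projectB/README']
```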

Other workarounds might include:

  • Using TAR without compression for backups and pipe into Borg, probably making restores more cumbersome (bad idea?)
  • Using multiple repositories (one for each project)
  • Using a single repository but backing up into separate archives using a prefix (also interesting for pruning using --prefix)
  • Using a single repository and single archive, but include a locations.txt file in the backup which contains the entry-points of backed up project folders and then strip-out the leading paths at restore time

I would love an option comparable to TAR but, of course, the workarounds do work. I think Borg is the holy grail of Linux backups and nearly as perfect as it can be. Thanks for giving us such a wonderful tool!

int3code avatar Dec 16 '21 13:12 int3code

That sounds like a reasonable suggestion, so I am reopening this so we can check whether this can be implemented.

ThomasWaldmann avatar Dec 23 '21 18:12 ThomasWaldmann

It looks like we can't implement tar syntax and behaviour here (like -C /home/userA projectA -C /home/userB projectB).

Reasons:

  • blocker: argparse parse_args does not support intermixed options and positional args. there is parse_intermixed_args, but it does not support other argparse stuff we use.
  • because of this, it could be only like borg create --cwd /home/userA --cwd /home/userB repo::archive projectA projectB, which is rather ugly and limited.
  • trivial: we already use -C for compression (that would not hold us back from just using --cwd or so though)

Of course one could do misc. special hacks, like first using parse_args with argparse.REMAINDER to catch -C /home/userA projectA -C /home/userB projectB and then use another parse_intermixed_args within do_create, but that would collide with our nicely automated help, manpage and docs generation.
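
To make the "ugly and limited" variant concrete, here is a minimal argparse sketch. `--cwd` is the hypothetical option from the comment above, not an existing borg flag, and the parser is a toy, not borg's real one. Because all options are collected separately from the positional arguments, the only way to associate directories with paths is by position.

```python
import argparse

# Toy parser; "--cwd" is hypothetical and only mirrors the idea discussed above.
parser = argparse.ArgumentParser(prog="create-sketch")
parser.add_argument("--cwd", action="append", default=[])
parser.add_argument("archive")
parser.add_argument("paths", nargs="+")

args = parser.parse_args(
    ["--cwd", "/home/userA", "--cwd", "/home/userB",
     "repo::archive", "projectA", "projectB"]
)

# The n-th --cwd can only be matched to the n-th path by counting.
pairs = list(zip(args.cwd, args.paths))
print(pairs)  # [('/home/userA', 'projectA'), ('/home/userB', 'projectB')]
```

Note that `zip` silently drops any unmatched path if fewer `--cwd` options than paths are given, which hints at why this form is awkward.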

ThomasWaldmann avatar Jan 07 '22 20:01 ThomasWaldmann

A hack to implement the same idea could be to use some special separator within the PATHs, like:

borg create REPO::ARCHIVE /home/userA::projectA /home/userB::projectB

Presence of :: inside a path would split it into CWD and PATH and then act accordingly.

Problematic if you have an actual recursion root path that contains ::.

Also, users might get confused due to the different meanings of this separator when used for REPO/ARCHIVE and for CWD/PATH.
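
A minimal sketch of that splitting rule, assuming the first `::` is the one that counts (the comment does not say which occurrence would be used). The function name is invented for illustration. The last call shows the collision with a real path that contains `::`.

```python
def split_cwd_and_path(arg, sep="::"):
    """Split 'CWD::PATH' into (cwd, path); return (None, arg) if sep is absent."""
    cwd, found, path = arg.partition(sep)  # splits at the first occurrence
    if not found:
        return None, arg
    return cwd, path


print(split_cwd_and_path("/home/userA::projectA"))  # ('/home/userA', 'projectA')
print(split_cwd_and_path("projectC"))               # (None, 'projectC')
# A genuine directory named "odd::name" is misread as CWD plus PATH:
print(split_cwd_and_path("/data/odd::name/dir"))    # ('/data/odd', 'name/dir')
```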

ThomasWaldmann avatar Jan 07 '22 20:01 ThomasWaldmann

Yeah, I've always wondered why some tools allow appending options despite the syntax saying options before arguments.

I find the idea of a separator intriguing. Maybe let the user specify the separator using a special option? borg create --path-prefix-separator % REPO::ARCHIVE /home/userA%projectA /home/userB%projectB

So you can shift responsibility for edge cases to the user, similar to HEREDOC. And only users who want to use the feature need to be aware of the implications.

In that case, it may be worthwhile to ditch the CD aspect. Tar makes it easy to understand because it's the same as manually typing CD between commands. But that also allows you to do weird relative/contextual stuff like -C /home/userA projectA -C ../userB projectB. (confusing, breaks when reordering operations).

Keeping a fixed current directory and then stripping path components dynamically (using string operations) would probably solve it just as well and also make it more akin to (but more flexible than) borg extract --strip-components NUM. And the help wouldn't need to cover both the separator and CDing between paths (two concepts vs. the one in Tar).

int3code avatar Jan 07 '22 23:01 int3code

I would also appreciate this for "prettier" archive structures. Here's another possible implementation, taken from nothing less than man rsync:

It is also possible to limit the amount of path information that is sent as implied directories for each path you specify. With a modern rsync on the sending side (beginning with 2.6.7), you can insert a dot and a slash into the source path, like this:

rsync -avR /foo/./bar/baz.c remote:/tmp/

That would create /tmp/bar/baz.c on the remote machine. (Note that the dot must be followed by a slash, so "/foo/." would not be abbreviated.)
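
The rule quoted from the man page can be sketched with plain string operations. This is an illustration, not rsync's or borg's code: the function name is made up, and using the first `/./` when several are present is an assumption.

```python
import posixpath

MARKER = "/./"


def split_at_marker(path):
    """Return (source_path, transferred_path) for an rsync -R style argument."""
    _prefix, found, rest = path.partition(MARKER)
    # "/./" is a no-op for the file system, so the source is the normalised path.
    source = posixpath.normpath(path)
    if not found:
        return source, source  # no marker: nothing is abbreviated
    return source, posixpath.normpath(rest)


print(split_at_marker("/foo/./bar/baz.c"))  # ('/foo/bar/baz.c', 'bar/baz.c')
print(split_at_marker("/foo/."))            # ('/foo', '/foo')
```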

That "modern" rsync 2.6.7 was released in 2006 by the way, here's the original commit: https://github.com/WayneD/rsync/commit/d2ea5980ba7986ddd583b4f55737eb56a0ed66a6

rovo89 avatar Jan 22 '24 13:01 rovo89

@rovo89 Oh, that's an interesting, slightly dirty hack. :-)

It's somehow similar to my idea back then: https://github.com/borgbackup/borg/issues/4685#issuecomment-1007716496

But, as it uses the unusual, but "NOP" ./ as the separator, it does usually not have the problem that this accidentally appears in a path (although there could be weird circumstances where it does, e.g. because the user gave it that way without wanting to trigger this functionality / without realising that it triggers this functionality).

ThomasWaldmann avatar Jan 22 '24 13:01 ThomasWaldmann

@ThomasWaldmann Oh wow, I didn't realize that you already implemented this. Thanks! 🙂 I added a comment (https://github.com/ThomasWaldmann/borg/commit/5b96d5acc30fec766a076ec367a154497b5d52e4#r138269214) regarding the edge case with multiple /./ occurrences.

Also, are there any plans to port this to 2.0 as well? Sorry if it's a dumb question, I briefly looked for a statement about intended feature parity but couldn't find any.

rovo89 avatar Feb 05 '24 10:02 rovo89

@rovo89 yes, i tagged #8060 with port/master. just want to first collect all feedback, see also the 1.4 thread on github discussions.

ThomasWaldmann avatar Feb 05 '24 12:02 ThomasWaldmann

Cool, thanks! I didn't notice either, so thanks for the pointers.

rovo89 avatar Feb 05 '24 13:02 rovo89

OK, I'll summarise the ideas here (I always used /./ as separator here):

  • SP "strip prefix" /strip/prefix/./keep/postfix - implemented by #8060
  • SM "strip in the middle" /keep/prefix/./strip/this/./keep/postfix
  • RP "replace prefix" /find/this/prefix/./replace/that/prefix/./keep/postfix

SP

The nice thing here is that the given path is precisely the source path we want to back up, except that it has that NOP /./ inside (which just vanishes after the normalisation we do anyway).

SM

This is a bit more powerful than SP, because one may keep a leading part of the path.

The given path is precisely the source path, except that it has 2 NOPs inside, which vanish after normalisation.

RP

This is the most powerful idea, because it can translate the fs path prefix to a different archive path prefix. But we need to do more processing there, not just normalisation.

Instead of the syntax given above we could also use the more friendly, but also more risky syntax /replace/that/prefix/:/find/this/prefix/./keep/postfix. When splitting at the : separator, the rhs would be precisely a source path (including the to be found prefix), the lhs would be the replacement prefix.

It's more risky because : is a valid character in UNIX filenames. We could also use :: to reduce bad detection of it.

RP is a superset of SP if we consider the replacement part (and separator) optional, defaulting to the empty string as replacement.

RP is also a superset of SM, because we can find a longer path prefix and replace it with a reduced version of it.
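
The two superset claims can be checked with a small sketch of RP in the `replacement:source` spelling. This is not what #8060 implements; the function name, the choice of the first `:` and first `/./`, and the returned tuple are all assumptions made for illustration.

```python
import posixpath


def resolve(arg, marker="/./", sep=":"):
    """Return (source_path, archived_path) for '[REPLACEMENT:]STRIP/./KEEP'."""
    replacement, found, source_spec = arg.partition(sep)
    if not found:
        # No separator: empty replacement, which reduces RP to SP.
        replacement, source_spec = "", arg
    _strip, has_marker, keep = source_spec.partition(marker)
    if not has_marker:
        keep = source_spec  # nothing to strip
    source = posixpath.normpath(source_spec)
    archived = posixpath.normpath(posixpath.join(replacement, keep))
    return source, archived


# SP: plain strip-prefix form.
print(resolve("/strip/prefix/./keep/postfix"))
# ('/strip/prefix/keep/postfix', 'keep/postfix')

# RP: the found prefix is replaced by another one.
print(resolve("/replace/that/prefix/:/find/this/prefix/./keep/postfix"))
# ('/find/this/prefix/keep/postfix', '/replace/that/prefix/keep/postfix')

# SM expressed via RP: find the longer prefix, replace it with a shorter one.
print(resolve("/keep/prefix/:/keep/prefix/strip/this/./keep/postfix"))
# ('/keep/prefix/strip/this/keep/postfix', '/keep/prefix/keep/postfix')
```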

ThomasWaldmann avatar Feb 05 '24 19:02 ThomasWaldmann