borg extract: only newer/non-existing files

Open unilynx opened this issue 3 years ago • 4 comments

Have you checked borgbackup docs, FAQ, and open Github issues?

Yes

Is this a BUG / ISSUE report or a QUESTION?

Question

Describe the problem you're observing.

Is there a way to have 'borg extract' ignore already existing/extracted files? This would be useful if the connection is lost during an extract, or if halfway through a large restore you decide it's better to abort and add an additional --exclude mask.

unilynx avatar Jul 13 '22 11:07 unilynx

I agree it would be useful to optimize such a case (while still making sure everything is correct in the end), but that's not implemented yet.

IIRC, there is a more generic ticket about this already, maybe you can find it.

The general case is extracting and merging into some (arbitrary) existing directory tree, and maybe even efficiently updating existing big files.

ThomasWaldmann avatar Jul 13 '22 11:07 ThomasWaldmann

What you can try is borg mount and rsync from the mount to the target dir.

Not sure if it is really faster, guess it depends...

But be aware that borg mount does not support ACLs and (bsd / filesystem) flags.
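
A minimal sketch of the mount-and-rsync workaround described above, assuming borg 1.x `REPO::ARCHIVE` syntax and a FUSE-capable system. The helper name `build_mount_rsync_commands` and all paths are hypothetical. The function only builds the command lines and prints them; it does not run them. By default rsync skips files whose size and mtime already match at the target, which is what makes a repeated run cheaper.

```python
import shlex


def build_mount_rsync_commands(repo_archive, mountpoint, target):
    """Build (but do not run) the commands for the mount + rsync workaround.

    All arguments are placeholders chosen for illustration.
    """
    # The trailing slash makes rsync copy the *contents* of the mountpoint.
    source = mountpoint.rstrip("/") + "/"
    return [
        ["borg", "mount", repo_archive, mountpoint],  # expose the archive via FUSE
        ["rsync", "-a", source, target],              # copy what is missing or differs
        ["borg", "umount", mountpoint],               # unmount when done
    ]


commands = build_mount_rsync_commands(
    "/path/to/repo::archive", "/mnt/borg", "/restore/target"
)
for cmd in commands:
    print(shlex.join(cmd))
```

As noted above, ACLs and flags are not available through the mount, so this route cannot restore them.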

ThomasWaldmann avatar Jul 13 '22 11:07 ThomasWaldmann

> IIRC, there is a more generic ticket about this already, maybe you can find it.

I am also interested in this feature, so I looked for related issues. Are you referring to #1986?

xwst avatar Aug 10 '22 05:08 xwst

@xwst Yes, guess that was the one.

ThomasWaldmann avatar Aug 10 '22 10:08 ThomasWaldmann

I'd like to ask for this feature as well. I've just experienced an extraction abort in the middle of a ~10TB restore, which had already been running for two days, and having to redo all of that hurts.

I'll look into mounting & rsyncing for now.

Apart from that: borg is an amazing piece of software, and I'm very, very grateful for all your work!

mbunkus avatar Nov 23 '22 12:11 mbunkus

OK, maybe it is worth implementing the simplest use case (not the most generic use case of "I have something, bring it in sync"):

  • we start from an empty extraction base directory D
  • an extraction of some archive A is attempted, but interrupted
  • nothing inside D is modified (especially: nothing added or renamed)
  • the extraction attempt of A shall be efficiently repeated without re-extracting what we already have in D
  • expectation: have a full, valid extraction of A, no more, no less

So we have these cases for some file Fa (in archive) and Ff (in filesystem):

  • Ff is not present: extract Fa
  • Ff is already present, but there is a mismatch in size or mtime compared to Fa: delete Ff, extract Fa
  • Ff is already present, its size and mtime match what we have in Fa: nothing to do
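
The three file cases above could be sketched roughly like this. This is only an illustration of the decision rule, not borg code: `ArchivedFile`, `decide_file_action` and `demo` are hypothetical names, and the archive side is reduced to a size and an mtime in nanoseconds.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass
class ArchivedFile:
    # Hypothetical stand-in for an archive item (Fa); not borg's Item class.
    size: int
    mtime_ns: int


def decide_file_action(archived, fs_path):
    """Return 'extract', 'replace' or 'skip' for one regular file (Ff)."""
    try:
        st = os.stat(fs_path, follow_symlinks=False)
    except FileNotFoundError:
        return "extract"  # Ff is not present
    if st.st_size != archived.size or st.st_mtime_ns != archived.mtime_ns:
        return "replace"  # mismatch: delete Ff, then extract Fa
    return "skip"  # size and mtime match: nothing to do


def demo():
    """Walk one file through all three cases in a temporary directory."""
    fa = ArchivedFile(size=5, mtime_ns=1_600_000_000_000_000_000)
    actions = []
    with tempfile.TemporaryDirectory() as base:
        ff = os.path.join(base, "f")
        actions.append(decide_file_action(fa, ff))  # nothing on disk yet
        with open(ff, "wb") as fh:
            fh.write(b"hello")  # right size, but mtime is "now"
        actions.append(decide_file_action(fa, ff))
        os.utime(ff, ns=(fa.mtime_ns, fa.mtime_ns))  # mtime now matches Fa
        actions.append(decide_file_action(fa, ff))
    return actions
```

Calling `demo()` exercises the missing, mismatching and matching cases in that order.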

For some directory Da (in archive) and Df (in filesystem):

  • Df is not present: extract Da (== create directory, set metadata). Note: if we write files into Df, we modify the timestamps of Df by doing that and need to update timestamps of Df again at the end.
  • Df is already present, but there is a mismatch in mtime: update timestamps of Df again at the end
  • Df is already present, its mtime matches Da: nothing to do
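
A small sketch of the directory cases, again with hypothetical names (`decide_dir_action`, `demo`) and not taken from borg. It also shows why the timestamps of Df need to be set again at the end: creating a file inside Df changes the mtime of Df.

```python
import os
import tempfile


def decide_dir_action(archived_mtime_ns, fs_path):
    """Return 'create', 'fix_times' or 'skip' for one directory (Df)."""
    try:
        st = os.stat(fs_path, follow_symlinks=False)
    except FileNotFoundError:
        return "create"  # Df is not present: create it, set metadata
    if st.st_mtime_ns != archived_mtime_ns:
        return "fix_times"  # update timestamps of Df again at the end
    return "skip"  # mtime matches Da: nothing to do


def demo():
    """Show that writing a file into Df invalidates its restored mtime."""
    da_mtime_ns = 1_600_000_000_000_000_000
    actions = []
    with tempfile.TemporaryDirectory() as base:
        df = os.path.join(base, "D")
        actions.append(decide_dir_action(da_mtime_ns, df))
        os.mkdir(df)
        os.utime(df, ns=(da_mtime_ns, da_mtime_ns))
        actions.append(decide_dir_action(da_mtime_ns, df))
        # Extracting a file into Df modifies Df's own mtime ...
        with open(os.path.join(df, "f"), "wb") as fh:
            fh.write(b"x")
        actions.append(decide_dir_action(da_mtime_ns, df))
        # ... so the directory timestamps are set once more at the end.
        os.utime(df, ns=(da_mtime_ns, da_mtime_ns))
        actions.append(decide_dir_action(da_mtime_ns, df))
    return actions
```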

TODO: consider xattrs, acls and other metadata.

ThomasWaldmann avatar Nov 23 '22 13:11 ThomasWaldmann

The current code restores metadata in this order:

  • uid/gid
  • mode
  • atime/birthtime
  • atime/mtime
  • acls
  • xattrs
  • flags (includes immutable flag, thus must be done at the end)

Note: if metadata restoration gets interrupted somewhere after mtime, the fs item would have a "correct" (matching) mtime, but would not have complete acls or xattrs.

Thus, I guess this would need to change to:

  • uid/gid
  • mode
  • acls
  • xattrs
  • atime/birthtime
  • atime/mtime
  • flags (includes immutable flag, thus must be done at the end)

That way (doing mtime as late as possible), having a matching mtime (archive vs. filesystem) would imply that metadata restoration was finished for that fs item (with a small remaining risk concerning the flags, which aren't used that much).
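
To illustrate the idea of using a late mtime as a completion marker, here is a simplified sketch. It is not borg's `restore_attrs`: `restore_attrs_sketch`, `looks_complete` and `demo` are hypothetical, uid/gid and flags are left out, and the ACL/xattr steps are replaced by plain callables so that an interruption can be simulated.

```python
import os
import tempfile


def restore_attrs_sketch(path, mode, mtime_ns, late_steps=()):
    """Restore metadata with mtime after the (simulated) acl/xattr steps."""
    os.chmod(path, mode)
    for step in late_steps:  # stand-ins for ACL and xattr restoration
        step(path)
    # Timestamps come late; atime is set equal to mtime for simplicity.
    os.utime(path, ns=(mtime_ns, mtime_ns))


def looks_complete(path, mtime_ns):
    """A matching mtime implies the earlier steps ran to completion."""
    return os.stat(path).st_mtime_ns == mtime_ns


def demo():
    mtime_ns = 1_600_000_000_000_000_000

    def failing_step(path):
        raise OSError("simulated interruption during xattr restore")

    results = []
    with tempfile.TemporaryDirectory() as base:
        p = os.path.join(base, "f")
        with open(p, "wb") as fh:
            fh.write(b"data")
        try:
            restore_attrs_sketch(p, 0o600, mtime_ns, [failing_step])
        except OSError:
            pass  # interrupted before mtime was set
        results.append(looks_complete(p, mtime_ns))
        restore_attrs_sketch(p, 0o600, mtime_ns, [lambda path: None])
        results.append(looks_complete(p, mtime_ns))
    return results
```

With the old order (mtime before acls/xattrs), the first check in `demo()` could report a match even though the later steps never ran.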

Comments?

ThomasWaldmann avatar Nov 24 '22 21:11 ThomasWaldmann

Would love to see this functionality. If someone has the time to handhold me on how this should be implemented, I am willing to give it a shot. The end goal for me is #1986.

thebalaa avatar Mar 03 '23 05:03 thebalaa

"mtime (2nd) last" was already implemented in master and 1.2-maint branches.

see archive.py -> restore_attrs.

@thebalaa if you want to help, just ask (e.g. on IRC) and open a PR.

ThomasWaldmann avatar Mar 03 '23 08:03 ThomasWaldmann

borg extract --continue (master branch) does some of this. #1356

ThomasWaldmann avatar Jun 08 '23 21:06 ThomasWaldmann