borg extract: only newer/non-existing files
Have you checked borgbackup docs, FAQ, and open Github issues?
Yes
Is this a BUG / ISSUE report or a QUESTION?
Question
Describe the problem you're observing.
Is there a way to have 'borg extract' skip files that already exist (i.e. were already extracted)? This would be useful if the connection is lost during an extract, or if halfway through a large restore you decide it's better to abort and add an additional --exclude mask.
I agree it would be useful to optimize such a case (while still making sure everything is correct in the end), but that's not implemented yet.
IIRC, there is a more generic ticket about this already, maybe you can find it.
The general case is extracting and merging into some (arbitrary) existing directory tree, and maybe even efficiently updating existing big files.
What you can try is borg mount and rsync from the mount to the target dir.
I'm not sure if it is really faster; I guess it depends...
But be aware that borg mount does not support ACLs and (BSD / filesystem) flags.
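The mount-and-rsync workaround can be sketched as a small Python helper that only builds the three command lines. This is a minimal sketch under assumptions: it uses the borg 1.x `REPO::ARCHIVE` location syntax, and the repository path, archive name, mountpoint and target directory are hypothetical placeholders. rsync's default quick check skips files whose size and mtime already match, which is what makes a repeated run cheap.

```python
import shlex


def mount_rsync_commands(repo, archive, mountpoint, target):
    """Build the commands for a resumable restore via borg mount + rsync.

    All four arguments are placeholders chosen by the caller; nothing is
    executed here.
    """
    # The trailing slash makes rsync copy the *contents* of the mountpoint.
    source = mountpoint.rstrip("/") + "/"
    return [
        ["borg", "mount", f"{repo}::{archive}", mountpoint],
        ["rsync", "-a", source, target],
        ["borg", "umount", mountpoint],
    ]


# Print the commands for review (hypothetical paths and archive name).
for cmd in mount_rsync_commands(
    "/path/to/repo", "my-archive", "/mnt/borg", "/restore/target"
):
    print(shlex.join(cmd))
```

Each command list could be passed to `subprocess.run(cmd, check=True)` once the paths have been checked. As noted above, ACLs and flags are not available through the mount, so they will not be restored this way.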
> IIRC, there is a more generic ticket about this already, maybe you can find it.
I am also interested in this feature, so I looked for related issues. Are you referring to #1986?
@xwst Yes, guess that was the one.
I'd like to ask for this feature as well. I've just experienced an extraction abort in the middle of a ~10TB restore, which had already been running for two days, and having to redo all of that hurts.
I'll look into mounting & rsyncing for now.
Apart from that: borg is an amazing piece of software, and I'm very, very grateful for all your work!
OK, maybe it is worth implementing the simplest use case (not the most generic use case of "I have something, bring it in sync"):
- we start from an empty extraction base directory D
- an extraction of some archive A is attempted, but interrupted
- nothing inside D is modified (especially: nothing added or renamed)
- the extraction attempt of A shall be efficiently repeated without re-extracting what we already have in D
- expectation: have a full, valid extraction of A, no more, no less
So we have these cases for some file Fa (in archive) and Ff (in filesystem):
- Ff is not present: extract Fa
- Ff is already present, but there is a mismatch in size or mtime compared to Fa: delete Ff, extract Fa
- Ff is already present, its size and mtime match what we have in Fa: nothing to do
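The three file cases above could be expressed roughly as follows. This is only an illustrative sketch, not borg's code: `ArchivedFile` and `decide_file_action` are hypothetical names, and the comparison uses nanosecond mtimes as reported by `os.lstat`.

```python
import os
from typing import NamedTuple


class ArchivedFile(NamedTuple):
    """Hypothetical stand-in for the archive item Fa (not borg's item class)."""
    size: int
    mtime_ns: int


def decide_file_action(path, archived):
    """Return what to do for filesystem file Ff at `path`, given Fa."""
    try:
        st = os.lstat(path)
    except FileNotFoundError:
        # Ff is not present: extract Fa.
        return "extract"
    if st.st_size != archived.size or st.st_mtime_ns != archived.mtime_ns:
        # Size or mtime mismatch: delete Ff, then extract Fa.
        return "delete_and_extract"
    # Size and mtime match: nothing to do.
    return "skip"
```

Note that this only works if mtime is restored after the file content has been completely written, so that a matching mtime never belongs to a partially written file.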
For some directory Da (in archive) and Df (in filesystem):
- Df is not present: extract Da (== create directory, set metadata). Note: if we write files into Df, we modify the timestamps of Df by doing that and need to update timestamps of Df again at the end.
- Df is already present, but there is a mismatch in mtime: update timestamps of Df again at the end
- Df is already present, its mtime matches Da: nothing to do
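The directory cases differ from the file cases mainly in that timestamps have to be applied late, because writing files into a directory changes its mtime. A minimal sketch of that idea, with hypothetical function names and a plain list standing in for whatever bookkeeping a real implementation would use:

```python
import os


def plan_directory(path, archived_mtime_ns):
    """Return what to do for filesystem directory Df at `path`, given Da's mtime."""
    try:
        st = os.lstat(path)
    except FileNotFoundError:
        # Df is not present: create it now, set timestamps at the end.
        return "create"
    if st.st_mtime_ns != archived_mtime_ns:
        # Df exists but its mtime differs: set timestamps at the end.
        return "fix_mtime_later"
    # Df exists and its mtime matches Da: nothing to do.
    return "skip"


def apply_deferred_dir_times(pending):
    """Apply timestamps once all files have been written.

    `pending` is a list of (path, atime_ns, mtime_ns) tuples collected
    while extracting (a hypothetical bookkeeping structure).
    """
    for path, atime_ns, mtime_ns in pending:
        os.utime(path, ns=(atime_ns, mtime_ns))
```

In this sketch both "create" and "fix_mtime_later" would add the directory to `pending`, so the final pass restores the archived timestamps after the directory contents are complete.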
TODO: consider xattrs, acls and other metadata.
The current code restores metadata in this order:
- uid/gid
- mode
- atime/birthtime
- atime/mtime
- acls
- xattrs
- flags (includes immutable flag, thus must be done at the end)
Note: if metadata restoration gets interrupted somewhere after mtime, the fs item would have a "correct" (matching) mtime, but would not have complete acls or xattrs.
Thus, I guess this would need to change to:
- uid/gid
- mode
- acls
- xattrs
- atime/birthtime
- atime/mtime
- flags (includes immutable flag, thus must be done at the end)
That way (doing mtime as late as possible), having a matching mtime (archive vs. filesystem) would imply that metadata restoration was finished for that fs item (with a small remaining risk concerning the flags, which aren't used that much).
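To illustrate why setting mtime late helps, here is a simplified sketch. It is not borg's `restore_attrs`: the function name and the `restore_extra` hook (standing in for the ACL and xattr steps) are hypothetical, uid/gid and flags are left out because they need privileges or platform support, and a POSIX filesystem is assumed.

```python
import os


def restore_attrs_sketch(path, mode, atime_ns, mtime_ns,
                         restore_extra=lambda p: None):
    """Restore a subset of metadata with mtime applied as late as possible."""
    os.chmod(path, mode)          # mode
    restore_extra(path)           # acls / xattrs (hypothetical hook)
    # Timestamps last: if anything above fails or is interrupted,
    # the on-disk mtime will not match the archived one.
    os.utime(path, ns=(atime_ns, mtime_ns))
```

If `restore_extra` raises (simulating an interruption), `os.utime` is never reached, so a later "does the mtime match?" check would treat the item as incomplete and process it again.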
Comments?
I would love to see this functionality. If someone has the time to guide me on how this should be implemented, I am willing to give it a shot. The end goal for me is #1986.
"mtime (2nd) last" was already implemented in the master and 1.2-maint branches.
See archive.py -> restore_attrs.
@thebalaa if you want to help, just ask (e.g. on IRC) and open a PR.
borg extract --continue (master branch) does some of this. #1356