Please allow "borg create" while repo/archive is mounted via "borg mount"

Open snhrdt opened this issue 8 years ago • 27 comments

Using borg mount to access an archive or even the whole repository is incredibly powerful and a very nice feature.

Unfortunately, this also locks the repository so that no new archives can be created while an archive from the same repository (or even the repository itself) is mounted.

My understanding is that a mounted archive/repository should be read-only anyway, so I do not see a good reason why a simultaneous borg create should not be allowed. I could even live with new archives added after the mount not being visible.

Background: If you run borg create quite often (like every 15 minutes) and your backed-up sources are quite large, the "restore" window between the end of one borg create and the beginning of the next one shrinks to mere minutes. Inspecting archives and running non-standard or long-running restores becomes infeasible.

Yes, you could store the repository on btrfs or zfs, create a snapshot of the filesystem containing the repository, mount that btrfs/zfs snapshot on /mnt/repository and then run borg mount off of that - but this seems a bit over the top, plus there are those of us who still use ext4 or similar, non-snapshotting file systems.

Changing the code so that borg mount does not lock the repository seems like a huge improvement in usability.

snhrdt avatar Mar 18 '16 13:03 snhrdt

It's not just that we have to avoid 2 writers at the same time. If a reader is active while another writer is active, it could be also that it reads inconsistent data. A lot of stuff is append-only, but e.g. the repo manifest is modified "in-place".

ThomasWaldmann avatar Mar 19 '16 18:03 ThomasWaldmann

If a reader is active while another writer is active, it could be also that it reads inconsistent data.

Does that only happen if we mount an archive that is still in progress? If so, could borg simply refuse to mount an archive that is in progress?

infectormp avatar Mar 19 '16 20:03 infectormp

See also #420.

ThomasWaldmann avatar Mar 19 '16 21:03 ThomasWaldmann

I understand the complexities involved; nevertheless, would it not be a good idea to implement file system-like locking for archives? If an archive is being written to - do not allow it to be mounted. All other archives are fair game.

As an alternative: What do you consider as best practice for the use case I detailed in the original post (near-line backups, need to inspect/extract an archive which takes longer than the interval between backup runs)?

snhrdt avatar Apr 04 '16 16:04 snhrdt

@nprncbl it's not like archives are separate, single files for borg. besides that, we just got rid of posix-locking at other places because it caused too many compatibility issues.

If 15 minutes is too short an interval for what you'd like to do, just use a longer interval?

ThomasWaldmann avatar Apr 04 '16 16:04 ThomasWaldmann

The main problem I see is that there are two very different approaches to this and both aren't really "neat". I do think this is a valid use case and should be on the list for 1.1 ...or later.

  1. Use transactions to isolate different processes. Problem: segment compaction destroys old transactions. So we'd need a way around that, e.g. putting "opened" transactions in the roster?
    • This would be similar to how (R)DBMS handle this.
  2. Use granular locking. Needs extra RPC calls on the Repository layer, and might add additional timeout issues. Should be tameable by minimizing accesses, e.g. a reader should lock the manifest exclusively-reading, fetch that, and unlock it immediately. An "appender" like create should lock the manifest during the entire operation for writing[1], and exclusively for doing a write.
    • Writers like prune/delete/check etc. would always be repository-exclusive as they are now.

[1] To lock out other "appenders" from adding the same archive concurrently. Or we might add an "under construction" entry to the manifest or something like that.
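A minimal sketch of the two lock levels in option 2, modelled in-process with Python threads. `ManifestLocks` and its methods are hypothetical names for illustration, not borg code; a real implementation would need lock state in the repository and the extra RPC calls mentioned above.

```python
import threading
from contextlib import contextmanager


class ManifestLocks:
    """Hypothetical in-process model of the two lock levels from option 2."""

    def __init__(self):
        self._appender = threading.Lock()   # held for the whole of a create
        self._exclusive = threading.Lock()  # held only while touching the manifest
        self.manifest = {"archives": []}

    def read_manifest(self):
        # Reader: lock, copy the manifest, unlock immediately.
        with self._exclusive:
            return {"archives": list(self.manifest["archives"])}

    @contextmanager
    def appending(self):
        # Appender: keeps other appenders out for the entire operation.
        with self._appender:
            yield self

    def commit_archive(self, name):
        # Short exclusive section for the actual manifest write.
        with self._exclusive:
            self.manifest["archives"].append(name)


locks = ManifestLocks()
with locks.appending():
    # A reader is not blocked by the running appender.
    snapshot = locks.read_manifest()
    # A second appender would have to wait.
    second_appender_got_lock = locks._appender.acquire(blocking=False)
    locks.commit_archive("host-2016-04-04")
```

The reader only ever holds the short lock, so its wait time is bounded by one manifest read or write, not by the length of a backup run.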

enkore avatar Apr 04 '16 16:04 enkore

How about:

borg mount sets the append_only flag; umount unsets it. Perhaps add an append_only pin/lock? After the Repository transaction is opened it doesn't care whether new segments are added or new indices written. The cache is not used by mount iirc.

The same approach could be used for extract as well.

This would pretty much be a very simple many-reader single-writer form of MVCC with snapshot isolation.

enkore avatar Jan 09 '17 15:01 enkore

except for needing a semaphore style counter for the recursive uses - each new user needs to increment, else different users may kill each other with race conditions on state reset
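A minimal sketch of such a counter, assuming an in-process model only. `AppendOnlyPin` is a hypothetical name, and a real repository would need the count stored persistently and handled safely after crashes.

```python
import threading


class AppendOnlyPin:
    """Hypothetical counter: append-only stays set while any reader holds a pin."""

    def __init__(self):
        self._guard = threading.Lock()
        self._holders = 0

    @property
    def append_only(self):
        with self._guard:
            return self._holders > 0

    def acquire(self):
        # Called by each new mount/extract.
        with self._guard:
            self._holders += 1

    def release(self):
        # Called on umount; only the last holder actually clears the flag.
        with self._guard:
            if self._holders == 0:
                raise RuntimeError("release without matching acquire")
            self._holders -= 1


pin = AppendOnlyPin()
pin.acquire()                      # first mount
pin.acquire()                      # second mount
pin.release()                      # first umount
still_pinned = pin.append_only     # second mount is still protected
pin.release()                      # second umount
finally_clear = pin.append_only
```

With a plain boolean flag, the first umount would have reset the state while the second mount was still reading.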

RonnyPfannschmidt avatar Jan 09 '17 17:01 RonnyPfannschmidt

Yes, there will be also some other finer details to consider w.r.t. compatibility and safety of intermingled versions etc. [1] -- but I think the approach is workable and would make for a tangible improvement, as they say.

[1] Shouldn't be too hard; the a_o lock mustn't be in the repo config though.

enkore avatar Jan 09 '17 17:01 enkore

I fear this one will prove seriously problematic given the way borg is currently structured.

RonnyPfannschmidt avatar Jan 09 '17 17:01 RonnyPfannschmidt

Mounting append-only repo without locking should definitely be possible (not sure how manifest changes happen in append-only, but they could be made atomic).

For non-append repos, one could have an option to ensure consistency by locking as currently implemented (this could be the default setting if you like), but also a non-locking version which does not guarantee consistency.

One could think of the non-locking mode in the same way as of a network mount (in some implementations). We could start reading a file and then its generation gets deleted; the client would return an I/O error and the mount would try to refresh its metadata.

Vayu avatar Jan 26 '17 15:01 Vayu

I think, to speak in database terms, borg extract and mount would work mostly fine with a read committed mode. Of course it would fail if the archive being accessed is deleted while in use. But I think that is a reasonable trade-off, maybe enabled with a `--no-lock` option. That would also avoid mysterious stale locks that are invisible apart from the repo not being compacted.

The basic idea of a read committed mode would be to catch the "file does not exist" error from LoggedIO.get in repository.get and retry with a reloaded index.

Manifest is not going to be a problem, because borg only reads it once as far as I remember. Assuming the mounted / extracted repo is not deleted, all chunks are still referenced and so are not going to be compacted away. So the only real things to consider are:

archive is deleted:

  • Raise a clean error when the index was changed since opening the repo and a chunk is not found (i.e. "Archive was deleted" or something like that)

chunks moved:

  • open segment files are kept alive by the OS (assuming a sane server OS with posix semantics)
  • borg would need to delay deleting segment files after the new index is written to stable storage. This might slow down compaction some, but seems like a reasonable trade off.
  • as is, the index is replaced atomically, so we would not need locking.
  • retries can be limited to only once per index rewrite (detection could be based on inode or mtime)

I think a read committed without protection against concurrent archive deletes should be a fairly local change (more eager index writing and changes in Repository.get and LoggedIO.get only)
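The retry idea can be sketched as follows. This is a simplified model under stated assumptions, not borg's actual `Repository` or `LoggedIO` code: `ReadCommittedRepo`, `ChunkGone`, `load_index` and `read_segment` are hypothetical names, and the sketch retries once per `get` call rather than once per index rewrite.

```python
class ChunkGone(Exception):
    """Raised when a chunk is missing even after the index was reloaded."""


class ReadCommittedRepo:
    def __init__(self, load_index, read_segment):
        # load_index() -> dict mapping chunk id to (segment, offset)
        # read_segment(segment, offset) -> bytes; raises FileNotFoundError
        # if the segment file has been compacted away.
        self._load_index = load_index
        self._read_segment = read_segment
        self._index = load_index()

    def get(self, chunk_id):
        for attempt in range(2):  # first try plus one retry
            try:
                segment, offset = self._index[chunk_id]
                return self._read_segment(segment, offset)
            except (KeyError, FileNotFoundError):
                if attempt == 1:
                    # Still missing with a fresh index: archive was deleted.
                    raise ChunkGone(chunk_id) from None
                self._index = self._load_index()


# Simulated repository state shared with a concurrent writer.
segments = {1: {0: b"data"}}
index_on_disk = {"c1": (1, 0)}


def load_index():
    return dict(index_on_disk)


def read_segment(segment, offset):
    if segment not in segments:
        raise FileNotFoundError(segment)
    return segments[segment][offset]


repo = ReadCommittedRepo(load_index, read_segment)

# Simulated compaction: the chunk moves to segment 2, segment 1 is deleted.
segments[2] = {0: b"data"}
del segments[1]
index_on_disk["c1"] = (2, 0)

result = repo.get("c1")  # stale index fails once, reload finds the chunk
```

If the chunk is also absent from the reloaded index, the reader gets the clean `ChunkGone` error instead of a raw "file does not exist".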

textshell avatar Jan 30 '17 07:01 textshell

Yes, self-synchronizing read-committed would be simpler and less complex.

I wrote an implementation of what I described above, but it adds quite some complexity to the repo opening and also needed RPC changes (any approach likely does, though), so I don't think it's the correct choice for now.

enkore avatar Jan 30 '17 08:01 enkore

@enkore do you have this code available somewhere?

I would like to make it possible to run 'borg create' and 'borg extract' at the same time. I've been looking into the borg code for that, but it's not easy to get acquainted with.

I've attempted to get (2) from your comment https://github.com/borgbackup/borg/issues/768#issuecomment-205387270 to work, but I haven't had much success yet.

Any pointers or suggestions are welcome.

Mathiasdm avatar Nov 03 '17 14:11 Mathiasdm

I'd be willing to put up a bounty on this task. From reading the comments above, this is how I see it:

Is it possible to have two lock types? (create / modify)

  1. borg create command sets "create" lock
  2. borg prune, delete, upgrade, etc sets "modify" lock
  3. borg mount, extract and list don't work under a "modify" lock, but do work under a "create" lock, though only on completed backups, not on partial backups still in progress
  • create means data is only being added to the repo (append-only flag used?)
  • modify means all data is subject to modification (full lock: no other task can run, not even a mount or extract command). modify can't run while any other task is active (mount, create, etc.)
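The proposed rules can be written down as a small compatibility check. This is only a sketch of the proposal: `Op` and `may_start` are hypothetical names, and the rule that only one create may run at a time is an assumption the proposal does not state.

```python
from enum import Enum


class Op(Enum):
    READ = "read"      # mount, extract, list
    CREATE = "create"  # borg create: only adds data
    MODIFY = "modify"  # prune, delete, upgrade, ...


def may_start(new_op, active_ops):
    """Return True if new_op may start while active_ops are running."""
    if not active_ops:
        return True
    # modify is fully exclusive in both directions.
    if new_op is Op.MODIFY or Op.MODIFY in active_ops:
        return False
    if new_op is Op.CREATE:
        # Assumption, not stated in the proposal: one create at a time.
        return Op.CREATE not in active_ops
    # Readers coexist with other readers and with a running create.
    return True


read_during_create = may_start(Op.READ, [Op.CREATE])
create_during_mount = may_start(Op.CREATE, [Op.READ])
prune_during_mount = may_start(Op.MODIFY, [Op.READ])
```

The check does not cover the "completed backups only" restriction; that would need the reader to skip archives still under construction.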

marcpope avatar Dec 07 '17 19:12 marcpope

Just FYI, I took a different approach for now, which may not be usable for others: I set up 2 backup directories. When uploading, I upload to one of these directories and then rsync to the other. During the upload, all backup requests go to the other backup directory.

Mathiasdm avatar Dec 10 '17 09:12 Mathiasdm

That would require double the space. I thought about a weird trick: using a utility called Linux Hot Copy (https://www.r1soft.com/free-tool-linux-hot-copy), which allows you to make a snapshot on the fly. I'd rather not rely on another piece of software, though.

marcpope avatar Dec 12 '17 18:12 marcpope

I need this feature as well, to be able to do a borg extract while a borg create is running, and be able to do a borg create while a borg extract is running, since the extract is read only and would be reading an archive that the create is not going to be writing to.

aiso-net avatar Jul 28 '18 20:07 aiso-net

This function would be useful for GUIs as well. I first implemented the mount function so that the user can mount multiple archives before I ran into this problem. I'm not quite sure yet how I will implement a workaround so that it is easy for a user to understand.

Nebucatnetzer avatar Feb 14 '19 21:02 Nebucatnetzer

In the Server GUI I am building this is roughly how I handle it right now (may change as I come up with better solutions):

  1. Each user (client) has its own user account on the server so there is separation between clients
  2. ssh key for that user is un-commented
  3. Backup command runs and is forked to background
  4. As soon as backup starts, the ssh user is re-commented out. Since the user is already logged in, it will still run.
  5. After backup, the client notifies the server to run the next steps
  6. During backup, the server monitors the client for disconnects or stalled processes
  7. The server indexes the latest backup’s file structure to a database for faster recovery/searching/selecting files.
  8. The server then runs any “prune after backup” commands locally, so the client is free of any more duties than necessary
  9. A list command then compares any deleted backups with the database and deletes the old indexes.

Most of these changes are a result of feedback and testing I’ve received. I am open to improving the process.

Marc

marcpope avatar Feb 14 '19 23:02 marcpope

My solution for now is that I show the user a dialog informing him that he needs to unmount all the archives before continuing with creating an archive. If he clicks yes I unmount all archives. Probably not the most elegant solution but it works for the moment.

Nebucatnetzer avatar Feb 16 '19 08:02 Nebucatnetzer

Just to check - did the discussion here lead anywhere? I see the issue is open, but the conversation stopped close to 3 years ago...

ilippert avatar Dec 14 '21 14:12 ilippert

It's still the case that borg does not allow multiple parallel operations in the same repo, because the first operation locks the repo.

ThomasWaldmann avatar Dec 14 '21 20:12 ThomasWaldmann

Yeah just stumbled upon this as I would like to invoke list while running create. This really should not be forbidden.

xeruf avatar Jan 20 '22 14:01 xeruf

The way I handle it is to build a database of the files after each backup; then I can search the database independently of the backups. Not ideal, but the only way to handle it.
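A minimal sketch of such a side index using Python's standard `sqlite3` module. The table layout and the function names `index_archive` and `search` are assumptions for illustration, and the path lists are taken as already captured at backup time; how they are obtained from borg is not shown.

```python
import sqlite3


def index_archive(db, archive, paths):
    """Record the file paths of one finished archive."""
    db.execute("CREATE TABLE IF NOT EXISTS files (archive TEXT, path TEXT)")
    db.executemany(
        "INSERT INTO files VALUES (?, ?)",
        [(archive, path) for path in paths],
    )
    db.commit()


def search(db, pattern):
    """Find archives containing a path that includes the given substring."""
    rows = db.execute(
        "SELECT archive, path FROM files WHERE path LIKE ? "
        "ORDER BY archive, path",
        (f"%{pattern}%",),
    )
    return rows.fetchall()


db = sqlite3.connect(":memory:")  # a real index would live in a file
index_archive(db, "monday", ["etc/hosts", "home/a/report.txt"])
index_archive(db, "tuesday", ["etc/hosts", "home/a/notes.txt"])
hits = search(db, "hosts")
```

Searching this index never touches the repository, so it works while a borg create holds the lock; restoring the file still needs the repository to be free.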

marcpope avatar Jan 20 '22 16:01 marcpope

Guess some people would simply keep borg create --list logs and then do grep -i whatiwant borg*.log.

ThomasWaldmann avatar Jan 20 '22 16:01 ThomasWaldmann

Just saw --bypass-lock in the general options https://borgbackup.readthedocs.io/en/stable/usage/general.html?highlight=bypass-lock#common-options

stevenmunro avatar Nov 28 '23 19:11 stevenmunro