repository - going away from transactions, log, refcounts?
current state (current master branch, borg 1.x, borg 0.x, attic)
A borg repository is primarily a key/value store (with some aux functions).
The key is the chunk id (== MAC(plaintext)), the value is the compressed/encrypted/authenticated data.
borg uses transactions and a LOG when writing to the repo:
- start of transaction (usually triggered by PUT/DEL)
- writes more objects by appending PUT entries to the log
- deletes objects by appending DEL entries to the log
- commits (appends a COMMIT entry to the log)
- end of transaction (server side: saves repo index and hints, client side: saves chunks index and files cache)
LOG means that new stuff is always appended at the end of the last/current segment file. In general, old segment files are never modified in place.
borg compact defrags non-compact segment files:
- a segment file contains PUTs, DELs, COMMITs
- if a PUT(id) is later deleted by a DEL(id), it creates a logical hole in a segment file (that object is not used any more), making it non-compact
- compaction / defragging works by reading all still-needed objects from an old segment file and appending them to a new segment file. After that is finished, the old segment file is deleted (which frees disk space, because the new segment file is smaller).
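The defragging step above can be sketched with a toy in-memory model (hypothetical entry tuples, not borg's actual on-disk segment format; re-PUT after DEL is ignored for simplicity):

```python
# Toy sketch of segment compaction: a segment is a list of
# ("PUT", id, data) / ("DEL", id, None) / ("COMMIT", None, None) entries.
# Compaction keeps only PUTs whose id was never DELeted (the "still-needed"
# objects) and appends them, plus a fresh COMMIT, to the new segment.

def compact_segment(entries):
    """Return a compact entry list built from an old segment's entries."""
    deleted = {id_ for op, id_, _ in entries if op == "DEL"}
    new_entries = [(op, id_, data) for op, id_, data in entries
                   if op == "PUT" and id_ not in deleted]
    new_entries.append(("COMMIT", None, None))
    return new_entries

old = [("PUT", "a", b"1"), ("PUT", "b", b"2"),
       ("DEL", "a", None), ("COMMIT", None, None)]
print(compact_segment(old))  # only "b" survives, plus a COMMIT
```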
advantages of this approach
- transactions and an append-only log are a very safe approach (even after a crash, borg can usually roll back to the previous committed state and be fine again)
- segment files are medium size files: not too large, not too small, not too many
- works well even with not very scalable filesystems
- has little overhead due to fs block / cluster size
- can be copied or deleted rather quickly (not many fs objects)
disadvantages of this approach
- borg compact can cause lots of I/O when shuffling objects from old non-compact segments to new compact segments
- borg compact needs some space on the fs to be able to work. bad if your fs is 100% full...
- compaction code is rather complex, same for transaction management
- to quickly access objects, the repository needs an index mapping id -> (segment, offset, flags). borg currently loads the repo index (a hashtable) into memory. RAM usage is about 44 bytes * object_count + free space in the hashtable. If you have a lot of files and/or a lot of data volume, the repo index can need GBs of RAM.
- to implement this, some special borg code is needed with access to the repo filesystem
- hard to work like this without locking the repository against concurrent access.
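A quick back-of-envelope for the repo index RAM usage mentioned above (the 44 bytes/entry figure is from the text; the 25% hashtable free-space headroom is an illustrative assumption):

```python
# Rough repo index RAM estimate: ~44 bytes per object, plus hashtable
# free space (the 0.25 headroom factor here is an illustrative assumption).

def repo_index_ram(object_count, bytes_per_entry=44, free_space_factor=0.25):
    return object_count * bytes_per_entry * (1 + free_space_factor)

# e.g. a repo with 100 million chunks:
gib = repo_index_ram(100_000_000) / 2**30
print(f"{gib:.1f} GiB")  # ~5.1 GiB
```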
alternative implementation
We'll need a key/value store, of course. The aux functions will need checking (e.g. the nonce reservation stuff won't be needed for borg2 repos).
But maybe we could move away from transactions and LOG-like behaviour and instead use garbage collection (gc) to clean up anything that is not referenced (either because it is not used any more, as all references got deleted, or because it was never used, as something crashed before the action was completed).
We could delegate a lot of stuff borg used to do with its own code to the filesystem:
- the key would directly map to a fs path (e.g. data/01/0123/0123456789abcdef...)
- object exists/read/write/delete would map to fs operations on the respective file
- besides "data", there could be other namespaces (e.g. archives, manifest, keys, config, ...)
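The id-to-path mapping could look like this sketch (the 2+4 hex digit sharding just mirrors the data/01/0123/... example above; the real scheme may differ):

```python
def chunk_path(chunk_id_hex, namespace="data"):
    """Map a chunk id (hex MAC of the plaintext) to a sharded fs path,
    mirroring the data/01/0123/0123456789abcdef... example.
    Sharding into subdirectories keeps per-directory entry counts sane."""
    return f"{namespace}/{chunk_id_hex[:2]}/{chunk_id_hex[:4]}/{chunk_id_hex}"

print(chunk_path("0123456789abcdef"))  # data/01/0123/0123456789abcdef
```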
For speed reasons, the borg client could still have a chunks index, e.g. id -> (refcount, size) (maybe something simpler without a precise refcount could also be sufficient for a lot of operations).
Why/how we could live without transactions
borg create
- borg create would write file content chunks, then reference these chunks from an item (which is written to a metadata stream); the metadata stream is referenced by writing an archive item, and the archive item is referenced by writing a manifest entry. The manifest entry is the root of the tree of references - until it is written, none of the other references can be discovered / will be considered by borg check.
- if anything crashes before writing the manifest entry, we'll just have some orphaned objects and some chunks with too-high reference counts:
- some new chunks (which did not exist in the repo before). borg gc would recompute the correct refcount and delete them as it finds refcount==0.
- some old chunks would have a too high refcount (but are not orphans because older backups reference them) - borg gc would recompute the correct refcount, but not delete them as refcount>0.
- in general, too high refcounts are not a critical issue, they just might waste some space until the next borg check.
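The write ordering that makes this crash-safe can be sketched like so (illustrative pseudo-store and key names, not real borg code; the point is only that the manifest entry is written last):

```python
# Sketch of borg create's write order. The manifest entry is written LAST:
# until it exists, nothing below it in the reference tree is reachable, so
# a crash at any earlier point just leaves unreferenced objects for gc.

store = {}  # pseudo key/value store, key ~ fs path

def put(key, value):
    store[key] = value
    return key

def create_archive(name, file_chunks):
    # 1. file content chunks
    chunk_ids = [put(f"data/chunk{i}", c) for i, c in enumerate(file_chunks)]
    # 2. item in the metadata stream, referencing the chunks
    item_id = put("data/item0", {"chunks": chunk_ids})
    # 3. archive item, referencing the metadata stream
    archive_id = put(f"archives/{name}", {"items": [item_id]})
    # 4. manifest entry, written LAST: root of the tree of references
    put("config/manifest", {"archives": [archive_id]})
    return archive_id

create_archive("backup-1", [b"aaa", b"bbb"])
```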
borg delete / borg prune
- deleting an archive could just kill its manifest entry and leave everything else to the next borg check to clean up
- super easy and fast, but space would only be freed by next borg gc
borg gc (maybe as part of borg check?)
- needs to lock the repo against any concurrent access
- needs to establish correct sets of all existing objects in the repo and all used/referenced objects. then compute all unreferenced objects.
- needs to run client-side as archives (and other data structures containing references) are encrypted
- removes all unreferenced objects (deletes files, cleans up empty dirs), frees repo space
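The core of such a gc is a simple mark-and-sweep over sets of ids (illustrative sketch; the "referenced" set would have to be computed client-side by walking the decrypted manifest/archives, as noted above):

```python
# Mark-and-sweep gc sketch: everything that exists but is not reachable
# from the manifest is unreferenced and gets deleted (deletion and empty-dir
# cleanup omitted here; we just return the set of ids to remove).

def gc(existing_ids, referenced_ids):
    """Return the set of unreferenced object ids to delete."""
    unreferenced = existing_ids - referenced_ids
    # for each id: delete its file, clean up empty dirs, freeing repo space
    return unreferenced

print(sorted(gc({"a", "b", "c"}, {"a", "c"})))  # ['b']
```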
advantages
- simpler borg code
- no "repo index" needed to find an object
- no segment files, thus no need for compact_segments (borg compact)
- easier to implement concurrent operations in the repo (some stuff like borg check / borg gc must still lock the repo; reference counting is also hard to parallelise)
disadvantages
- needs a well-scaling fs, as the repo would contain orders of magnitude more objects than the backup source filesystems (due to the number of backups, due to chunking)
- likely higher space usage for the repo due to fs overhead (fs block / chunk size)
- copying or deleting a repo would take rather long (lots of fs objects)
- likely we would still need some clientside hashtables for good speed (to avoid network roundtrips, to avoid fs access)
- borg check would be slower (more fs operations, smaller files)
refcounts discussion moved to #7379.
Extra consideration could be given to something like lmdb or git-style packs.
I think duplicacy uses this filesystem-based approach. They managed to implement the GC part to be lock-free too, relying on atomic properties of the used file system.
Indeed. What's missing from duplicacy is the capacity to fuse-mount backups. There have been a couple of PRs, but apparently their performance is poor. It's the only reason I haven't been using it.
Updates:
In my experimental branch, I implemented a new Repository class:
- working with borgstore
- without repo index (each chunk is a separate store object)
- without transactions
- without log-like behaviour
- as there are no segment files, no compact_segments is needed
- borg compact is a garbage collector now, deleting unused objects from the store
- the single "manifest-with-list-of-archives" chunk of borg 1.x was split into:
  - a config/manifest store object with some metadata
  - an archives/* store namespace with 1 store object per archive
  - a