repository - going away from transactions, log, refcounts?
current state (current master branch, borg 1.x, borg 0.x, attic)
A borg repository is primarily a key/value store (with some aux functions).
The key is the chunk id (== MAC(plaintext)), the value is the compressed/encrypted/authenticated data.
borg uses transactions and a LOG when writing to the repo:
- start of transaction (usually triggered by PUT/DEL)
- writes more objects by appending PUT entries to the log
- deletes objects by appending DEL entries to the log
- commits (appends a COMMIT entry to the log)
- end of transaction (server side: saves repo index and hints, client side: saves chunks index and files cache)
LOG means that new stuff is always appended at the end of the last/current segment file. In general, old segment files are never modified in place.
borg compact defrags non-compact segment files:
- a segment file contains PUTs, DELs, COMMITs
- if a PUT(id) is later deleted by a DEL(id), it creates a logical hole in a segment file (that object is not used any more), making it non-compact
- compaction / defragging works by reading all still-needed objects from an old segment file and appending them to a new segment file. After that is finished, the old segment file is deleted (which frees disk space, because the new segment file is smaller).
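The defragging step above can be sketched with a toy in-memory model (hypothetical entry tuples, not borg's actual on-disk segment format; re-PUT after DEL is ignored for simplicity):

```python
# Toy sketch of segment compaction: a segment is a list of
# ("PUT", id, data) / ("DEL", id, None) / ("COMMIT", None, None) entries.
# Compaction keeps only PUTs whose id was never DELeted (the "still-needed"
# objects) and appends them, plus a fresh COMMIT, to the new segment.

def compact_segment(entries):
    """Return a compact entry list built from an old segment's entries."""
    deleted = {id_ for op, id_, _ in entries if op == "DEL"}
    new_entries = [(op, id_, data) for op, id_, data in entries
                   if op == "PUT" and id_ not in deleted]
    new_entries.append(("COMMIT", None, None))
    return new_entries

old = [("PUT", "a", b"1"), ("PUT", "b", b"2"),
       ("DEL", "a", None), ("COMMIT", None, None)]
print(compact_segment(old))  # only "b" survives, plus a COMMIT
```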
advantages of this approach
- transactions and an append-only log are a very safe approach (even after a crash, borg can usually roll back to the previous committed state and be fine again)
- segment files are medium size files: not too large, not too small, not too many
- works well even with not very scalable filesystems
- has little overhead due to fs block / cluster size
- can be copied or deleted rather quickly (not many fs objects)
disadvantages of this approach
- borg compact can cause lots of I/O when shuffling objects from old non-compact segments to new compact segments
- borg compact needs some space on the fs to be able to work. bad if your fs is 100% full...
- compaction code is rather complex, same for transaction management
- to quickly access objects, the repository needs an index mapping id -> (segment, offset, flags). borg currently loads the repo index (a hashtable) into memory. RAM usage is about 44 bytes * object_count + free space in the hashtable. If you have a lot of files and/or a lot of data volume, the repo index can need GBs of RAM.
- to implement this, some special borg code is needed with access to the repo filesystem
- hard to work like this without locking the repository against concurrent access.
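A quick back-of-envelope for the repo index RAM usage mentioned above (the 44 bytes/entry figure is from the text; the 25% hashtable free-space headroom is an illustrative assumption):

```python
# Rough repo index RAM estimate: ~44 bytes per object, plus hashtable
# free space (the 0.25 headroom factor here is an illustrative assumption).

def repo_index_ram(object_count, bytes_per_entry=44, free_space_factor=0.25):
    return object_count * bytes_per_entry * (1 + free_space_factor)

# e.g. a repo with 100 million chunks:
gib = repo_index_ram(100_000_000) / 2**30
print(f"{gib:.1f} GiB")  # ~5.1 GiB
```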
alternative implementation
We'll need a key/value store, of course. The aux functions will need checking (e.g. the nonce reservation stuff won't be needed for borg2 repos).
But maybe we could move away from transactions and LOG-like behaviour and instead use garbage collection (gc) to clean up anything that is not referenced (either because it is not used any more, as all references got deleted, or because it was never used, as something crashed before the action was completed).
We could delegate a lot of stuff borg used to do with its own code to the filesystem:
- the key would directly map to a fs path (e.g. data/01/0123/0123456789abcdef...)
- object exists/read/write/delete would map to fs operations on the respective file
- besides "data", there could be other namespaces (e.g. archives, manifest, keys, config, ...)
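The id-to-path mapping could look like this sketch (the 2+4 hex digit sharding just mirrors the data/01/0123/... example above; the real scheme may differ):

```python
def chunk_path(chunk_id_hex, namespace="data"):
    """Map a chunk id (hex MAC of the plaintext) to a sharded fs path,
    mirroring the data/01/0123/0123456789abcdef... example.
    Sharding into subdirectories keeps per-directory entry counts sane."""
    return f"{namespace}/{chunk_id_hex[:2]}/{chunk_id_hex[:4]}/{chunk_id_hex}"

print(chunk_path("0123456789abcdef"))  # data/01/0123/0123456789abcdef
```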
For speed reasons, the borg client could still have a chunks index, e.g. id -> (refcount, size) (maybe something simpler without a precise refcount could also be sufficient for a lot of operations).
Why/how we could live without transactions
borg create
- borg create would write file content chunks, then reference these chunks from an item (which is written to a metadata stream); the metadata stream is referenced by writing an archive item, and the archive item is referenced by writing a manifest entry. The manifest entry is the root of the tree of references - until it is written, none of the other references can be discovered / will be considered by borg check.
- if anything crashes before writing the manifest entry, we'll just have some orphaned objects and some chunks with too-high reference counts:
- some new chunks (which did not exist in the repo before). borg gc would recompute the correct refcount and delete them as it finds refcount==0.
- some old chunks would have a too high refcount (but are not orphans because older backups reference them) - borg gc would recompute the correct refcount, but not delete them as refcount>0.
- in general, too high refcounts are not a critical issue, they just might waste some space until the next borg check.
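The write ordering that makes this crash-safe can be sketched like so (illustrative pseudo-store and key names, not real borg code; the point is only that the manifest entry is written last):

```python
# Sketch of borg create's write order. The manifest entry is written LAST:
# until it exists, nothing below it in the reference tree is reachable, so
# a crash at any earlier point just leaves unreferenced objects for gc.

store = {}  # pseudo key/value store, key ~ fs path

def put(key, value):
    store[key] = value
    return key

def create_archive(name, file_chunks):
    # 1. file content chunks
    chunk_ids = [put(f"data/chunk{i}", c) for i, c in enumerate(file_chunks)]
    # 2. item in the metadata stream, referencing the chunks
    item_id = put("data/item0", {"chunks": chunk_ids})
    # 3. archive item, referencing the metadata stream
    archive_id = put(f"archives/{name}", {"items": [item_id]})
    # 4. manifest entry, written LAST: root of the tree of references
    put("config/manifest", {"archives": [archive_id]})
    return archive_id

create_archive("backup-1", [b"aaa", b"bbb"])
```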
borg delete / borg prune
- deleting an archive could just kill its manifest entry and leave everything else to the next borg check to clean up
- super easy and fast, but space would only be freed by next borg gc
borg gc (maybe as part of borg check?)
- needs to lock the repo against any concurrent access
- needs to establish correct sets of all existing objects in the repo and all used/referenced objects. then compute all unreferenced objects.
- needs to run client-side as archives (and other data structures containing references) are encrypted
- removes all unreferenced objects (deletes files, cleans up empty dirs), frees repo space
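The core of such a gc is a simple mark-and-sweep over sets of ids (illustrative sketch; the "referenced" set would have to be computed client-side by walking the decrypted manifest/archives, as noted above):

```python
# Mark-and-sweep gc sketch: everything that exists but is not reachable
# from the manifest is unreferenced and gets deleted (deletion and empty-dir
# cleanup omitted here; we just return the set of ids to remove).

def gc(existing_ids, referenced_ids):
    """Return the set of unreferenced object ids to delete."""
    unreferenced = existing_ids - referenced_ids
    # for each id: delete its file, clean up empty dirs, freeing repo space
    return unreferenced

print(sorted(gc({"a", "b", "c"}, {"a", "c"})))  # ['b']
```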
advantages
- simpler borg code
- no "repo index" needed to find an object
- no segment files, thus no need for compact_segments (borg compact)
- easier to implement concurrent operations in the repo (some stuff like borg check / borg gc must still lock the repo; reference counting is also hard to parallelise)
disadvantages
- needs a well-scaling fs, as the repo would contain orders of magnitude more objects than the backup source filesystems (due to the number of backups, due to chunking)
- likely higher space usage for the repo due to fs overhead (fs block / chunk size)
- copying or deleting a repo would take rather long (lots of fs objects)
- likely we would still need some clientside hashtables for good speed (to avoid network roundtrips, to avoid fs access)
- borg check would be slower (more fs operations, smaller files)
refcounts discussion moved to #7379.
Extra consideration could be given to something like lmdb or git-style packs.
I think duplicacy uses this filesystem-based approach. They managed to implement the GC part to be lock-free too, relying on atomic properties of the used file system.
Indeed. What's missing from duplicacy is the capacity to fuse-mount backups. There have been a couple of PRs, but apparently their performance is poor. It's the only reason I haven't been using it.
Updates:
In my experimental branch, I implemented a new Repository class:
- working with borgstore
- without repo index (each chunk is a separate store object)
- without transactions
- without log-like behaviour
- as there are no segment files, no compact_segments is needed
- borg compact is a garbage collector now, deleting unused objects from the store
- the single "manifest-with-list-of-archives" chunk of borg 1.x was split into:
  - a config/manifest store object with some metadata
  - an archives/* store namespace with 1 store object per archive
  - a