proposal draft: indexed segments
this is a basic draft proposal for a different way to compose segments and to handle "commits". it's mainly collecting some loose thoughts i kept talking about, and it's time i deliver some basic write-up to build upon.
instead of a segment being a linear log of actions, of which only one can be in progress at a time, a segment would represent additions and removals on top of a prior state.
this means a segment starts as a temporary file that names the other segments it depends on and records the objects added/removed while it is being filled with data.
the "commit" action would happen by taking this temporary file and moving it to a "final destination".
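a minimal sketch of that lifecycle, just to make the idea concrete. everything here is an assumption for illustration: the JSON layout, the field names (`depends_on`, `added`, `removed`) and the helper names `write_segment` / `commit_segment` are hypothetical and not part of borg. the only real building block is `os.replace`, which renames atomically when source and destination are on the same filesystem.

```python
import json
import os
import tempfile


def write_segment(repo_dir, depends_on, added, removed):
    """Fill a temporary segment file inside repo_dir and return its path."""
    # hypothetical on-disk layout: one JSON document per segment
    payload = {
        "depends_on": sorted(depends_on),  # ids of segments this one builds on
        "added": sorted(added),            # object ids introduced by this segment
        "removed": sorted(removed),        # object ids dropped by this segment
    }
    # create the temp file in the repo dir so the later rename stays on one filesystem
    fd, tmp_path = tempfile.mkstemp(dir=repo_dir, prefix="segment-", suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(payload, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the data is on disk before committing
    return tmp_path


def commit_segment(tmp_path, final_path):
    """'Commit' by atomically moving the temporary file to its final name."""
    os.replace(tmp_path, final_path)
    return final_path
```

with this shape, several writers can each fill their own temporary file at the same time, and a segment only becomes visible once it has been moved into place.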
at a fundamental level this would allow creating more than one segment at once, at the cost of losing immediate deduplication between segments that are being written in parallel.
this layering of the creation and dependencies of a segment creates an additional benefit: a segment can be created locally, then uploaded to something like S3 in a much more streamlined fashion.
recreating segments in a manner that rearranges dependencies in order to deduplicate has to be ironed out as well.
a major downside of this approach is the new layer of tracked dependencies. it basically makes graph processing a requirement, at least for the post-processing step that sorts out deduplication: deletions, and conflicts between them, have to be forwarded in some sensible manner even though a new archive created in parallel may depend on the affected data. (a practical limitation until that is sorted out could be to only allow deleting operations under a full lock, while backups that only add data may run in parallel.)
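to illustrate the kind of graph processing meant here, a rough sketch that replays segments in dependency order to work out which objects are live. the segment dict layout and the name `resolve_state` are hypothetical; `graphlib.TopologicalSorter` is from the python standard library (3.9+). note that this deliberately does not solve the hard part: if two segments created in parallel add and remove the same object, the result depends on the order in which the independent segments happen to be visited.

```python
from graphlib import TopologicalSorter


def resolve_state(segments):
    """Replay segments in dependency order and return the set of live objects.

    segments maps a segment id to a dict with "depends_on", "added", "removed".
    Every id listed in "depends_on" must itself be a key of segments,
    otherwise the lookup below raises KeyError.
    """
    # TopologicalSorter expects {node: set_of_predecessors}
    graph = {seg_id: set(seg["depends_on"]) for seg_id, seg in segments.items()}
    live = set()
    # static_order() yields each segment only after all of its dependencies
    for seg_id in TopologicalSorter(graph).static_order():
        seg = segments[seg_id]
        live |= set(seg["added"])
        live -= set(seg["removed"])
    return live


if __name__ == "__main__":
    segments = {
        "base": {"depends_on": [], "added": ["a", "b"], "removed": []},
        # two segments written in parallel on top of "base"
        "backup": {"depends_on": ["base"], "added": ["c"], "removed": []},
        "prune": {"depends_on": ["base"], "added": [], "removed": ["a"]},
    }
    print(sorted(resolve_state(segments)))
```

a cycle in the dependencies makes `static_order()` raise `graphlib.CycleError`, which seems like a reasonable thing to treat as a corrupt repository in such a design.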
looks like this needs a different repository format, so tagging it as breaking.
The above is a bit outdated, as it refers to the borg 1.x transactional / log-like implementation, which is already gone in borg2.
I created #8572 to get a more up-to-date issue which supersedes this one.