Question: Merge small segments during compaction
Have you checked borgbackup docs, FAQ, and open Github issues?
Yes
Is this a BUG / ISSUE report or a QUESTION?
Question
System information. For client/server mode post info for both machines.
Your borg version (borg -V).
borg 1.2.0
Question
I'm looking for a way to merge the many small segment files borg creates during normal operation into fewer bigger ones. The underlying storage I use is optimized for large files, but since archives are created frequently I end up with many small files that don't even come close to max_segment_size.
After looking at the code I was able to modify the compact routine to produce the desired result, but I was wondering if there is another (less hackish) way to do the same thing.
borg does not try to optimize for (very) large files. if it creates a relatively small segment file way below max_segment_size because there was not more data at the time of writing, this is not considered a problem as of now.
what borg does try to avoid is creating a huge number of very small files (like the tons of 17-byte files it created in 1.1.x due to a bug), but a small number of these is also considered acceptable and intended (e.g. having the manifest and the final commit in separate files - this is done because the manifest is regularly rewritten, so keeping it separate causes less I/O).
after initially creating a segment file, it is usually not modified any more. the only exception is if it gets "holes" and borg compact finds enough holes in it so it seems worth compacting it. Then all non-deleted entries get read and rewritten to a new segment file.
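To illustrate the "enough holes" rule described above, here is a minimal sketch. It is not borg's actual code: the `Segment` class, the function names, and the 10% default are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical stand-in for one segment file."""
    number: int
    size: int      # bytes on disk
    freeable: int  # bytes taken up by deleted entries ("holes")


def worth_compacting(seg, threshold=0.10):
    """Return True if the share of freeable bytes exceeds the threshold."""
    if seg.size == 0:
        return False
    return seg.freeable / seg.size > threshold


def select_for_compaction(segments, threshold=0.10):
    """Return the numbers of all segments that seem worth rewriting."""
    return [s.number for s in segments if worth_compacting(s, threshold)]


segments = [
    Segment(number=1, size=1000, freeable=0),    # no holes at all
    Segment(number=2, size=1000, freeable=500),  # half of it is holes
    Segment(number=3, size=1000, freeable=50),   # only a few holes
]
print(select_for_compaction(segments))
```

In this sketch the comparison is strict, so a segment without any holes is never selected, not even with a threshold of 0.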
so, guess the hack you made is to trigger that rewriting just based on the size of a segment file? for your situation that is desirable (if having big segment files matters more to you than the additional I/O this causes).
what kind of storage do you use and why exactly are smaller files a problem?
I see. The change I made to the compact routine works as you described. It considers all segments eligible for compaction, not only those listed in the hints file. The threshold code needed some minor tweaking too, as --threshold 0 did not work right away.
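Roughly, the idea of picking segments by size instead of by holes can be sketched like this. This is only an illustration, not the actual patch and not borg code: `plan_merges`, `small_fraction`, and the plain dict of sizes are hypothetical names chosen for the example.

```python
def plan_merges(segment_sizes, max_segment_size, small_fraction=0.5):
    """Group small segments so that each group fits into one new segment.

    segment_sizes maps segment number -> size in bytes.
    A segment counts as "small" if it is below small_fraction * max_segment_size.
    Returns a list of groups (lists of segment numbers) to rewrite together.
    """
    limit = small_fraction * max_segment_size
    # Walk the small segments in ascending order to keep the log order.
    small = sorted(n for n, size in segment_sizes.items() if size < limit)

    groups, current, current_size = [], [], 0
    for number in small:
        size = segment_sizes[number]
        # Start a new group once the current one would grow too large.
        if current and current_size + size > max_segment_size:
            groups.append(current)
            current, current_size = [], 0
        current.append(number)
        current_size += size
    if current:
        groups.append(current)

    # Rewriting a single segment on its own gains nothing, so drop those.
    return [g for g in groups if len(g) > 1]


sizes = {1: 10, 2: 20, 3: 900, 4: 30, 5: 600}
print(plan_merges(sizes, max_segment_size=1000))
```

The trade-off is the one mentioned earlier in the thread: every selected segment gets read and rewritten, so this buys bigger files at the cost of extra I/O.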
For storage I use an ext4 filesystem with parameters tuned for very large files, with snapraid on top. So it's not really a problem, but a nice-to-have feature. The filesystem has a larger bytes-per-inode ratio, among other settings which I cannot change.
Guess this is solved?