Question: Merge small segments during compaction

Open pcguy85 opened this issue 3 years ago • 2 comments

Have you checked borgbackup docs, FAQ, and open Github issues?

Yes

Is this a BUG / ISSUE report or a QUESTION?

Question

System information. For client/server mode post info for both machines.

Your borg version (borg -V).

borg 1.2.0

Question

I'm looking for a way to merge the many small segment files borg creates during normal operation into fewer bigger ones. The underlying storage I use is optimized for large files, but since archives are created frequently I end up with many small files that don't even come close to max_segment_size.
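To put a number on "many small files", a minimal sketch like the following could count segment files below a chosen size. It assumes the repository keeps its segments as numerically named files under a `data/` directory; the function name `small_segment_stats` and the `size_limit` parameter are made up for this illustration and are not part of borg.

```python
import os


def small_segment_stats(repo_path, size_limit):
    """Return (total_segments, small_segments) for a repository directory.

    Assumption: segment files are the numerically named files found
    anywhere below <repo_path>/data. size_limit is in bytes.
    """
    data_dir = os.path.join(repo_path, "data")
    total = 0
    small = 0
    for dirpath, _dirnames, filenames in os.walk(data_dir):
        for name in filenames:
            if not name.isdigit():
                continue  # skip anything that does not look like a segment
            total += 1
            if os.path.getsize(os.path.join(dirpath, name)) < size_limit:
                small += 1
    return total, small
```

Calling it with `size_limit` set to the repository's configured max_segment_size shows how many segments fall short of it.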

After looking at the code I was able to modify the compact routine to produce the desired result, but I was wondering if there is another (less hackish) way to do the same thing.

pcguy85 avatar Aug 09 '22 15:08 pcguy85

borg does not try to optimize for (very) large files. if it creates a relatively small segment file way below max_segment_size because there was not more data at the time of writing, this is not considered a problem as of now.

what borg does try to avoid is creating extremely many very small files (like the tons of 17-byte files it created in 1.1.x due to a bug), but a small number of these is also considered acceptable and intended (like e.g. having the manifest and final commit in separate files - this is done because the manifest is regularly re-written, so having it separately causes less I/O).

after initially creating a segment file, it is usually not modified any more. the only exception is if it gets "holes" and borg compact finds enough holes in it so it seems worth compacting it. Then all non-deleted entries get read and rewritten to a new segment file.
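The behaviour described above can be sketched roughly as follows. This is a simplified model, not borg's actual code: the entry tuples, the function name `compact_if_worthwhile` and the default of 10 percent are assumptions made for the illustration.

```python
def compact_if_worthwhile(entries, threshold_percent=10):
    """Model of hole-based compaction for a single segment.

    entries: list of (key, size_in_bytes, deleted) tuples (hypothetical layout).
    Returns the entries the segment would hold afterwards.
    """
    total = sum(size for _key, size, _deleted in entries)
    holes = sum(size for _key, size, deleted in entries if deleted)
    if total == 0 or 100 * holes / total <= threshold_percent:
        # Not enough holes: the segment is left untouched.
        return entries
    # Enough holes: only the non-deleted entries are rewritten
    # into a new segment.
    return [entry for entry in entries if not entry[2]]
```

A segment with 40% of its bytes deleted would be rewritten without the deleted entries, while one with 5% deleted would stay as it is.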

so, guess the hack you made is to trigger that rewriting just based on the size of a segment file? for your situation that may be desirable (if having big segment files is more important to you than avoiding the additional I/O this causes).

what kind of storage do you use and why exactly are smaller files a problem?

ThomasWaldmann avatar Aug 09 '22 16:08 ThomasWaldmann

I see. The change I made to the compact routine works as you described. It considers all segments eligible for compaction, not only those in the hints file. The threshold code needed some minor tweaking too, as --threshold 0 did not work right away.
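The actual patch is not shown in this thread, so the following is only a guess at the shape of such a change. `select_segments`, its parameters and the dictionary layout are hypothetical, and the inclusive comparison is just one way a threshold of 0 could be made to select segments that have no holes at all.

```python
def select_segments(all_sizes, freeable, threshold_percent, only_hinted=True):
    """Pick the segments to rewrite.

    all_sizes: {segment_id: size_in_bytes} for every segment.
    freeable:  {segment_id: freeable_bytes}, standing in for the hints.
    only_hinted=True models considering only hinted segments;
    only_hinted=False models the change of considering all segments.
    """
    selected = []
    for seg_id in sorted(all_sizes):
        if only_hinted and seg_id not in freeable:
            continue
        size = all_sizes[seg_id]
        free = freeable.get(seg_id, 0)
        # Inclusive comparison (an assumption): with a threshold of 0,
        # even a segment without any holes qualifies.
        if size > 0 and 100 * free / size >= threshold_percent:
            selected.append(seg_id)
    return selected
```

With `only_hinted=False` and a threshold of 0, every non-empty segment is selected, so all small segments get rewritten together at the cost of the extra I/O mentioned earlier.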

For storage I use an ext4 filesystem with parameters tuned for very large files, with snapraid on top. So it's not really a problem but a nice-to-have feature. The filesystem has a larger bytes-per-inode ratio, among other settings which I cannot change.

pcguy85 avatar Aug 09 '22 17:08 pcguy85

Guess this is solved?

ThomasWaldmann avatar Apr 06 '23 23:04 ThomasWaldmann