
chunk-based operations for re-compression and stats

Open • taam opened this issue 4 years ago • 0 comments

Have you checked borgbackup docs, FAQ, and open GitHub issues?

Yes, also see "Related Tickets" below.

Is this a BUG / ISSUE report or a QUESTION?

ISSUE

Your borg version (borg -V).

1.1.15 (but most repos were originally created with 1.0.x)

How much data is handled by borg?

Various repos, from small ones (10 GB, 50,000 files, 500 archives) to bigger ones (up to 1 TB, millions of files, 5,000 archives).

Describe the problem you're observing.

Using recreate just to re-compress a complete repo is (unnecessarily?) very slow even for small repos and practically infeasible for larger ones. Therefore I'd like to suggest chunk-based operations to ...

  • re-compress all chunks (similar to recreate, but independent of archives)
  • show chunk stats (which compression algorithm is used for what percentage of the data/chunks, average chunk size, ...; this should be useful for repos where the compression was changed or auto is used)
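To make the requested stats concrete, here is a minimal Python sketch of the aggregation. It assumes each chunk can be described by a hypothetical ChunkInfo record (compression algorithm, uncompressed size, stored size); ChunkInfo and chunk_stats are illustrative names only, not part of borg, and a real command would have to determine these values from the stored chunks themselves.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ChunkInfo:
    # Hypothetical per-chunk record for illustration; not a borg data structure.
    compression: str  # e.g. "lz4", "zlib", "lzma", "none"
    size: int         # uncompressed size in bytes
    csize: int        # stored (compressed) size in bytes


def chunk_stats(chunks):
    """Summarize compression usage over a list of ChunkInfo records."""
    n = len(chunks)
    total_size = sum(c.size for c in chunks)
    total_csize = sum(c.csize for c in chunks)
    by_count = Counter(c.compression for c in chunks)
    by_bytes = Counter()
    for c in chunks:
        by_bytes[c.compression] += c.size
    return {
        "chunks": n,
        "avg_chunk_size": total_size / n if n else 0.0,
        # share of chunks per algorithm, in percent
        "pct_chunks": {k: 100.0 * v / n for k, v in by_count.items()} if n else {},
        # share of uncompressed data per algorithm, in percent
        "pct_data": {k: 100.0 * v / total_size for k, v in by_bytes.items()} if total_size else {},
        # overall ratio of uncompressed to stored bytes
        "compression_ratio": total_size / total_csize if total_csize else 0.0,
    }
```

For example, a repo that switched from lz4 to zlib at some point would show both algorithms in pct_chunks and pct_data, which is exactly the information that is hard to get today.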

Details

recreate is not well suited for this particular task because:

  • It works on archives, which makes it very slow. For a small test repo (10GB) it took about 3 minutes for the first archive and then about 1 minute per subsequent archive, so it would probably run for 8-9 hours, whereas it should be finished in minutes. (So bigger repos would run for weeks? Can't really do that...)
  • For archives created with borg 1.0, re-chunking is forced, because back then the chunking parameters were not saved (according to https://github.com/borgbackup/borg/issues/3631#issuecomment-455889417). And yes, it's possible to hack the source to avoid that.
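The 8-9 hour figure follows from a simple linear model: a fixed cost for the first archive plus a roughly constant cost per subsequent archive. A small Python sketch, assuming the 3 min / 1 min timings above and that the test repo is the small one with about 500 archives (the linear model itself is an assumption, not a measurement):

```python
def recreate_minutes(num_archives, first_min=3.0, per_archive_min=1.0):
    """Rough runtime estimate for archive-by-archive recompression.

    Assumes a fixed cost for the first archive (most chunks are new and get
    rewritten there) and a constant cost for every subsequent archive.
    """
    if num_archives <= 0:
        return 0.0
    return first_min + (num_archives - 1) * per_archive_min


hours = recreate_minutes(500) / 60  # 502 minutes, about 8.4 hours
```

At the same per-archive cost, 5000 archives would already take about 5002 minutes (roughly 3.5 days); the bigger repos hold far more files per archive, so their per-archive cost is presumably higher still.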

As far as I understand, it should be possible to change the compression without changing any archive information, maybe even without touching anything archive-related at all, so it could be as fast as a single pass through the segments plus a rewrite? If so, the suggestion above should solve both of these issues. I'd also assume it is less troublesome when interrupted, since no caches need to be rebuilt when the archives aren't touched. This might also make it (more) feasible for remote repos.
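Why the archives could stay untouched: archives only reference chunks by ID, and (assuming, as in borg 1.x, that the ID is derived from the uncompressed plaintext) recompressing a chunk does not change its ID. The following Python sketch illustrates that idea with a plain dict standing in for the repository; store, recompress_all and the "none"/"zlib"/"lzma" tags are hypothetical, and plain SHA-256 replaces borg's keyed ID hash. A real borg repo is an append-only segment log, so an actual implementation would write new entries and compact away the old ones rather than overwrite in place.

```python
import hashlib
import lzma
import zlib

# Hypothetical stand-ins for borg's compressors: tag -> (compress, decompress).
COMPRESSORS = {
    "none": (lambda b: b, lambda b: b),
    "zlib": (zlib.compress, zlib.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def chunk_id(plaintext: bytes) -> bytes:
    # Assumption: the ID depends only on the uncompressed data.
    return hashlib.sha256(plaintext).digest()


def store(repo: dict, plaintext: bytes, algo: str) -> bytes:
    """Store a chunk and return its ID (what an archive would reference)."""
    cid = chunk_id(plaintext)
    repo[cid] = (algo, COMPRESSORS[algo][0](plaintext))
    return cid


def recompress_all(repo: dict, new_algo: str) -> int:
    """Rewrite every chunk with new_algo in one pass; return #chunks rewritten.

    No archive metadata is read or written: IDs stay the same.
    """
    compress = COMPRESSORS[new_algo][0]
    rewritten = 0
    for cid, (algo, blob) in list(repo.items()):
        if algo == new_algo:
            continue  # already converted, so an interrupted run can resume
        plaintext = COMPRESSORS[algo][1](blob)
        if chunk_id(plaintext) != cid:
            raise ValueError("chunk content does not match its ID")
        repo[cid] = (new_algo, compress(plaintext))
        rewritten += 1
    return rewritten
```

Skipping chunks that already use the target compression is what would make an interrupted run cheap to restart: a second pass only touches what is left.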

Related Tickets

There are some similar, partially overlapping tickets, but I think each has a slightly different focus, so I hope a separate ticket is OK.

  • #3614: mentions the stats aspect, which here is just a bonus (Thomas unfortunately was not a fan)
  • #3622: wants to re-compress only some archives
  • #3631: mentions "re-compress all chunks", but focuses more on other aspects of recreate

taam • Apr 07 '21 09:04