
to pack or not to pack ...

Open ThomasWaldmann opened this issue 1 year ago • 14 comments

borg 1.x segment files

borg 1.x used:

  • "segment files" (elsewhere also known as "pack files") to store multiple repository objects in one file.
  • a "repository index" to be able to find these objects, using a mapping object id --> (segment_name, offset_in_segment).
  • transactions and rollback via log-like appending of operations (PUT, DEL, COMMIT) to these segment files (roughly sketched below)
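
Roughly, this scheme can be pictured like this (a heavily simplified Python sketch, not borg's actual on-disk format):

    PUT, DEL, COMMIT = b"P", b"D", b"C"   # made-up tags, not borg's real encoding

    class SegmentStore:
        """Toy model: log-structured segment files plus an index for lookups."""

        def __init__(self):
            self.segments = {0: bytearray()}   # segment number -> segment contents
            self.current = 0
            self.index = {}                    # object id -> (segment, offset)

        def put(self, obj_id: bytes, data: bytes):
            seg = self.segments[self.current]
            self.index[obj_id] = (self.current, len(seg))
            seg += PUT + obj_id + len(data).to_bytes(4, "big") + data   # assume fixed-length ids

        def delete(self, obj_id: bytes):
            # the object's data stays in its segment until compaction reclaims the space
            self.segments[self.current] += DEL + obj_id
            self.index.pop(obj_id, None)

        def commit(self):
            # COMMIT marks a consistent transaction boundary in the log
            self.segments[self.current] += COMMIT
            self.current += 1
            self.segments[self.current] = bytearray()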

borg2 status quo: objects stored separately

borg2 is much simpler:

  • implemented using borgstore (k/v store with misc. backends)
  • objects are stored separately: 1 file chunk --> 1 repo object
  • objects can be directly found by their id (e.g. the id is mapped to the fs path / file name, as sketched below)
  • no transactions, no log-like appending - but correct write order
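
For illustration, the "one object per file" idea boils down to something like this (hypothetical path scheme and helper names, not necessarily how borgstore lays things out):

    import os

    def object_path(repo_root: str, obj_id_hex: str) -> str:
        # the file path is derived directly from the object id, so no index is needed
        return os.path.join(repo_root, "data", obj_id_hex[:2], obj_id_hex)

    def store_object(repo_root: str, obj_id_hex: str, data: bytes) -> None:
        path = object_path(repo_root, obj_id_hex)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path + ".tmp", "wb") as f:   # write to a temporary name first ...
            f.write(data)
        os.replace(path + ".tmp", path)        # ... then rename ("correct write order")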

Pros:

  • simplicity
  • no need for some sort of "index" (which could be corrupted or out of date)
  • no segment file compaction needed, the server-side filesystem manages space allocation

Cons:

  • leads to a large number of relatively small objects being transferred and stored individually in the repository
  • per-object latency and other overheads have a considerable speed impact for remote repositories
  • depending on the storage type / filesystem, there will be more or less storage space usage overhead due to block size, esp. for many very small objects
  • dealing with lots of objects / doing lots of api calls can be expensive for some cloud storage providers

borg2 alternative idea

  • client assembles packs locally, transfers to store when the pack has reached the desired size or when there is no more data to write.
  • pack files have a per-pack index appended (pointing to the objects contained in the pack), so the per-pack index can be read without reading the full pack.
  • the per-pack index would also contain the RepoObj metadata (e.g. compression type/level, etc.); a rough layout sketch follows below
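
A rough sketch of such a pack layout (purely illustrative: JSON index and made-up magic/trailer; a real format would use a compact binary index plus checksums):

    import io, json, struct

    MAGIC = b"PACKDEMO"   # hypothetical 8-byte magic, just to mark the trailer

    def write_pack(objects):
        """objects: iterable of (obj_id_hex, metadata_dict, data_bytes)."""
        buf = io.BytesIO()
        index = {}
        for obj_id, meta, data in objects:
            index[obj_id] = {"offset": buf.tell(), "size": len(data), "meta": meta}
            buf.write(data)
        index_offset = buf.tell()
        index_bytes = json.dumps(index).encode()
        buf.write(index_bytes)
        # fixed-size trailer: magic + offset/length of the per-pack index, so a
        # reader can locate the index with one small read from the end of the pack
        buf.write(MAGIC + struct.pack(">QQ", index_offset, len(index_bytes)))
        return buf.getvalue()

    def read_pack_index(pack: bytes) -> dict:
        magic, index_offset, index_len = struct.unpack(">8sQQ", pack[-24:])
        assert magic == MAGIC
        return json.loads(pack[index_offset:index_offset + index_len])

    def read_object(pack: bytes, index: dict, obj_id: str) -> bytes:
        entry = index[obj_id]
        return pack[entry["offset"]:entry["offset"] + entry["size"]]

With a remote store, a reader would fetch only the fixed-size trailer and then the index via two small (range) reads, instead of slicing a local bytes object as above.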

Pros:

  • far fewer objects in the store, fewer API calls, less latency impact

Cons:

  • more complex in general
  • will need an additional global index mapping object_id --> (pack_id, offset_in_pack)
  • will need more memory for that global index
  • space is managed client-side, causing more (network) I/O: compact will need to read the pack, drop unused entries, write it back to the store and update the indexes (see the sketch below)
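
To make the compaction cost concrete, here is a rough sketch building on the hypothetical pack helpers above; store (a dict-like pack_id -> pack bytes mapping), used_ids and global_index are assumed interfaces, not borg APIs:

    def compact_pack(store, pack_id, used_ids, global_index):
        pack = store[pack_id]                      # read the whole pack back from the store
        index = read_pack_index(pack)
        kept = [(obj_id, entry["meta"], read_object(pack, index, obj_id))
                for obj_id, entry in index.items()
                if obj_id in used_ids]             # drop entries that are no longer referenced
        if kept:
            store[pack_id] = write_pack(kept)      # write the slimmed-down pack back
            new_index = read_pack_index(store[pack_id])
        else:
            del store[pack_id]                     # nothing left: remove the pack entirely
            new_index = {}
        # update the global index: kept objects get new offsets, dropped ones disappear
        for obj_id in index:
            if obj_id in new_index:
                global_index[obj_id] = (pack_id, new_index[obj_id]["offset"])
            else:
                global_index.pop(obj_id, None)

Every compaction round thus costs a full read and re-write of the affected pack over the network, plus the index updates.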

Side note: the desired pack "size" could be given by the amount of objects in the pack (N) or by the overall size of all objects in the pack (S). For the special case of N == 1, it would be a slightly different implementation (using a different file format) of what we currently have in borg2; it would not necessarily need that global index, and compact would also still be very easy.

Related: #191

ThomasWaldmann commented on Nov 27 '24

I would really appreciate the use of packs. Currently borg 2 is "incompatible" with most USB hard disks that use SMR recording. I used a Toshiba 4TB external USB hard drive for borg2 testing, and a borg check was only about 50% done after 12 hours, when I killed it (I needed the USB port). The repository was only about 1.3 TB.

dietmargoldbeck commented on Dec 03 '24

I consider packs essential

An alternative would be key value stores that optimize content addressing

RonnyPfannschmidt commented on Dec 03 '24

@dietmargoldbeck what you've seen is 33 MB/s.

That's not too bad for an initial backup to a USB (SMR) HDD.

Initial backups always feel very slow just due to the amount of data and processing.

ThomasWaldmann commented on Dec 03 '24

Is this expected to break compatibility with the current (pre-pack) 2.0 betas?

tve commented on Aug 14 '25

@tve that likely will be the case.

in general, there is no compatibility guarantee between betas, and no upgrade code for beta-to-beta upgrades is provided.

ThomasWaldmann commented on Aug 14 '25

I run borg (v1 implied) on a raidz1 array. I've been doing experiments with borg-2, and noticed that archives I create with borg-2 have been taking more space than the equivalent borg archives. I finally figured out why.

First off, according to my notes, borg-2 does a fine job deduplicating; nothing I describe below can really be attributed to borg-2 deduplication. See the Apparent Disk Usage column for an example repo I backed up with borg vs borg-2 (usage is in bytes):

Borg Version    Real Disk Usage    Apparent Disk Usage    Dirs     Files
v2              30390315008        24509081801            65797    681435
v1              23415717376        23456004666            4        287

raidz Overhead

The actual disk usage (du -sB1) above differs dramatically from apparent disk usage (du -sb). The TLDR is, for a raidz system, if many files on your filesystem have both the following properties:

  1. Small relative to recordsize (usually 128 KiB; 1 MiB is also common in some setups).
  2. Not close to a multiple of 8 KiB (2 sectors of a disk with 4 KiB sectors).

You will have significant space overhead when actually storing these files on your array, due to how raidz works. In the above case, this is an overhead of 25% [2]!

(30390315008 - (1000 * 1000 * 1000)) / (24509081801 - (1000 * 1000 * 1000)) = 1.2502

I subtracted 1G because my scripts run borg repo-space --reserve 1G beforehand. This is with the default chunker (or close enough, I may have changed the window size to 16383, but nothing else). With --chunker-params buzhash,10,23,16,4095, I saw even larger overheads:

Real: 28227564544
Apparent: 20624082512
Overhead: 1.387

This overhead gets worse for raidz2 and raidz3, but the TLDR above re: 8kB seems to still hold.
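
For reference, the overhead figures quoted here follow this calculation (1 GB reserve subtracted, as explained above):

    RESERVE = 10**9   # `borg repo-space --reserve 1G` runs beforehand

    def overhead(real_bytes: int, apparent_bytes: int, reserve: int = RESERVE) -> float:
        """Ratio of real (du -sB1) to apparent (du -sb) usage, reserve excluded."""
        return (real_bytes - reserve) / (apparent_bytes - reserve)

    print(round(overhead(30390315008, 24509081801), 4))  # 1.2502 (default chunker)
    print(round(overhead(28227564544, 20624082512), 3))  # 1.387  (buzhash,10,23,16,4095)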

Why Pack Files Are Useful For raidz

This overhead issue does not exist in borg because borg creates files around 500 MB in size. In my 2.18 TB repo with ~4000 of these files and ZFS -o recordsize=1M -o compression=off, I'm losing ~4 GB. This is all due to not enabling compression (see Mitigations); the raidz overhead basically doesn't matter at this point. And even this overhead isn't worth starting over for:

100 * (4 / (2.18 * 1000)) = 0.18

Not even 1/5th of a percent [3]! If borg-2 supported pack files, the small file overhead could possibly be amortized.

Data

Here are some numbers on another borg-2 repo, separate from the above one [4]:

Size (Full)

Real: 24297848832
Apparent: 20723652906
Overhead: 1.181

The overhead once again subtracts 1 GB from both apparent and real.

File Size Distribution (Full)

[32, 64): 0
[64, 128): 0
[128, 256): 9685
[256, 512): 54204
[512, 1024): 39688
[1024, 2048): 30403
[2048, 4096): 36669
[4096, 8192): 34776
[8192, 16384): 20283
[16384, 32768): 12367
[32768, 65536): 8879
[65536, 131072): 6908
[131072, 262144): 4079
[262144, 524288): 2388
[524288, 1048576): 2765
[1048576, 2097152): 2433
[2097152, 4194304): 1763
[4194304, 8388608): 786
[8388608, 16777216): 98
Total: 268174

A lot of files have sizes < 8192 bytes, no doubt due to borg-2 compressing before encrypting. This shows up as the 18% overhead. Perhaps there's a formula that can approximate the overhead without having to run du multiple times, but I haven't come up with it yet. Of course, raidz2 and raidz3 overheads would need different calculations [1].

Size (Single)

Real: 24297848832
Apparent: 20723652906
Overhead: 1.195

Note the overhead for any single backup tends to be worse than the overhead for many backups of the same tree (though I have seen adding a single backup make the full overhead worse).

File Size Distribution (Single)

[32, 64): 0
[64, 128): 0
[128, 256): 9232
[256, 512): 53065
[512, 1024): 37997
[1024, 2048): 27980
[2048, 4096): 31550
[4096, 8192): 21587
[8192, 16384): 12887
[16384, 32768): 8827
[32768, 65536): 4553
[65536, 131072): 3249
[131072, 262144): 1808
[262144, 524288): 1404
[524288, 1048576): 2267
[1048576, 2097152): 2141
[2097152, 4194304): 1696
[4194304, 8388608): 778
[8388608, 16777216): 98
Total: 221119

A single backup tends to produce a borg-2 file-size distribution similar to that of multiple backups. Therefore, in the future, I may streamline data collection by assuming a single backup generates a histogram whose bins have nearly the same heights, relative to one another, as the full repository's.
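
For reference, a histogram like the ones above can be collected with a short script along these lines (a sketch, not necessarily the exact script used for the numbers here):

    import math, os, sys
    from collections import Counter

    def size_histogram(root: str) -> Counter:
        """Walk a repository directory and bucket file sizes into power-of-two bins."""
        bins = Counter()
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                size = os.path.getsize(os.path.join(dirpath, name))
                if size > 0:
                    lo = 2 ** int(math.log2(size))
                    bins[(lo, 2 * lo)] += 1
        return bins

    if __name__ == "__main__":
        hist = size_histogram(sys.argv[1])
        for (lo, hi), count in sorted(hist.items()):
            print(f"[{lo}, {hi}): {count}")
        print("Total:", sum(hist.values()))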

Mitigations

  • One thing that can help: enabling compression with zfs set compression=on. Yes, even with borg supplying its own compression, this does help a bit, because ZFS records must be a power of 2 between the sector size and recordsize. In ZFS, files with sizes between powers of two are zero-padded to the next power of 2 or to the next integer multiple of recordsize, and that zero padding compresses well (see the small demo after this list). I inexplicably didn't keep my numbers for this, but IIRC I saw something like 4-5% less space usage for recordsize=128k (~32GB became 30.8GB, as above) and recordsize=1M (~32GB became 30.4GB, as above).
  • Additionally, my own limited experience so far is that backing up frequently and consistently amortizes the cost of the overhead. I would need detailed reference-counting info to know why.
  • Unless encryption=none, I don't believe disabling borg-2 compression and relying only on ZFS compression will help you, since compression is done before the encryption. ZFS won't know how to compress anything further except the aforementioned zero padding.
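
The zero-padding point above is easy to demonstrate in isolation (a toy example; the exact padding rule depends on the pool layout and recordsize):

    import os, zlib

    record = 128 * 1024                    # assume recordsize=128k
    obj = os.urandom(5000)                 # stand-in for a ~5 kB compressed+encrypted chunk file
    padded = obj + b"\x00" * (record - len(obj))

    print(len(zlib.compress(obj)))         # ~5000+: random-looking data does not compress
    print(len(zlib.compress(padded)))      # barely larger than the line above: the zero padding shrinks to almost nothing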

I think that covers everything. If you decide not to implement packing at all, I think there should be info in the manual about tuning borg-2 for raidz systems. Perhaps this reply could be used as a base :D!

Footnotes

  1. raidz is equivalent to raidz1; it is similar to RAID5 in that one drive can fail without compromising data. All numbers/approximations are directly valid for raidz1 only, as I don't have the hardware (or the $ :P) to test raidz2 or raidz3. Maybe I can get someone I trust who does to let me run a script to get some numbers.

  2. This is not unique to borg-2. On a raidz system, try this:

    wget https://web.archive.org/web/20120105073610/http://marknelson.us/attachments/million-digit-challenge/AMillionRandomDigits.bin
    mkdir digits
    split -b4096 AMillionRandomDigits.bin digits/amrd
    du -sB1 AMillionRandomDigits.bin
    du -sb AMillionRandomDigits.bin
    du -sB1 digits/
    du -sb digits/
    

    I get the following dus:

    535040  AMillionRandomDigits.bin
    415241  AMillionRandomDigits.bin
    637952  digits/
    415345  digits/
    
  3. The actual overhead is closer to 0.25%, but I don't have the actual numbers handy right now.

  4. It took me a few nights to get a system down for collecting data.

cr1901 commented on Sep 29 '25

The new format also significantly increases borg check time, at least on a btrfs-formatted HDD. The drive very audibly seeks throughout while checking the borg2 repository but is almost silent with the borg1 repository.

(Repositories are created from the same dataset.)

$ time borg check --repository-only borg1/

real    4m36.262s
user    0m13.271s
sys     0m26.063s
$ time borg2 check --repository-only -r borg2/

real    18m39.614s
user    0m32.492s
sys     0m30.501s

SimonPilkington commented on Dec 02 '25