to pack or not to pack ...
borg 1.x segment files
borg 1.x used:
- "segment files", elsewhere also known as "pack files" to store multiple repository objects in one file.
- a "repository index" to be able to find these objects, using a mapping
object id --> (segment_name, offset_in_segment). - transactions and rollback via log-like appending of operations (PUT, DEL, COMMIT) to these segment files
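For illustration, here is a minimal sketch of that scheme (not borg's actual code or on-disk format; the entry encoding and names are made up, and the sketch keeps a single per-segment index, whereas the real repository index maps `object id --> (segment_name, offset_in_segment)` across many segments):

```python
# Minimal sketch of the borg 1.x scheme: objects are appended to a segment
# file as tagged log entries (PUT, DEL, COMMIT) and an index maps object ids
# to offsets, so reads can seek directly to the right place.
import struct

PUT, DEL, COMMIT = 0, 1, 2

class Segment:
    def __init__(self, path):
        self.f = open(path, "w+b")          # a fresh segment file for this sketch
        self.index = {}                     # object_id -> offset_in_segment

    def _append(self, tag, object_id=b"", data=b""):
        self.f.seek(0, 2)                   # log-like: always append at the end
        offset = self.f.tell()
        # entry layout: tag | id length | id | data length | data
        self.f.write(struct.pack("<BI", tag, len(object_id)) + object_id)
        self.f.write(struct.pack("<I", len(data)) + data)
        return offset

    def put(self, object_id, data):
        self.index[object_id] = self._append(PUT, object_id, data)

    def delete(self, object_id):
        self._append(DEL, object_id)        # deletion is just another log entry
        self.index.pop(object_id, None)

    def commit(self):
        # entries after the last COMMIT would be ignored (rolled back)
        # when the repository is opened again
        self._append(COMMIT)
        self.f.flush()

    def get(self, object_id):
        self.f.seek(self.index[object_id])
        tag, id_len = struct.unpack("<BI", self.f.read(5))
        self.f.read(id_len)                 # skip the stored id
        (data_len,) = struct.unpack("<I", self.f.read(4))
        return self.f.read(data_len)


seg = Segment("segment_0001")
seg.put(b"id-1", b"chunk data")
seg.commit()
assert seg.get(b"id-1") == b"chunk data"
```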
borg2 status quo: objects stored separately
borg2 is much simpler:
- implemented using `borgstore` (k/v store with misc. backends)
- objects are stored separately: 1 file chunk --> 1 repo object
- objects can be directly found by their id (e.g. the id is mapped to the fs path / file name)
- no transactions, no log-like appending - but correct write order
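A minimal sketch of that idea (borgstore's real path layout and API may differ; the nesting and names here are just illustrative):

```python
# Illustrative sketch of "objects stored separately": the object id itself
# determines the file path, so no repository index is needed.
from pathlib import Path

class SimpleStore:
    def __init__(self, base):
        self.base = Path(base)

    def _path(self, object_id: bytes) -> Path:
        hex_id = object_id.hex()
        # nest by the first hex digits so directories don't get too large
        return self.base / "data" / hex_id[:2] / hex_id

    def put(self, object_id: bytes, data: bytes):
        path = self._path(object_id)
        path.parent.mkdir(parents=True, exist_ok=True)
        tmp = path.with_suffix(".tmp")
        tmp.write_bytes(data)               # write the full object first...
        tmp.replace(path)                   # ...then make it visible atomically

    def get(self, object_id: bytes) -> bytes:
        return self._path(object_id).read_bytes()

    def delete(self, object_id: bytes):
        self._path(object_id).unlink()
```

The write-to-temp-then-rename step is one way to read "correct write order instead of transactions" here; the actual ordering rules in borgstore may differ.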
Pros:
- simplicity
- no need for some sort of "index" (which could be corrupted or out of date)
- no segment file compaction needed, the server-side filesystem manages space allocation
Cons:
- leads to large numbers of relatively small objects being transferred and stored individually in the repository
- latency and other overheads have quite a speed impact for remote repositories
- depending on the storage type / filesystem, there will be more or less storage space usage overhead due to block size, esp. for many very small objects
- dealing with lots of objects / doing lots of api calls can be expensive for some cloud storage providers
borg2 alternative idea
- client assembles packs locally and transfers each pack to the store when it has reached the desired size or when there is no more data to write.
- pack files have a per-pack index appended (pointing to the objects contained in the pack), so the per-pack index can be read without reading the full pack.
- the per-pack index would also contain the RepoObj metadata (e.g. compression type/level, etc.)
Pros:
- a lot fewer objects in the store, fewer API calls, less latency impact
Cons:
- more complex in general
- will need an additional global index mapping `object_id -> pack_id, offset_in_pack`
- will need more memory for that global index
- space is managed clientside, causing more (network) I/O: compact will need to read the pack, drop unused entries and write it back to the store, update indexes
Side note: the desired pack "size" could be given by the number of objects in the pack (N) or by the overall size of all objects in the pack (S). For the special case of N == 1, it would be a slightly different implementation (using a different file format) of what we currently have in borg2; it would not necessarily need that global index, and compaction would still be very easy.
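To make the idea concrete, here is a rough sketch of such a pack format, assuming a per-pack index appended after the object data plus a small fixed-size trailer that locates it. The format details (JSON for the index, the trailer layout, the names) are placeholders, not a proposed spec; the global index mapping `object_id -> (pack_id, offset_in_pack)` would be built from these per-pack indexes.

```python
# Rough sketch of a pack file: object data, then a per-pack index (including
# per-object metadata such as compression type/level), then a fixed trailer
# recording where the index starts, so the index can be read without reading
# the whole pack. Format choices here are placeholders only.
import json
import struct

TRAILER = struct.Struct("<Q8s")             # index offset + magic
MAGIC = b"BORGPACK"

def write_pack(path, objects):
    """objects: iterable of (object_id: bytes, data: bytes, meta: dict)."""
    index = {}                              # object_id (hex) -> [offset, size, meta]
    with open(path, "wb") as f:
        for object_id, data, meta in objects:
            index[object_id.hex()] = [f.tell(), len(data), meta]
            f.write(data)
        index_offset = f.tell()
        f.write(json.dumps(index).encode())
        f.write(TRAILER.pack(index_offset, MAGIC))

def read_pack_index(path):
    """Read only the trailer and the per-pack index, not the object data."""
    with open(path, "rb") as f:
        file_size = f.seek(0, 2)
        f.seek(file_size - TRAILER.size)
        index_offset, magic = TRAILER.unpack(f.read(TRAILER.size))
        assert magic == MAGIC
        f.seek(index_offset)
        return json.loads(f.read(file_size - TRAILER.size - index_offset))

def read_object(path, index, object_id):
    offset, size, meta = index[object_id.hex()]
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size), meta
```

A compaction pass would then read a pack's index, keep only the still-referenced entries, and rewrite the pack to the store, which is exactly the extra client-side (network) I/O listed under Cons.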
Related: #191
I would really appreciate the use of packs. Currently borg 2 is "incompatible" with most USB hard disks with SMR recording. I used a Toshiba 4TB external USB hard drive for borg2 testing, and a borg check was only approx. 50% done after 12 hours when I killed it (I needed the USB port). The repository was only approx. 1.3 TB.
I consider packs essential
An alternative would be key value stores that optimize content addressing
@dietmargoldbeck what you've seen is about 33 MB/s.
That's not too bad for an initial backup to a USB (SMR) HDD.
Initial backups always feel very slow, simply due to the amount of data and processing.
Is this expected to break compatibility with the current (pre-pack) 2.0 betas?
@tve that likely will be the case.
In general, there is no compatibility guarantee between betas, and no upgrade code for beta-to-beta upgrades is provided.
I run borg (v1 implied) on a raidz1 array. I've been doing experiments with borg-2 and noticed that archives I create with borg-2 have been taking more space than the equivalent borg archives. I finally figured it out.
First off, according to my notes, borg-2 does a fine job deduplicating; nothing I describe below can really be attributed to borg-2 deduplication. See the Apparent Disk Usage column for an example repo I backed up with borg vs borg-2 (usage is in bytes):
| Borg Version | Real Disk Usage | Apparent Disk Usage | Dirs | Files |
|---|---|---|---|---|
| v2 | 30390315008 | 24509081801 | 65797 | 681435 |
| v1 | 23415717376 | 23456004666 | 4 | 287 |
raidz Overhead
The actual disk usage (`du -sB1`) above differs dramatically from apparent disk usage (`du -sb`). The TLDR is, for a raidz system, if many files on your filesystem have both of the following properties:
- Small relative to `recordsize` (usually 128k bytes; 1M bytes is sometimes common).
- Not close to a multiple of 8kB (2 sectors of a 4k bytes/sector disk).
You will have significant space overhead actually storing these files on your array due to how raidz works. In the above case, this is an overhead of 25%[^2]!
(30390315008 - (1000 * 1000 * 1000)) / (24509081801 - (1000 * 1000 * 1000)) = 1.2502
I subtracted 1G because my scripts run borg repo-space --reserve 1G beforehand. This is with the default chunker (or close enough, I may have changed the window size to 16383, but nothing else). With --chunker-params buzhash,10,23,16,4095, I saw even larger overheads:
Real: 28227564544
Apparent: 20624082512
Overhead: 1.387
This overhead gets worse for raidz2 and raidz3, but the TLDR above re: 8kB seems to still hold.
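For reference, the overhead values above are just this ratio; here it is as a tiny helper (the 1 GB subtraction only accounts for my `borg repo-space --reserve 1G`):

```python
def overhead(real_bytes, apparent_bytes, reserve=10**9):
    """Ratio of allocated space (du -sB1) to apparent size (du -sb), minus the reserve."""
    return (real_bytes - reserve) / (apparent_bytes - reserve)

print(round(overhead(30390315008, 24509081801), 4))   # -> 1.2502
```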
Why Pack Files Are Useful For raidz
This overhead issue does not exist in borg because borg creates files of around 500 MB in size. In my 2.18TB repo with ~4000 of these files, and ZFS -o recordsize=1M -o compress=off, I'm losing ~4GB. This is all due to not enabling compression (see mitigations); the raidz overhead basically doesn't matter at this point. And even this overhead isn't worth me starting over:
100 * (4 / (2.18 * 1000)) = 0.18
Not even 1/5th of a percent[^3]! If borg-2 supported pack files, the small file overhead could possibly be amortized.
Data
Here are some numbers on another borg-2 repo, separate from the above one[^4]:
Size (Full)
Real: 24297848832
Apparent: 20723652906
Overhead: 1.181
The overhead calculation once again subtracts 1GB from both apparent and real.
File Size Distribution (Full)
[32, 64): 0
[64, 128): 0
[128, 256): 9685
[256, 512): 54204
[512, 1024): 39688
[1024, 2048): 30403
[2048, 4096): 36669
[4096, 8192): 34776
[8192, 16384): 20283
[16384, 32768): 12367
[32768, 65536): 8879
[65536, 131072): 6908
[131072, 262144): 4079
[262144, 524288): 2388
[524288, 1048576): 2765
[1048576, 2097152): 2433
[2097152, 4194304): 1763
[4194304, 8388608): 786
[8388608, 16777216): 98
Total: 268174
A lot of files have sizes < 8192, no doubt due to borg-2 compression before encryption. This shows up as the 18% overhead. Perhaps there's a formula that can approximate the overhead without having to run du multiple times, but I haven't come up with it yet. Of course raidz2 and raidz3 overheads would need different calculations.[^1]
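For what it's worth, here is a rough sketch of such an approximation, based on the commonly described raidz allocation behavior with 4 KiB sectors (ashift=12): parity is added per row of `ndisks - nparity` data sectors, and each allocation is padded up to a multiple of `nparity + 1` sectors (which is where the 8kB rule of thumb for raidz1 comes from). It ignores compression, metadata, and the splitting of files larger than `recordsize`, so treat the result as an estimate only; the disk count has to be plugged in for the specific pool.

```python
import math

def raidz_alloc_bytes(file_size, ndisks, nparity=1, sector=4096):
    """Rough raidz allocation for one file, assuming 4 KiB sectors (ashift=12).
    Ignores compression, metadata and recordsize splitting."""
    data = math.ceil(file_size / sector)                      # data sectors
    parity = nparity * math.ceil(data / (ndisks - nparity))   # parity sectors per row
    total = data + parity
    padded = math.ceil(total / (nparity + 1)) * (nparity + 1) # pad to nparity+1 sectors
    return padded * sector

def estimate_overhead(histogram, ndisks, nparity=1):
    """histogram: iterable of (representative_file_size, count) pairs."""
    apparent = sum(size * count for size, count in histogram)
    allocated = sum(raidz_alloc_bytes(size, ndisks, nparity) * count
                    for size, count in histogram)
    return allocated / apparent

# e.g. for a hypothetical 4-disk raidz1, feeding in a midpoint per bin:
# estimate_overhead([(384, 54204), (768, 39688), ...], ndisks=4)
```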
Size (Single)
Real: 24297848832
Apparent: 20723652906
Overhead: 1.195
Note the overhead for any single backup tends to be worse than the overhead for many backups of the same tree (though I have seen adding a single backup make the full overhead worse).
File Size Distribution (Single)
[32, 64): 0
[64, 128): 0
[128, 256): 9232
[256, 512): 53065
[512, 1024): 37997
[1024, 2048): 27980
[2048, 4096): 31550
[4096, 8192): 21587
[8192, 16384): 12887
[16384, 32768): 8827
[32768, 65536): 4553
[65536, 131072): 3249
[131072, 262144): 1808
[262144, 524288): 1404
[524288, 1048576): 2267
[1048576, 2097152): 2141
[2097152, 4194304): 1696
[4194304, 8388608): 778
[8388608, 16777216): 98
Total: 221119
A single backup tends to produce a file size distribution similar to that of multiple backups. Therefore, in the future, I may streamline data collection by assuming a single backup generates a histogram whose bins have nearly the same height relative to one another.
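For anyone who wants to collect the same kind of distribution, here is a minimal sketch that bins repository file sizes into power-of-two buckets (the path is hypothetical, and this is not necessarily how the numbers above were produced):

```python
# Count files per power-of-two size bucket under a directory tree.
import os
from collections import Counter

def size_histogram(root):
    bins = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            size = os.path.getsize(os.path.join(dirpath, name))
            if size == 0:
                continue                               # skip empty files
            lower = 1 << (size.bit_length() - 1)       # largest power of 2 <= size
            bins[lower] += 1
    return bins

if __name__ == "__main__":
    hist = size_histogram("/path/to/borg2-repo")       # hypothetical path
    for lower in sorted(hist):
        print(f"[{lower}, {lower * 2}): {hist[lower]}")
    print("Total:", sum(hist.values()))
```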
Mitigations
- One thing that can help: enable `zfs set compress=on`. Yes, even with `borg` supplying its own compression, this does help a bit, because ZFS records must be a power of 2 between sector size and `recordsize`. In ZFS, files whose size falls between powers of two are zero-padded to the next power of 2 or next integer multiple of `recordsize`. That zero padding does compress well. I inexplicably didn't keep my numbers for this, but IIRC I saw something like 4-5% less space usage for `recordsize=128k` (~32GB became 30.8GB, as above) and `recordsize=1M` (~32GB became 30.4GB, as above).
- Additionally, my own limited experience so far is that frequently and consistently backing up amortizes the cost of the overhead. I would need detailed reference-counting info to know why.
- Unless `encryption=none`, I don't believe disabling `borg-2` compression and relying only on ZFS compression will help you, since compression is done before encryption. ZFS won't know how to compress anything further except the aforementioned zero padding.
I think that covers everything. If you decide not to implement packing at all, I think there should be info in the manual about tuning borg-2 for raidz systems. Perhaps this reply could be used as a base :D!
Footnotes
[^1]: `raidz` is equivalent to `raidz1`; it is similar to RAID5 in that one drive can fail without compromising data. All numbers/approximations are directly valid for `raidz` only, as I don't have the hardware (or the $ :P) to test `raidz2` or `raidz3`. Maybe I can get someone I trust who does to let me run a script to get some numbers.

[^2]: This is not unique to `borg-2`. On a `raidz` system, try this:

        wget https://web.archive.org/web/20120105073610/http://marknelson.us/attachments/million-digit-challenge/AMillionRandomDigits.bin
        mkdir digits
        split -b4096 AMillionRandomDigits.bin digits/amrd
        du -sB1 AMillionRandomDigits.bin
        du -sb AMillionRandomDigits.bin
        du -sB1 digits/
        du -sb digits/

    I get the following `du`s:

        535040  AMillionRandomDigits.bin
        415241  AMillionRandomDigits.bin
        637952  digits/
        415345  digits/

[^3]: The actual overhead is closer to 0.25%, but I don't have the actual numbers handy right now.

[^4]: It took me a few nights to get a system down for collecting data.
The new format also significantly increases borg check time, at least on a btrfs-formatted HDD. The drive very audibly seeks throughout while checking the borg2 repository but is almost silent with the borg1 repository.
(Repositories are created from the same dataset.)
$ time borg check --repository-only borg1/
real 4m36.262s
user 0m13.271s
sys 0m26.063s
$ time borg2 check --repository-only -r borg2/
real 18m39.614s
user 0m32.492s
sys 0m30.501s