prefix compressed chunks with decompressed size?
we don't have that yet, but I guess it would be good for:
- decompression buffer allocation
- knowing not only the csize, but also the size, without reading and decompressing the whole chunk
as the prefix is encrypted together with the chunk, we do not disclose anything by this.
borg can then find out:
- csize: from the PUT header, from the length of the read data, or via the new repo index
- size: by reading a few bytes from the chunk; decrypting them reveals compression type, level and size
borg 1.1/1.2 needs to read all archive metadata streams to find out all chunk sizes. the new way would "only" need to read a few bytes from each chunk exactly once.
to simplify getting the size, ObfuscatedSize type chunks would also be prefixed with the un-obfuscated, uncompressed size.
What we had in borg 1.2 is like:
OBFUS_HEADER = TYPE8 0x00 csize32 # the length is of the payload without the obfuscation trailer
COMPR_HEADER = TYPE8 0x00
OK, so the first idea for borg 2 was like this:
OBFUS_HEADER = TYPE8 0xFF size32 csize32 # csize32 = len(payload) - len(obfusc_trailer)
COMPR_HEADER = TYPE8 LEVEL8 size32
Maybe simpler 2nd idea:
COMPR_HEADER = TYPE8 LEVEL8 size32 csize32
type, level, csize refer to the compressed data
size is how much it is after decompression
and, important, the payload **might be longer than csize (if obfusc_trailer is appended)**
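
A minimal sketch of how the 2nd idea header could be packed and parsed on the already-decrypted payload. The byte order (little-endian), the type id value and the names `pack_chunk` / `parse_chunk` are assumptions for illustration, not borg code:

```python
import struct
import zlib

# assumed layout: TYPE8 LEVEL8 size32 csize32
# byte order is an assumption (little-endian), it is not specified above
COMPR_HEADER = struct.Struct("<BBII")

ZLIB_TYPE = 0x05  # hypothetical type id, for illustration only


def pack_chunk(ctype, level, data, cdata, trailer=b""):
    """Build header + compressed data + optional obfuscation trailer."""
    header = COMPR_HEADER.pack(ctype, level, len(data), len(cdata))
    return header + cdata + trailer


def parse_chunk(payload):
    """Return (ctype, level, size, csize, cdata), ignoring any trailer."""
    ctype, level, size, csize = COMPR_HEADER.unpack_from(payload)
    start = COMPR_HEADER.size
    cdata = payload[start:start + csize]
    if len(cdata) != csize:
        raise ValueError("payload is shorter than csize")
    return ctype, level, size, csize, cdata


data = b"x" * 1000
cdata = zlib.compress(data, 6)
# 37 trailer bytes stand in for an obfuscation trailer of arbitrary length
payload = pack_chunk(ZLIB_TYPE, 6, data, cdata, trailer=bytes(37))
ctype, level, size, csize, got = parse_chunk(payload)
```

Because csize is stored in the header, the parser can cut off the trailer without knowing its length, and size is available without decompressing anything.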
Guess this would be a nice basis for implementing a size/csize api:
headers = repo.get_headers(chunkids)
decrypt_parse(chunkids, headers) -> [(id, size, csize), ...]
Hrm, crap, guess we cannot use the AEAD ciphers to decrypt just a part of the payload (the authentication tag covers the whole ciphertext).
So guess we would need something like:
- sizeof(encrypted_metadata)
- encrypted_metadata (with type, level, size, csize, ... whatever), using Struct or msgpacked dict.
- obfuscated, (separately) encrypted, compressed data
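
A rough sketch of the framing for that three-part layout. Encryption is left out on purpose: both parts are treated as opaque, already-encrypted byte strings. The 32-bit little-endian length prefix and the names `frame_chunk`, `meta_end` and `split_chunk` are assumptions for illustration only:

```python
import struct

# assumption: sizeof(encrypted_metadata) is stored as 32-bit little-endian
META_LEN = struct.Struct("<I")


def frame_chunk(encrypted_meta, encrypted_data):
    """Concatenate: length prefix + encrypted metadata + encrypted data."""
    return META_LEN.pack(len(encrypted_meta)) + encrypted_meta + encrypted_data


def meta_end(first_bytes):
    """Offset where encrypted_metadata ends, given at least the length prefix."""
    (meta_len,) = META_LEN.unpack_from(first_bytes)
    return META_LEN.size + meta_len


def split_chunk(blob):
    """Return (encrypted_meta, encrypted_data) from a complete framed chunk."""
    end = meta_end(blob)
    if end > len(blob):
        raise ValueError("chunk is shorter than its metadata length says")
    return blob[META_LEN.size:end], blob[end:]
```

With something like this, a reader would fetch the first 4 bytes, compute `meta_end`, fetch up to that offset and decrypt only the small metadata part to get size and csize.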
superseded by #6987.