prefix compressed chunks with decompressed size?

Open ThomasWaldmann opened this issue 3 years ago • 1 comments

we don't have that yet, but guess it would be good for:

  • decompression buffer allocation
  • knowing not only the csize, but also the (uncompressed) size, without having to read and decompress the whole chunk

as the prefix is encrypted, we do not disclose anything by adding it.

borg can then find out:

  • csize: from the PUT header, from the length of the data read, or via the new repo index
  • size: by reading a few bytes from the chunk; decrypting them reveals compression type, level and size

borg 1.1/1.2 needs to read all archive metadata streams to find out all chunk sizes. the new way would "only" need to read a few bytes from each chunk exactly once.

ThomasWaldmann, May 18 '22 01:05

to keep getting the size simple, ObfuscatedSize type chunks would be prefixed with the un-obfuscated, uncompressed size.

ThomasWaldmann, May 18 '22 12:05

What we had in borg 1.2 is like:

OBFUS_HEADER = TYPE8 0x00 csize32  # csize32 = length of the payload without the obfuscation trailer
COMPR_HEADER = TYPE8 0x00

OK, so the first idea for borg 2 was like this:

OBFUS_HEADER = TYPE8 0xFF size32 csize32  # csize32 = len(payload) - len(obfusc_trailer)
COMPR_HEADER = TYPE8 LEVEL8 size32

Maybe simpler 2nd idea:

COMPR_HEADER = TYPE8 LEVEL8 size32 csize32
type, level, csize refer to the compressed data
size is how much it is after decompression
and, important, the payload **might be longer than csize (if obfusc_trailer is appended)**
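
A minimal sketch of how the 2nd idea's header could be packed and parsed. The little-endian byte order, the `ZLIB_TYPE` value and the helper names `pack_chunk` / `unpack_chunk` are assumptions made for illustration, they are not borg code. zlib stands in for whatever compressor is selected, and encryption is left out entirely.

```python
import struct
import zlib

# TYPE8 LEVEL8 size32 csize32; little-endian is an assumption, not from the thread.
COMPR_HEADER = struct.Struct("<BBII")
ZLIB_TYPE = 0x05  # hypothetical type id, for illustration only


def pack_chunk(data, level=6, trailer_len=0):
    """Compress data, prepend the header, optionally append an obfuscation trailer."""
    compressed = zlib.compress(data, level)
    header = COMPR_HEADER.pack(ZLIB_TYPE, level, len(data), len(compressed))
    return header + compressed + bytes(trailer_len)


def unpack_chunk(payload):
    """Return the original data, ignoring any trailer bytes beyond csize."""
    ctype, level, size, csize = COMPR_HEADER.unpack_from(payload)
    start = COMPR_HEADER.size
    compressed = payload[start:start + csize]  # payload may be longer than csize
    # size is known up front, so it can serve as the initial output buffer size
    data = zlib.decompress(compressed, bufsize=max(size, 1))
    if len(data) != size:
        raise ValueError("decompressed length does not match header size")
    return data
```

With this layout, both size and csize come from the first 10 bytes, and the trailer never has to be looked at.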

Guess this would make it easy to implement a size/csize api:

headers = repo.get_headers(chunkids)
decrypt_parse(chunkids, headers) -> [(id, size, csize), ...]
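
A rough sketch of what such an api could look like, using an in-memory stand-in for the repository. `FakeRepo`, the `HEADER` layout (TYPE8 LEVEL8 size32 csize32, little-endian) and the `decrypt` parameter are all hypothetical; `decrypt` defaults to an identity function because how the header bytes would actually be decrypted is not settled here.

```python
import struct

# Assumed header layout: TYPE8 LEVEL8 size32 csize32, little-endian.
HEADER = struct.Struct("<BBII")


class FakeRepo:
    """In-memory stand-in for a repository, for illustration only."""

    def __init__(self):
        self.chunks = {}

    def put(self, chunkid, payload):
        self.chunks[chunkid] = payload

    def get_headers(self, chunkids):
        # Return only the first few bytes of each chunk, not the whole payload.
        return [self.chunks[chunkid][:HEADER.size] for chunkid in chunkids]


def decrypt_parse(chunkids, headers, decrypt=lambda raw: raw):
    """Map each chunk id to (id, size, csize) using only its header bytes."""
    result = []
    for chunkid, raw in zip(chunkids, headers):
        _ctype, _level, size, csize = HEADER.unpack(decrypt(raw))
        result.append((chunkid, size, csize))
    return result
```

Usage would then be `decrypt_parse(ids, repo.get_headers(ids))`, which touches each chunk once and reads only `HEADER.size` bytes of it.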

Hrm, crap, guess we cannot use the AEAD ciphers to decrypt just a part of the payload (the auth tag covers the whole ciphertext).
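
A toy illustration of the problem, not real AEAD: HMAC-SHA256 from the standard library is used here as a stand-in for the authentication tag, and nothing is encrypted. The helper names `seal` and `open_sealed` are made up for this sketch. The point is only that a tag computed over the whole message cannot be checked when just the first few bytes were read.

```python
import hashlib
import hmac

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes


def seal(key, message):
    """Append a tag computed over the whole message (stand-in for an AEAD tag)."""
    tag = hmac.new(key, message, hashlib.sha256).digest()
    return message + tag


def open_sealed(key, sealed):
    """Verify the tag over everything before it; refuse anything incomplete."""
    message, tag = sealed[:-TAG_LEN], sealed[-TAG_LEN:]
    expected = hmac.new(key, message, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return message


def try_open_prefix(key, sealed, nbytes):
    """Attempt to authenticate only the first nbytes of a sealed blob."""
    try:
        open_sealed(key, sealed[:nbytes])
        return True
    except ValueError:
        return False
```

Opening the complete blob works, while `try_open_prefix` fails for any length shorter than the full blob, which is why a header inside one big authenticated payload cannot be read on its own.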

ThomasWaldmann, Aug 19 '22 11:08

So guess we would need something like:

  • sizeof(encrypted_metadata)
  • encrypted_metadata (with type, level, size, csize, ... whatever), serialized via Struct or as a msgpacked dict.
  • obfuscated, (separately) encrypted, compressed data
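
The three-part layout above could be framed roughly like this. It is only a sketch: the 32-bit little-endian length prefix, the `META` struct and the helper names are assumptions, and both encrypted parts are treated as opaque bytes produced elsewhere (no cipher is applied here).

```python
import struct

META_LEN = struct.Struct("<I")  # sizeof(encrypted_metadata); width and byte order assumed
META = struct.Struct("<BBII")   # type, level, size, csize (the Struct variant)


def build_chunk(encrypted_meta, encrypted_data):
    """Concatenate: length prefix, encrypted metadata, separately encrypted data."""
    return META_LEN.pack(len(encrypted_meta)) + encrypted_meta + encrypted_data


def read_encrypted_meta(read):
    """Fetch only the metadata part; read(offset, length) returns stored bytes."""
    (meta_len,) = META_LEN.unpack(read(0, META_LEN.size))
    return read(META_LEN.size, meta_len)


def split_chunk(blob):
    """Split a complete stored chunk into (encrypted_meta, encrypted_data)."""
    (meta_len,) = META_LEN.unpack_from(blob)
    start = META_LEN.size
    return blob[start:start + meta_len], blob[start + meta_len:]
```

Because the metadata is its own encrypted unit, it can be fetched and authenticated with two small reads, without touching the data part.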

ThomasWaldmann, Aug 19 '22 16:08

superseded by #6987.

ThomasWaldmann, Sep 05 '22 14:09