
cryptographically secure keyed rolling hash / MAC?

Open ThomasWaldmann opened this issue 8 years ago • 13 comments

is there such a thing? could we use it to chunk and compute the storage key in one go?

currently the chunker computes a 32-bit rolling hash (buzhash) over a 4095-byte window to determine cutting places from the content. the rolling hash value is then thrown away and a cryptographic hash ([hmac-]sha256, blake2b) is computed over the whole (~2MB) chunk to determine the storage key / for deduplication.
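
For illustration, a toy Python sketch of that two-step pipeline (the rolling hash only decides where to cut, a full cryptographic hash over the chunk gives the id). The window size, cut mask, table seeding and the plain sha256 here are illustrative stand-ins, not borg's actual chunker (which is C code with min/max chunk sizes and keyed hashes):

```python
import os
from hashlib import sha256

WINDOW = 4095          # rolling-hash window in bytes
MASK = (1 << 21) - 1   # illustrative: cut when the low 21 bits are zero (~2 MiB average)

# illustrative table of 256 random 32-bit values (borg seeds its table differently)
TABLE = [int.from_bytes(os.urandom(4), "big") for _ in range(256)]

def rotl32(x, n):
    n %= 32
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def chunk_and_id(data):
    """Yield (chunk, id): buzhash-style rolling hash for the cut decision,
    a cryptographic hash over the whole chunk for the storage key."""
    start = 0
    h = 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            # cyclic polynomial update: drop the byte leaving the window, add the new one
            h = rotl32(h, 1) ^ rotl32(TABLE[data[i - WINDOW]], WINDOW) ^ TABLE[byte]
        else:
            h = rotl32(h, 1) ^ TABLE[byte]
        if (h & MASK) == 0:          # no min/max chunk size enforced in this sketch
            chunk = data[start:i + 1]
            yield chunk, sha256(chunk).hexdigest()   # borg: hmac-sha256 or keyed blake2b
            start = i + 1
    if start < len(data):
        chunk = data[start:]
        yield chunk, sha256(chunk).hexdigest()
```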

if we only used a CSKRH to save one step in processing, the window size would grow to be equal to the chunk size (we can't have a small window and a big chunk, that would cause hash collisions) and also the output size would need to be >= 256 bits.

ThomasWaldmann avatar Feb 05 '17 13:02 ThomasWaldmann

(StackOverflow link)

The suggested scheme looks correct to me; however, its performance is inferior to what we have right now, even with hardware-accelerated AES.

I think if we want to get more performance there is a simple tradeoff: don't cut on byte boundaries. Cut on aligned 4/8-byte boundaries, and process 4/8 bytes in each step (I did some experiments in this direction called beer4hash and beer8hash; each ran at >1 GB/s, with otherwise the same construction as buzhash).
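
(One plausible reading of that idea, as an illustrative Python sketch and not enkore's actual beerNhash code, which was never published: keep the cyclic-polynomial structure of buzhash but make the unit an 8-byte word, mapping each word to 32 bits with a cheap multiplicative mixer instead of a 256-entry byte table. `words` is the input viewed as 64-bit integers, e.g. via struct.unpack.)

```python
WINDOW_WORDS = 512   # 4096-byte window, advanced 8 bytes per step

def rotl32(x, n):
    n %= 32
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def mix64to32(w):
    # cheap stand-in for the byte table: mix a 64-bit word down to 32 bits
    w = (w * 0x9E3779B97F4A7C15) & 0xFFFFFFFFFFFFFFFF
    return (w >> 32) ^ (w & 0xFFFFFFFF)

def roll_word(h, words, i):
    """One step for i >= WINDOW_WORDS: drop the word leaving the window,
    add words[i] (same cyclic-polynomial update as buzhash, but per 8-byte word)."""
    out = mix64to32(words[i - WINDOW_WORDS])
    return rotl32(h, 1) ^ rotl32(out, WINDOW_WORDS) ^ mix64to32(words[i])
```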

If we want to make it more secure (i.e. less leaky)... ahem. There just aren't a whole lot of bits there to begin with, and many files will be only 1-2 chunks.

enkore avatar Feb 05 '17 13:02 enkore

Performance is only inferior if R is slower. But it likely would have to be, due to the required properties. E could be done only when needed (at the cutting places).

Can you link to the beerNhash code?

ThomasWaldmann avatar Feb 05 '17 13:02 ThomasWaldmann

> Performance is only inferior if R is slower. But it likely would have to be, due to the required properties. E could be done only when needed (at the cutting places).

No, you can't: if you want what's in the topic, you have to cut according to the bits after encryption, meaning that for byte granularity you have to encrypt the current hash after every byte. A high-end Haswell PC might do something like 100-200 million AES blocks per second (this number includes the overhead of calling into OpenSSL etc.), which would then be about the throughput to expect. On lower-end platforms this will look much worse.
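
(A quick back-of-envelope check of that ceiling, using the figure from the comment above; the exact block rate is of course machine-dependent.)

```python
# Byte-granularity cutting means one PRP (AES block) call per input byte,
# so the chunker's byte rate can't exceed the AES block rate.
aes_blocks_per_second = 150e6   # mid-range of the 100-200 million blocks/s figure above
bytes_advanced_per_block = 1    # byte granularity
throughput_mb_per_s = aes_blocks_per_second * bytes_advanced_per_block / 1e6
print(f"~{throughput_mb_per_s:.0f} MB/s upper bound for this chunking stage")
```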

If you don't need that, then there is not much point in using that or a similar construction.

I don't think I pushed/cleaned them up yet but it's pretty straightforward stuff.

enkore avatar Feb 05 '17 14:02 enkore

Also a problem: it's still just a 32-bit hash. The block cipher only maps that 32-bit value into a 128-bit block (AES's block size), but that doesn't change the fact that there are only 2**32 possible outputs, which is far too weak for this construction.

I think a CSKRH is even a bit of an oxymoron: the RH properties we want mean that it's linear over short byte ranges (and then permuted / noise injected through a table, like in buzhash), but that's something we don't want for the ID hash. Perhaps a smart cryptographer has a way around that, but... it would still be a very niche construction.

enkore avatar Feb 06 '17 22:02 enkore

I didn't suggest using the current 32-bit RH as a basis; we would just need a much longer RH. To be used as an id hash, it would also need to cover the whole chunk (e.g. 2 MiB rather than 4095 B).

ThomasWaldmann avatar Feb 06 '17 23:02 ThomasWaldmann

on restic's side, they use the Rabin fingerprinting algorithm, which is (according to Wikipedia) possibly slower than the buzhash used by Borg. Yet when I asked @fd0 about this, he said it wasn't the bottleneck, so I wonder whether we really need to optimize this for borg, or whether that would be misplaced optimization...

anarcat avatar May 26 '18 20:05 anarcat

For reference, the chunker is written in pure Go (no fancy assembler optimizations or the like), uses 64-bit Rabin fingerprints as a rolling hash, and on my machine (Intel(R) Core(TM) i5-6440HQ CPU @ 2.60GHz) comes to 391 MB/s:

```
$ go test -bench .
BenchmarkChunker-4                    20          85682339 ns/op         391.61 MB/s
```

Overall, that's more than enough for restic; the other stages we have (most importantly the SHA256 for identifying a plaintext chunk and the SHA256 over the encrypted content for the storage file name) are the slowdown here.

fd0 avatar May 27 '18 10:05 fd0

Isn't restic's data processing parallelized, though? With Borg not having any parallelization whatsoever, all the stages you mentioned together are the bottleneck, and doing IO in the same thread reduces throughput further. Even on a high-end CPU with high-end IO, Borg's throughput naturally falls short of what would be adequate.

enkore avatar May 27 '18 11:05 enkore

Oh sure, restic runs almost everything in parallel. There are two threads reading files and running the chunker, several threads doing SHA256 on the chunks, and so on. :)

fd0 avatar May 27 '18 11:05 fd0

so I guess there would be an advantage in changing the rolling hash in borg, considering it's more of a serial pipeline right now. yet a new algorithm probably means all chunks would change, which might be too disruptive to justify such a change. best would be to parallelize borg, really... and that's another issue (#37), but at least it wouldn't affect the repository format.

thanks for the feedback @fd0 and @enkore! :)

anarcat avatar May 27 '18 11:05 anarcat

I don't see the value in layering the PRP with the ε-universal rolling hash, in whichever order.

You can cut the input file into overlapping 128-bit blocks B_i, indexed i = 0 through n. Each can be mapped to a random 128-bit number as C_i = PRP(K, C_{i-1} XOR B_i), with C_0 = PRP(message size) (e.g. with PRP = AES-128; this is basically CBC-MAC). For a w-wide window, you keep O = C_{i-w} around. You use the XOR of C_{i-w} … C_i to determine where to cut: adding the next character is XORing C_{i+1} into it; removing the oldest character is XORing O into it and setting O = PRP(K, O XOR B_{i-w+1}). When a cut is found, call your chunk K_j (j is the chunk number) and store C_i on it as T[0,j] (the tag of chunk j, level 0).

Then you do the Merkle tree thing: use a compression function to compute T[1,j-1] = H({0, j-1, T[0,j-1]}, {0, j, T[0,j]}) as the hash of the binary tree {j-1, j}, then T[2,j-3] = H({1, j-3, T[1,j-3]}, {1, j-1, T[1,j-1]}), and up we go. When we finish processing the file, the highest T is the file's identifier.
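
(For concreteness, a minimal Python sketch of the rolling/cutting part of this construction, assuming AES-128 in ECB mode as the PRP via the `cryptography` package. It uses non-overlapping 16-byte blocks and stores the window explicitly instead of maintaining O, and it omits the Merkle-tree tagging; the window width and cut mask are illustrative.)

```python
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

W = 32                     # window width, in 128-bit blocks (illustrative)
CUT_MASK = (1 << 16) - 1   # cut when the low 16 bits of the window XOR are zero

def prp(key, block16):
    # PRP(K, .) = one AES-128 block encryption (fresh Cipher per call: slow, but a sketch)
    return Cipher(algorithms.AES(key), modes.ECB()).encryptor().update(block16)

def xor16(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def find_cuts(key, data):
    """Yield cut offsets. C_i = PRP(K, C_{i-1} XOR B_i), C_0 = PRP(message size);
    the cut decision looks at the XOR of the last W values C_i."""
    c = prp(key, len(data).to_bytes(16, "big"))    # C_0
    window = []                                    # last W values C_i
    acc = bytes(16)                                # XOR of everything in `window`
    for off in range(0, len(data) - 15, 16):       # a tail shorter than 16 bytes is ignored
        c = prp(key, xor16(c, data[off:off + 16]))
        window.append(c)
        acc = xor16(acc, c)
        if len(window) > W:
            acc = xor16(acc, window.pop(0))        # drop the oldest C from the window
        if int.from_bytes(acc[-2:], "big") & CUT_MASK == 0:
            yield off + 16
```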


I thought about another alternative to the SO answer with lower performance impact: use a keyed PRF to fill the buzhash table. However, repeated characters would then flaw the rolling hash in ways that would be hard for an attacker to predict, but that would not be pseudo-random.
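
(A minimal sketch of that alternative: derive the 256 buzhash table entries from a keyed PRF, HMAC-SHA256 here, so the table depends on a secret key rather than a fixed seed; the PRF choice and names are illustrative.)

```python
import hmac
from hashlib import sha256

def keyed_buzhash_table(key: bytes):
    """256 x 32-bit buzhash table derived from a secret key via HMAC-SHA256."""
    return [
        int.from_bytes(hmac.new(key, bytes([i]), sha256).digest()[:4], "big")
        for i in range(256)
    ]
```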

But thinking more about it, I'm questioning the assumptions. If we want to prevent an attacker from controlling where we cut files, we can do nothing about them sending a very large file filled with zeroes: it will very likely never get cut, whatever the RH algorithm.

espadrine avatar Jun 12 '20 18:06 espadrine

I think I have something now in #8903, please review!

ThomasWaldmann avatar Jun 08 '25 17:06 ThomasWaldmann

@espadrine What you said was a bit above my head. I'm neither a mathematician nor a cryptographer, just an interested nerd (and I did take a crypto course some years ago).

But I tried to understand what you said, and if I understood it correctly, that construction has the problem that, e.g., the contents of a block at the beginning of the file influence the value of every C_i after it: changing the contents of that block also changes all the C_i after it.

But for a chunker, the chunking decision should be based only on the window's content, and the cutting point should not change if a block far before the window changes.

ThomasWaldmann avatar Jun 08 '25 18:06 ThomasWaldmann