Mitigate chunking attacks

Open DemiMarie opened this issue 8 months ago • 2 comments

Apparently content-defined chunking can lead to an information leak: https://eprint.iacr.org/2025/532.pdf.

DemiMarie, Mar 31 '25 00:03

@DemiMarie This is interesting. I've only gotten a few pages in, but this premise is important:

> Content-defined chunking schemes are chunking schemes where the breakpoints are determined by the content of the surrounding data

However, Wyng's chunking breakpoints are strictly offset-defined... it's very simple, and where the chunks are split is completely insensitive to the data content. (This is not to confuse chunking with addressing... establishing identity for chunks by hashing them and then referencing chunks by those hashes is content addressing.)
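
To make the distinction concrete, here is a minimal Python sketch. It is not Wyng's actual code: `CHUNK_SIZE`, the function names, and the toy boundary rule (cut after any byte whose low two bits are zero) are assumptions chosen for illustration, and a real content-defined chunker would typically use a rolling hash instead.

```python
import hashlib

CHUNK_SIZE = 8  # hypothetical and tiny, so the demo is easy to follow


def fixed_offset_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Offset-defined chunking: cut at every multiple of `size`.

    The boundaries depend only on the length of the data, never on its bytes.
    """
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_defined_chunks(data: bytes, mask: int = 0x03) -> list[bytes]:
    """Toy content-defined chunking: cut after any byte where byte & mask == 0.

    Stand-in for a rolling-hash rule; the boundaries depend on the bytes.
    """
    chunks, start = [], 0
    for i, byte in enumerate(data):
        if byte & mask == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes with no boundary
    return chunks


def chunk_address(chunk: bytes) -> str:
    """Content addressing: a chunk is identified by the hash of its bytes."""
    return hashlib.sha256(chunk).hexdigest()


if __name__ == "__main__":
    for sample in (b"a" * 20, b"abcd" * 5):
        print([len(c) for c in fixed_offset_chunks(sample)],
              [len(c) for c in content_defined_chunks(sample)])
```

In this sketch the fixed-offset chunk lengths are identical for any two inputs of the same length, while the content-defined lengths change with the input. `chunk_address` can be applied to chunks from either scheme, which is why addressing by hash is a separate question from where the breakpoints fall.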

I don't want to give the impression I think Wyng's encryption isn't affected by this or some related technique... I would like to read more. But any time the data set is (1) an updatable database and (2) reaching for multiple levels of efficiency, there are going to be compromises that make it less secure – at least in theory – than a data set encoded in a monolithic way.

tasket, Mar 31 '25 03:03

> @DemiMarie This is interesting. I've only gotten a few pages in, but this premise is important:
>
> > Content-defined chunking schemes are chunking schemes where the breakpoints are determined by the content of the surrounding data
>
> However, Wyng's chunking breakpoints are strictly offset-defined... it's very simple, and where the chunks are split is completely insensitive to the data content. (This is not to confuse chunking with addressing... establishing identity for chunks by hashing them and then referencing chunks by those hashes is content addressing.)

I’m glad that at least this issue doesn’t affect Wyng!

> I don't want to give the impression I think Wyng's encryption isn't affected by this or some related technique... I would like to read more. But any time the data set is (1) an updatable database and (2) reaching for multiple levels of efficiency, there are going to be compromises that make it less secure – at least in theory – than a data set encoded in a monolithic way.

Indeed, Wyng will always leak at least some data (such as the amount of data that has changed between backups), and this is both unavoidable and (for the vast majority of users) a small price to pay for much faster (and therefore more frequent) backups. Still, it’s always better to be aware of known attacks.
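
As a rough illustration of that kind of leak, the sketch below counts how many fixed-offset chunks differ between two snapshots. It is only an assumption-laden toy, not a description of Wyng's on-disk format: `CHUNK_SIZE`, `chunk_digests`, and `changed_chunk_count` are hypothetical names. The point is that an incremental backup which stores only the changed chunks has a size roughly proportional to this count, so an observer who sees only the stored size can estimate it even when every chunk is encrypted.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical and tiny, for demonstration only


def chunk_digests(data: bytes, size: int = CHUNK_SIZE) -> list[str]:
    """Hash each fixed-offset chunk of `data`."""
    return [hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]


def changed_chunk_count(old: bytes, new: bytes, size: int = CHUNK_SIZE) -> int:
    """Count chunk positions whose content differs between two snapshots.

    Positions present in only one snapshot (growth or shrinkage) also count.
    """
    old_d = chunk_digests(old, size)
    new_d = chunk_digests(new, size)
    differing = sum(1 for a, b in zip(old_d, new_d) if a != b)
    return differing + abs(len(old_d) - len(new_d))


if __name__ == "__main__":
    before = b"aaaabbbbcccc"
    after = b"aaaaXbbbcccc"  # one byte changed inside the second chunk
    print(changed_chunk_count(before, after))
```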

DemiMarie, Mar 31 '25 05:03