cryptosym icon indicating copy to clipboard operation
cryptosym copied to clipboard

What happens at round 17?

Open aljungberg opened this issue 4 years ago • 0 comments

Love your work here. Exactly the kind of stuff I find fascinating to obsess about even that it's "impossible" to solve.

You wrote,

Something happens in the 17th round that causes a major spike in complexity. My best guess is that an AND gate between bits "very far apart" in the computation is the issue. If you look at the maximum distance between the indices of random variable bits which are inputs to an AND gate, the maximum gap is around 126,000 bits apart up to 16 rounds. At 17 rounds, the gap increases to 197,000 bits. For a SAT solver, this could mean that it doesn't detect an invalid variable assignment until values have been propagated a long way. For the full 64-round SHA-256, the maximum gap is 386,767.

This sounds plausible to me, and I'd just like to, as someone who knows very little about this subject but is curious enough to think about it a little, propose a mechanism for this.

In the first 16 rounds the SHA-256 algorithm is still just ingesting straight words from the message. At each round it takes the next word from the unmodified input message.

If you were to look at the equation for each internal bit variable it would be almost entirely linear. The only source of non-linearity comes from CH and MAJ in the compression function but since they only affect 2 words per round it's like that non-linearity as a function of the input is spreading only very "slowly" through the internal state. Like you wrote, the "distance" between significant AND gates is still small.

(Side note: in your case since you convert XOR to AND, I guess there's actually a bunch of additional apparent non-linearity in the equations you feed to the SAT solver, but it's "fake" non-linearity, in some sense, which especially CryptoMiniSat just might "see through" since it wants to use its hallmark Jordan Gauss elimination.)

Then look at what happens at round 17: we take the first word from the message schedule that wasn't a straight copy from the input message. It mixes bits from 4 different input words using rotations and 32 bit addition. At the boolean bit vector level we now have true non-linearity from the input since we have to use AND gates to implement that carry flag. And it's "long distance" since word 17 mixes in input all the way back from the 1st word.

So that's my take on the complexity explosion. (I could be completely wrong, something about hashing functions just is so inherently unintuitive to me I keep catching myself believing they can't actually work at all. It's like... they have to be reversible, right? Just haven't seen the trick.)

aljungberg avatar Aug 20 '21 12:08 aljungberg