xous-core icon indicating copy to clipboard operation
xous-core copied to clipboard

spooky `ct_eq` problem in Rust 1.72

Open bunnie opened this issue 1 year ago • 5 comments

As of Rust 1.72, at least one instance of ct_eq is being optimized out and causing incorrect behavior.

The specific instance is here:

https://github.com/betrusted-io/xous-core/blob/8a67c95d70c4699b0d8ac01a49d21ae3791e72a5/apps/vault/src/ctap/client_pin.rs#L177

I have tried with both lto = "fat" and lto = "off"; this made no difference. Also tried toggling debug symbols on and off, and a few other things like XIP out of ROM versus running out of RAM. None of these made a difference.

The only three things I've found to affect it so far are:

  • Rust version (1.72 vs older)
  • Optimization level (works with opt="s"; fails with the default optimizer (I believe this is '3'))
  • Whether or not the result of the ct_eq operation is printed to a debug statement

The last item basically means if I add this line of code

                if !bool::from(pin_hash.ct_eq(&pin_hash_dec)) {
                    // this lines ensures that the equality check above works in Rust v1.72.
                    log::trace!("ct_eq: {:?}", bool::from(pin_hash.ct_eq(&pin_hash_dec)));

The ct_eq call works.

This issue is opened to track this problem. It is spooky because we rely on ct_eq for cryptographic correctness, and there are many places where we rely on the value of the output to make critical security decisions. For now, the log::trace! work-around has been committed, but eventually we should see if this gets fixed in Rust by removing that debug statement and re-testing...

bunnie avatar Sep 18 '23 17:09 bunnie

I generated listings for version of vault with and without the debug statement fix. When the debug statement is present to consume the result of the ct_eq call, ct_eq() is actually called. Without it, that snippet of code is missing.

This is what the listing looks like with the debug statement: https://github.com/betrusted-io/xous-core/blob/d94ceb8115ff8917c0b5aaa98f20a82e0ea2d257/ct_eq.txt#L357-L408

This is what the listing looks like without it: https://github.com/betrusted-io/xous-core/blob/d94ceb8115ff8917c0b5aaa98f20a82e0ea2d257/ct_eq.txt#L781-L800

Notably, this call is missing when we don't have the debug statement:

aa54e: 494080e7 jalr 1172(ra) # ae9de <<[T] as subtle::ConstantTimeEq>::ct_eq>

Something does not seem right.

bunnie avatar Sep 18 '23 17:09 bunnie

The function call is missing because it is inlined. Tracking backwards through the assembly, I can't find any miscompilations:

The code inside the if block starts at aa46a, which is only reachable from this beqz in aa43c. The branch is reached

a. if the length comparison in aa244 fails b. by a jump in aa3d2 after a long code block that compares the 16 bytes referenced by s4 with 16 words previously copied from the input array through a hacky combination of xor and seqz.

These two instances correspond exactly to the ct_eq function, with LLVM being at the same time weird enough to needlessly copy the input array to the stack and smart enough to lower the sign-bit-as-zero-check obfuscation in ct_eq to a single seqz.

Note that the working version also has an inlined ct_eq, the result of the call to ct_eq is only used for the trace output and not for the actual logic (there is no jump or branch between the call and the call to regenerate).

All the other points in the path can also be found:

  • the decrypt call via a dyn SharedSecret is the jalr in aa232
  • the decr_pin_retries call is inlined, but the store is queried with the right key in aa1bc-aa1c6
  • the if checking consecutive mismatches is done in aa1b6
  • the pin_hash call is inlined, but the store is queried with the right key in aa0c2-aa0cc

Could you describe the misbehaviour you found? Perhaps there is something I missed, so knowing what actually goes wrong would help.

joboet avatar Oct 01 '23 21:10 joboet

What seems to happen is the result of the ct_eq is always false, whether or not the hashes match. However, it's a bit tricky to definitively prove this because any attempt to print the data immediately before or after the ct_eq call causes it to actually be executed.

So, the behavior I observed is consistent with your analysis of an in-lined function simply not being generated, assuming the default value of the register evaluates to false: skipping the call would mean the register has the default false value, even if the data values are equal.

The way I can infer that the data values do match is that in every other setting/combination of the system, for this test, the incoming data values do match. This call is part of an automated test bench, and it's a check that always evaluates to the two values being true in every other situation.

There is some possibility that something is corrupting the arrays so they aren't equal due to some other optimization problem (and that optimization problem disappears when you try to print the arrays), but the simplest explanation that is also consistent with your analysis would be that the in-lined function was simply omitted.

bunnie avatar Oct 02 '23 04:10 bunnie

I assume you never tried any other architecture, or you can recreate the ct_eq problem outside of OpenSK (i.e., in an MVP)?

kaczmarczyck avatar Nov 01 '23 13:11 kaczmarczyck

That's correct, I have not tried to re-create this on any other architecture.

bunnie avatar Nov 01 '23 16:11 bunnie