xous-core
xous-core copied to clipboard
spooky `ct_eq` problem in Rust 1.72
As of Rust 1.72, at least one instance of ct_eq
is being optimized out and causing incorrect behavior.
The specific instance is here:
https://github.com/betrusted-io/xous-core/blob/8a67c95d70c4699b0d8ac01a49d21ae3791e72a5/apps/vault/src/ctap/client_pin.rs#L177
I have tried with both lto = "fat" and lto = "off"; this made no difference. Also tried toggling debug symbols on and off, and a few other things like XIP out of ROM versus running out of RAM. None of these made a difference.
The only three things I've found to affect it so far are:
- Rust version (1.72 vs older)
- Optimization level (works with
opt="s"
; fails with the default optimizer (I believe this is '3')) - Whether or not the result of the
ct_eq
operation is printed to a debug statement
The last item basically means if I add this line of code
if !bool::from(pin_hash.ct_eq(&pin_hash_dec)) {
// this lines ensures that the equality check above works in Rust v1.72.
log::trace!("ct_eq: {:?}", bool::from(pin_hash.ct_eq(&pin_hash_dec)));
The ct_eq
call works.
This issue is opened to track this problem. It is spooky because we rely on ct_eq
for cryptographic correctness, and there are many places where we rely on the value of the output to make critical security decisions. For now, the log::trace!
work-around has been committed, but eventually we should see if this gets fixed in Rust by removing that debug statement and re-testing...
I generated listings for version of vault
with and without the debug statement fix. When the debug statement is present to consume the result of the ct_eq
call, ct_eq()
is actually called. Without it, that snippet of code is missing.
This is what the listing looks like with the debug statement: https://github.com/betrusted-io/xous-core/blob/d94ceb8115ff8917c0b5aaa98f20a82e0ea2d257/ct_eq.txt#L357-L408
This is what the listing looks like without it: https://github.com/betrusted-io/xous-core/blob/d94ceb8115ff8917c0b5aaa98f20a82e0ea2d257/ct_eq.txt#L781-L800
Notably, this call is missing when we don't have the debug statement:
aa54e: 494080e7 jalr 1172(ra) # ae9de <<[T] as subtle::ConstantTimeEq>::ct_eq>
Something does not seem right.
The function call is missing because it is inlined. Tracking backwards through the assembly, I can't find any miscompilations:
The code inside the if
block starts at aa46a
, which is only reachable from this beqz
in aa43c
. The branch is reached
a. if the length comparison in aa244
fails
b. by a jump in aa3d2
after a long code block that compares the 16 bytes referenced by s4
with 16 words previously copied from the input array through a hacky combination of xor
and seqz
.
These two instances correspond exactly to the ct_eq
function, with LLVM being at the same time weird enough to needlessly copy the input array to the stack and smart enough to lower the sign-bit-as-zero-check obfuscation in ct_eq
to a single seqz
.
Note that the working version also has an inlined ct_eq
, the result of the call to ct_eq
is only used for the trace
output and not for the actual logic (there is no jump or branch between the call and the call to regenerate
).
All the other points in the path can also be found:
- the
decrypt
call via adyn SharedSecret
is thejalr
inaa232
- the
decr_pin_retries
call is inlined, but the store is queried with the right key inaa1bc
-aa1c6
- the
if
checking consecutive mismatches is done inaa1b6
- the
pin_hash
call is inlined, but the store is queried with the right key inaa0c2
-aa0cc
Could you describe the misbehaviour you found? Perhaps there is something I missed, so knowing what actually goes wrong would help.
What seems to happen is the result of the ct_eq
is always false, whether or not the hashes match. However, it's a bit tricky to definitively prove this because any attempt to print the data immediately before or after the ct_eq
call causes it to actually be executed.
So, the behavior I observed is consistent with your analysis of an in-lined function simply not being generated, assuming the default value of the register evaluates to false: skipping the call would mean the register has the default false value, even if the data values are equal.
The way I can infer that the data values do match is that in every other setting/combination of the system, for this test, the incoming data values do match. This call is part of an automated test bench, and it's a check that always evaluates to the two values being true in every other situation.
There is some possibility that something is corrupting the arrays so they aren't equal due to some other optimization problem (and that optimization problem disappears when you try to print the arrays), but the simplest explanation that is also consistent with your analysis would be that the in-lined function was simply omitted.
I assume you never tried any other architecture, or you can recreate the ct_eq
problem outside of OpenSK (i.e., in an MVP)?
That's correct, I have not tried to re-create this on any other architecture.