skey test failure from sph_dec32le_aligned UB
Hi!
Downstream in Gentoo, toralf reported a test failure for johntheripper-jumbo (at commit b27f951a8e191210685c8421c90ca610cdd39dce; I've checked subsequent commits and they shouldn't affect this) looking like:
Testing: skey, S/Key [MD4/MD5/SHA1/RMD160 32/64]... FAILED (cmp_all(1))
[...]
1 out of 758 tests have FAILED
* ERROR: app-crypt/johntheripper-jumbo-1.9.0_p20250703::gentoo failed (test phase):
* (no error message)
toralf hit this with trunk GCC (so to-be-16) with -O2 -pipe -march=native, where -march=native is whatever it returns (vagueness for a reason, see below) on a AMD Ryzen 9 5950X.
I couldn't reproduce that with -march=znver2 or -march=znver3 (or -march=native) on my 3950X but decided not to worry about that, because I could reproduce with -O3 -flto and just debugged that instead. It also passes with -fno-strict-aliasing.
Running the failing test under Valgrind:
/var/tmp/portage/app-crypt/johntheripper-jumbo-1.9.0_p20250703/work/john-b27f951a8e191210685c8421c90ca610cdd39dce/test # valgrind ../run/john --config=john.conf --test --format=skey==164563== Memcheck, a memory error detector
==164563== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==164563== Using Valgrind-3.26.0.GIT and LibVEX; rerun with -h for copyright info
==164563== Command: ../run/john --config=john.conf --test --format=skey
==164563==
--164563-- Warning: <14704> DW_TAG_subprogram with no DW_AT_name and no DW_AT_specification or DW_AT_abstract_origin in /var/tmp/portage/app-crypt/johntheripper-jumbo-1.9.0_p20250703/work/john-b27f951a8e191210685c8421c90ca610cdd39dce/run/john
--164563-- Warning: zero subprog, missing DW_AT_abstract_origin in DW_TAG_inlined_subroutine in /var/tmp/portage/app-crypt/johntheripper-jumbo-1.9.0_p20250703/work/john-b27f951a8e191210685c8421c90ca610cdd39dce/run/john
Benchmarking: skey, S/Key [MD4/MD5/SHA1/RMD160 32/64]... ==164563== Conditional jump or move depends on uninitialised value(s)
==164563== at 0x44C6C0A: is_key_right.part.0 (formats.c:596)
==164563== by 0x44CCFE9: UnknownInlinedFun (formats.c:1312)
==164563== by 0x44CCFE9: UnknownInlinedFun (formats.c:1329)
==164563== by 0x44CCFE9: fmt_self_test (formats.c:2079)
==164563== by 0x44C3673: benchmark_all (bench.c:882)
==164563== by 0x44DB627: UnknownInlinedFun (john.c:1684)
==164563== by 0x44DB627: main (john.c:2110)
==164563==
FAILED (cmp_all(1))
I won't bore you with the snaking through, but it ended up being that formats.o and SKEY_fmt_plug.o were entirely innocent and the issue was in ripemd.o. In the end, it was ripemd160_round, specifically its sph_dec32le_aligned call from sph_types.h at https://github.com/openwall/john/blob/cb0c337e7ffbe235f55c5c7a57fcab5010d0c0ba/src/sph_types.h#L1621-L1625.
sph_dec32le_aligned breaks C aliasing rules by reading unsigned char* as sph_u32*. You can alias anything with unsigned char*, but not the other way around.
Marking sph_u32 (and friends) with __attribute__((may_alias)) fixes the issue for me, though that's not standard C.
(Returning to toralf's environment: given the specifics of the problem in the end and where the aliasing happens (in the same TU), I'm reasonably confident it's the same problem that he hit given he wasn't using LTO.)
It looks like @solardiz noticed the aliasing issues over there in the past: https://github.com/openwall/john/issues/4468#issuecomment-730556873
Thank you very much for reporting and investigating this and even suggesting a fix @thesamesam!
Marking
sph_u32(and friends) with__attribute__((may_alias))fixes the issue for me, though that's not standard C.
We may, within #ifdef __GNUC__ (or checking for specific GCC versions, but I think this attribute is pretty old?). Another idea I have is to turn these context struct fields:
unsigned char buf[64]; /* first field, for alignment */
into anonymous unions like this:
union {
unsigned char buf[64]; /* first field, for alignment */
sph_u32 buf_u32[16];
};
I didn't test it, and I don't know if it really helps here - but I guess it would?
grepping md_helper.c, I see these calls to RFUN:
RFUN(sc->buf, SPH_VAL);
RFUN(data, SPH_VAL);
RFUN(sc->buf, SPH_VAL);
RFUN(sc->buf, SPH_VAL);
In the one call that passes data, it's a const void *data argument of a non-static function, so shouldn't let the compiler make assumptions without LTO.
@thesamesam Would you like to try these approaches at fixing this issue, and send us a PR implementing one of them?
Thanks again!
BTW, I think I saw us already use anonymous structs or/and unions somewhere in the codebase. If so, this wouldn't reduce our code's portability (except that it would then be more work to reconsider this language feature dependency if we ever want to).
BTW, I think I saw us already use anonymous structs or/and unions somewhere in the codebase.
Confirmed by adding -Wc99-c11-compat, e.g.:
slow_hash_plug.c:127:10: warning: ISO C99 doesn’t support unnamed structs/unions [-Wc99-c11-compat]
127 | };
| ^
ssh_fmt_plug.c: In function ‘common_crypt_code’:
ssh_fmt_plug.c:292:26: warning: ISO C99 doesn’t support unnamed structs/unions [-Wc99-c11-compat]
292 | };
| ^
ssh_fmt_plug.c:312:26: warning: ISO C99 doesn’t support unnamed structs/unions [-Wc99-c11-compat]
312 | };
| ^
In file included from pkzip.h:11,
from zip_fmt_plug.c:32:
dyna_salt.h:45:10: warning: ISO C99 doesn’t support unnamed structs/unions [-Wc99-c11-compat]
45 | };
| ^
In file included from dyna_salt.c:32:
dyna_salt.h:45:10: warning: ISO C99 doesn’t support unnamed structs/unions [-Wc99-c11-compat]
45 | };
| ^
In file included from bench.c:45:
dyna_salt.h:45:10: warning: ISO C99 doesn’t support unnamed structs/unions [-Wc99-c11-compat]
45 | };
| ^
In file included from cracker.c:49:
dyna_salt.h:45:10: warning: ISO C99 doesn’t support unnamed structs/unions [-Wc99-c11-compat]
45 | };
| ^
In file included from formats.c:21:
dyna_salt.h:45:10: warning: ISO C99 doesn’t support unnamed structs/unions [-Wc99-c11-compat]
45 | };
| ^
In file included from loader.c:37:
dyna_salt.h:45:10: warning: ISO C99 doesn’t support unnamed structs/unions [-Wc99-c11-compat]
45 | };
| ^
In file included from uaf_encode.c:82:
uaf_raw.h:107:6: warning: ISO C99 doesn’t support unnamed structs/unions [-Wc99-c11-compat]
107 | };
| ^
Thanks for the reply and helpful analysis. I'll see which approach fits best and send a PR. It'll likely be a few days.
I'll also check again the possibility of it w/o LTO given your remarks & also ask toralf to try a candidate patch to see if it fixes the bug he reported, or if there's two going on. It wouldn't be the first time!
My comment about LTO was only about 1 out of 4 kinds of uses - the one remaining with my suggested alternative approach at working around the issue. I didn't mean to say that triggering the original issue as you reported it depended on LTO.