john icon indicating copy to clipboard operation
john copied to clipboard

Sync md[45].[ch] with upstream

Open solardiz opened this issue 6 years ago • 12 comments

Jumbo uses a revision of my generic MD4 and MD5 implementations from some years back. I've since revised the upstream versions in several ways. Jumbo also made its changes. We should probably take my latest revisions and reintroduce only the changes that we actually still want to have.

One change we're missing is my addition of H/H2 split to MD4, similar to what I had introduced for MD5 earlier, to help the compiler detect the common subexpressions.

This commit by JimF gets in the way: abe2cdb591b03c871f2c0d6e046cbeed36c6dee4. As I understand, it primarily renames a, b, c, d to uppercase A, B, C, D, which is only made use of in pbkdf2_hmac_md5.h. I doubt matching OpenSSL's context layout in any other ways is relevant.

solardiz avatar May 10 '19 13:05 solardiz

Agreed - I thought I already added the H/H2 split.

magnumripper avatar May 10 '19 14:05 magnumripper

I thought I already added the H/H2 split.

As I understand: to MD5 yes, to MD4 not yet.

solardiz avatar May 11 '19 10:05 solardiz

A relevant question for us: do we still want to be able to use OpenSSL's MD[45] as well? I doubt it. While OpenSSL might provide assembly implementations, it also zeroizes memory on *_Final, which for something as fast as MD[45] probably undoes any advantage the assembly code might have provided. Also, going inside of OpenSSL implementation-specific context structs in pbkdf2_hmac_md5.h is no good - we can do that reliably for our own implementation-specific context structs, but not for third-party ones (not long-term).

So I suggest we fully standardize on our own md[45][.ch] and minimize their differences between my upstream versions and JtR jumbo versions of them.

OTOH, this also enable us to introduce new APIs such as a combined InitUpdate call (initialize new context and do a first update on it at once, thereby saving on function call overhead). Or we can simply make Init macros in .h files, but this will increase code size in the callers without any speed advantage over the combined InitUpdate approach.

solardiz avatar May 11 '19 12:05 solardiz

Are we talking about same thing?

$ git grep H[,2] md4.c
md4.c:#define H2(x, y, z)                       ((x) ^ ((y) ^ (z)))
md4.c:          STEP(H, a, b, c, d, GET(0) + 0x6ed9eba1, 3)
md4.c:          STEP(H2, d, a, b, c, GET(8) + 0x6ed9eba1, 9)
md4.c:          STEP(H, c, d, a, b, GET(4) + 0x6ed9eba1, 11)
md4.c:          STEP(H2, b, c, d, a, GET(12) + 0x6ed9eba1, 15)
md4.c:          STEP(H, a, b, c, d, GET(2) + 0x6ed9eba1, 3)
md4.c:          STEP(H2, d, a, b, c, GET(10) + 0x6ed9eba1, 9)
md4.c:          STEP(H, c, d, a, b, GET(6) + 0x6ed9eba1, 11)
md4.c:          STEP(H2, b, c, d, a, GET(14) + 0x6ed9eba1, 15)
md4.c:          STEP(H, a, b, c, d, GET(1) + 0x6ed9eba1, 3)
md4.c:          STEP(H2, d, a, b, c, GET(9) + 0x6ed9eba1, 9)
md4.c:          STEP(H, c, d, a, b, GET(5) + 0x6ed9eba1, 11)
md4.c:          STEP(H2, b, c, d, a, GET(13) + 0x6ed9eba1, 15)
md4.c:          STEP(H, a, b, c, d, GET(3) + 0x6ed9eba1, 3)
md4.c:          STEP(H2, d, a, b, c, GET(11) + 0x6ed9eba1, 9)
md4.c:          STEP(H, c, d, a, b, GET(7) + 0x6ed9eba1, 11)
md4.c:          STEP(H2, b, c, d, a, GET(15) + 0x6ed9eba1, 15)

magnumripper avatar May 11 '19 13:05 magnumripper

As #2356 says: My ultimate goal is to get rid of external crypt/hash lib altogether. We nearly always need our "own" code anyway, for OpenCL, so there's no point in using SSL for CPU.

magnumripper avatar May 11 '19 13:05 magnumripper

Are we talking about same thing?

Ah, right. So the H/H2 is already in there. Don't know how I missed that.

solardiz avatar May 11 '19 13:05 solardiz

Where can I find your upstream code?

magnumripper avatar Sep 23 '20 12:09 magnumripper

Where can I find your upstream code?

https://openwall.info/wiki/people/solar/software/public-domain-source-code

But I guess I'll take care of this issue myself.

solardiz avatar Sep 23 '20 12:09 solardiz

BTW, this provides some speedup for formats that copy contexts, such as SIP (as reminded by #1990), EIGRP, HSRP, tacacs-plus:

+++ b/src/md5.h
@@ -34,7 +34,9 @@ typedef struct {
        MD5_u32plus A, B, C, D;
        MD5_u32plus lo, hi;
        unsigned char buffer[64];
+#if !ARCH_ALLOWS_UNALIGNED
        MD5_u32plus block[16];
+#endif
 } MD5_CTX;
 
 extern void MD5_Init(MD5_CTX *ctx);

We should probably make this change along with the update, or make sure it's preserved through the update if we make it separately. Ditto for MD4.

solardiz avatar Nov 24 '20 13:11 solardiz

do we still want to be able to use OpenSSL's MD[45] as well? I doubt it. While OpenSSL might provide assembly implementations, it also zeroizes memory on *_Final, which for something as fast as MD[45] probably undoes any advantage the assembly code might have provided.

I had overlooked that on some archs OpenSSL supports special MD[45] instructions, which speeds up e.g. our md5crypt-long on UltraSparc T-something (I don't recall which exact CPUs have this). So we probably want to preserve ability to use OpenSSL for that reason. We're probably not going to bother implementing support for those instructions ourselves, and in places where we care about performance more we'd use SIMD anyway.

solardiz avatar Nov 24 '24 02:11 solardiz

So the H/H2 is already in there. Don't know how I missed that.

Looks like while the H/H2 split for MD4 is in here, it is not in my upstream MD4 code. So I first need to get it in there (and double-check that it makes sense there). Also Wei Dai's trick #4727 is now in here (even if #if 0'ed out), but not yet in my upstream MD4.

solardiz avatar Nov 24 '24 02:11 solardiz

For now, I am going to reduce the differences from upstream, not sync to the extent I'd eventually like.

While doing this, found a bug: in our changes here, we were replacing a compile-time check for specific archs that are both little-endian and allow unaligned access with a check for archs that allow unaligned access but without requirement for them being little-endian. This just happened to work for the combinations of ARCH macros we commonly set so far, but it could break if we ever encountered and fully supported a big-endian architecture allowing unaligned access. I'll fix that.

solardiz avatar Nov 24 '24 03:11 solardiz