john
john copied to clipboard
Sync md[45].[ch] with upstream
Jumbo uses a revision of my generic MD4 and MD5 implementations from some years back. I've since revised the upstream versions in several ways. Jumbo also made its changes. We should probably take my latest revisions and reintroduce only the changes that we actually still want to have.
One change we're missing is my addition of H/H2 split to MD4, similar to what I had introduced for MD5 earlier, to help the compiler detect the common subexpressions.
This commit by JimF gets in the way: abe2cdb591b03c871f2c0d6e046cbeed36c6dee4. As I understand, it primarily renames a, b, c, d to uppercase A, B, C, D, which is only made use of in pbkdf2_hmac_md5.h. I doubt matching OpenSSL's context layout in any other ways is relevant.
Agreed - I thought I already added the H/H2 split.
I thought I already added the H/H2 split.
As I understand: to MD5 yes, to MD4 not yet.
A relevant question for us: do we still want to be able to use OpenSSL's MD[45] as well? I doubt it. While OpenSSL might provide assembly implementations, it also zeroizes memory on *_Final, which for something as fast as MD[45] probably undoes any advantage the assembly code might have provided. Also, going inside of OpenSSL implementation-specific context structs in pbkdf2_hmac_md5.h is no good - we can do that reliably for our own implementation-specific context structs, but not for third-party ones (not long-term).
So I suggest we fully standardize on our own md[45][.ch] and minimize their differences between my upstream versions and JtR jumbo versions of them.
OTOH, this also enable us to introduce new APIs such as a combined InitUpdate call (initialize new context and do a first update on it at once, thereby saving on function call overhead). Or we can simply make Init macros in .h files, but this will increase code size in the callers without any speed advantage over the combined InitUpdate approach.
Are we talking about same thing?
$ git grep H[,2] md4.c
md4.c:#define H2(x, y, z) ((x) ^ ((y) ^ (z)))
md4.c: STEP(H, a, b, c, d, GET(0) + 0x6ed9eba1, 3)
md4.c: STEP(H2, d, a, b, c, GET(8) + 0x6ed9eba1, 9)
md4.c: STEP(H, c, d, a, b, GET(4) + 0x6ed9eba1, 11)
md4.c: STEP(H2, b, c, d, a, GET(12) + 0x6ed9eba1, 15)
md4.c: STEP(H, a, b, c, d, GET(2) + 0x6ed9eba1, 3)
md4.c: STEP(H2, d, a, b, c, GET(10) + 0x6ed9eba1, 9)
md4.c: STEP(H, c, d, a, b, GET(6) + 0x6ed9eba1, 11)
md4.c: STEP(H2, b, c, d, a, GET(14) + 0x6ed9eba1, 15)
md4.c: STEP(H, a, b, c, d, GET(1) + 0x6ed9eba1, 3)
md4.c: STEP(H2, d, a, b, c, GET(9) + 0x6ed9eba1, 9)
md4.c: STEP(H, c, d, a, b, GET(5) + 0x6ed9eba1, 11)
md4.c: STEP(H2, b, c, d, a, GET(13) + 0x6ed9eba1, 15)
md4.c: STEP(H, a, b, c, d, GET(3) + 0x6ed9eba1, 3)
md4.c: STEP(H2, d, a, b, c, GET(11) + 0x6ed9eba1, 9)
md4.c: STEP(H, c, d, a, b, GET(7) + 0x6ed9eba1, 11)
md4.c: STEP(H2, b, c, d, a, GET(15) + 0x6ed9eba1, 15)
As #2356 says: My ultimate goal is to get rid of external crypt/hash lib altogether. We nearly always need our "own" code anyway, for OpenCL, so there's no point in using SSL for CPU.
Are we talking about same thing?
Ah, right. So the H/H2 is already in there. Don't know how I missed that.
Where can I find your upstream code?
Where can I find your upstream code?
https://openwall.info/wiki/people/solar/software/public-domain-source-code
But I guess I'll take care of this issue myself.
BTW, this provides some speedup for formats that copy contexts, such as SIP (as reminded by #1990), EIGRP, HSRP, tacacs-plus:
+++ b/src/md5.h
@@ -34,7 +34,9 @@ typedef struct {
MD5_u32plus A, B, C, D;
MD5_u32plus lo, hi;
unsigned char buffer[64];
+#if !ARCH_ALLOWS_UNALIGNED
MD5_u32plus block[16];
+#endif
} MD5_CTX;
extern void MD5_Init(MD5_CTX *ctx);
We should probably make this change along with the update, or make sure it's preserved through the update if we make it separately. Ditto for MD4.
do we still want to be able to use OpenSSL's MD[45] as well? I doubt it. While OpenSSL might provide assembly implementations, it also zeroizes memory on
*_Final, which for something as fast as MD[45] probably undoes any advantage the assembly code might have provided.
I had overlooked that on some archs OpenSSL supports special MD[45] instructions, which speeds up e.g. our md5crypt-long on UltraSparc T-something (I don't recall which exact CPUs have this). So we probably want to preserve ability to use OpenSSL for that reason. We're probably not going to bother implementing support for those instructions ourselves, and in places where we care about performance more we'd use SIMD anyway.
So the H/H2 is already in there. Don't know how I missed that.
Looks like while the H/H2 split for MD4 is in here, it is not in my upstream MD4 code. So I first need to get it in there (and double-check that it makes sense there). Also Wei Dai's trick #4727 is now in here (even if #if 0'ed out), but not yet in my upstream MD4.
For now, I am going to reduce the differences from upstream, not sync to the extent I'd eventually like.
While doing this, found a bug: in our changes here, we were replacing a compile-time check for specific archs that are both little-endian and allow unaligned access with a check for archs that allow unaligned access but without requirement for them being little-endian. This just happened to work for the combinations of ARCH macros we commonly set so far, but it could break if we ever encountered and fully supported a big-endian architecture allowing unaligned access. I'll fix that.