Set up automated Windows builds/CI using Cygwin on ReactOS

Open solardiz opened this issue 8 years ago • 38 comments

I suggested this before in https://github.com/magnumripper/JohnTheRipper/issues/2372#issuecomment-270007448 but at the time I thought that ReactOS wouldn't run Cygwin.

It appears that since mid-2016 ReactOS is finally able to run Cygwin:

https://twitter.com/reactos/status/735141249775194112 https://jira.reactos.org/browse/CORE-7739

so maybe we should proceed with that setup, in a VM. Or maybe two setups: 32-bit and 64-bit.

Also relevant is this comment from the above ReactOS JIRA entry:

Carlo Bramini added a comment - 2016-12-08 12:35

I think it would be worth to remind that CYGWIN does not support anymore XP/2003.
Last version expected to work is 2.5.2, but keep in mind that it is not binary compatible with recent packages, so replacing just cygwin1.dll won't work.

Since we probably still want our JtR builds to work on XP/2003, perhaps we should stick to Cygwin <=2.5.2.

solardiz avatar May 14 '17 17:05 solardiz

AppVeyor (which is another CI service like Travis CI) provides CI services on Windows platform. It comes with MinGW, MSYS, and Cygwin pre-installed.

See https://www.appveyor.com/docs/build-environment/ for details.

kholia avatar May 26 '17 05:05 kholia

To clarify: this issue isn't only about testing, but also (and maybe even more importantly) about making up-to-date Windows builds available to the users. Last time this latter aspect was brought up on john-users, @kholia said the automated builds with MinGW were not recommended for end-users. So we need such builds with Cygwin instead, which we would be able to recommend.

solardiz avatar Aug 07 '17 22:08 solardiz

Thanks to @claudioandre and @magnumripper, our automated MinGW builds work well now. The only remaining problem with them is that they are not fully optimized (only SSE2, no SSE4, no AVX2 IIRC).

I believe that we can generate optimized builds (e.g. with AVX2 support) using the existing CI infrastructure, but I am not sure how compatible such builds will be for end-users. Perhaps we can publish multiple binaries (e.g. john-sse2, john-avx2) in a single build to solve this problem.

kholia avatar Aug 08 '17 03:08 kholia

I am going to try https://github.com/magnumripper/JohnTheRipper/wiki/Fallback-binary-chains for MinGW builds too. Once this experiment is successful, we can recommend the MinGW builds to end-users (and close this task).

kholia avatar Sep 02 '17 10:09 kholia

Testing AVX + AVX2 MinGW builds on Windows 10 64-bit:

With AVX, all formats work fine.

> ..\run\john.exe --test --format=dmg
Will run 4 OpenMP threads
Benchmarking: dmg, Apple DMG [PBKDF2-SHA1 256/256 AVX2 8x 3DES/AES]... (4xOMP) DONE
Speed for cost 1 (iteration count) of 1000
Raw:    26923 c/s

AVX2 stuff seems to be working for some formats.

>..\run\john.exe --list=build-info
Version: v3-3278-g78b5542+
Build: mingw32 64-bit AVX2-ac OMP
SIMD: AVX2, interleaving: MD4:3 MD5:3 SHA1:1 SHA256:1 SHA512:1
CPU fallback binary: john-avx.exe
$JOHN is ..\run\
Format interface version: 14
Max. number of reported tunable costs: 3
Rec file version: REC4
Charset file version: CHR3
CHARSET_MIN: 1 (0x01)
CHARSET_MAX: 255 (0xff)
CHARSET_LENGTH: 24
SALT_HASH_SIZE: 1048576
Max. Markov mode level: 400
Max. Markov mode password length: 30
gcc version: 6.3.0
Crypto library: OpenSSL
OpenSSL library version: 01000208f
OpenSSL 1.0.2h  3 May 2016
GMP library version: 6.1.1
File locking: NOT supported by this build - do not run concurrent sessions!
fseek(): fseeko64
ftell(): ftello64
fopen(): fopen64
memmem(): JtR internal

Working formats -> md5crypt with AVX2, LM with AVX2, NT with AVX2

Non-working formats -> tripcode, descrypt, bsdicrypt (all with AVX2)

$ gdb --args ../run/john --test --format=descrypt # On Windows 10 using MinGW-w64


(gdb) r
...
Thread 1 received signal SIGSEGV, Segmentation fault.
DES_bs_crypt_25._omp_fn.0 () at DES_bs_b.c:1372
1372                    s1(x(0), x(1), x(2), x(3), x(4), x(5),

(gdb) list
1367                    rounds_and_swapped = 8;
1368                    iterations = 25;
1369
1370    start:
1371                    for_each_depth()
1372                    s1(x(0), x(1), x(2), x(3), x(4), x(5),
1373                            z(40), z(48), z(54), z(62));
1374                    for_each_depth()
1375                    s2(x(6), x(7), x(8), x(9), x(10), x(11),
1376                            z(44), z(59), z(33), z(49));

(gdb) bt
#0  DES_bs_crypt_25._omp_fn.0 () at DES_bs_b.c:1372
#1  0x00000000636057e2 in GOMP_parallel (
    fn=0x40be05 <DES_bs_crypt_25._omp_fn.0>, data=0x186d82c, num_threads=4,
    flags=0) at ../../../libgomp/parallel.c:168
#2  0x000000000040e578 in DES_bs_crypt_25 (keys_count=keys_count@entry=1)
    at DES_bs_b.c:1336
#3  0x000000000040228e in crypt_all (pcount=<optimized out>,
    salt=<optimized out>) at DES_fmt.c:209
#4  0x00000000007011d8 in is_key_right (
    format=format@entry=0x702020 <fmt_DES>, index=index@entry=0,
    binary=binary@entry=0x3a18dd4,
    ciphertext=ciphertext@entry=0x90b280 <out> "CCNf8Sbh3HDfQ",
    plaintext=plaintext@entry=0x783081 "U*U*U*U*",
    is_test_fmt_case=is_test_fmt_case@entry=0, dbsalt=dbsalt@entry=0x3acf790)
    at formats.c:208
#5  0x000000000066e4a3 in is_key_right (dbsalt=0x3acf790, is_test_fmt_case=0,
    plaintext=0x783081 "U*U*U*U*", ciphertext=0x90b280 <out> "CCNf8Sbh3HDfQ",
    binary=0x3a18dd4, index=0, format=0x702020 <fmt_DES>) at formats.c:885
#6  fmt_self_test_body (full_lvl=<optimized out>, db=0x0,
    salt_copy=<optimized out>, binary_copy=<optimized out>,
    format=0x702020 <fmt_DES>) at formats.c:901
#7  fmt_self_test (format=format@entry=0x702020 <fmt_DES>, db=<optimized out>,
    db@entry=0x3a18cb0) at formats.c:1667

(gdb) x/16i $rip
=> 0x40c182 <DES_bs_crypt_25._omp_fn.0+893>:    vmovdqa %ymm2,0x110(%rsp)
   0x40c18b <DES_bs_crypt_25._omp_fn.0+902>:    mov    0x7808(%rdx),%r9
   0x40c192 <DES_bs_crypt_25._omp_fn.0+909>:    vmovdqa (%r9),%ymm3
   0x40c197 <DES_bs_crypt_25._omp_fn.0+914>:    lea    0x88c0(%rdx),%r9
   0x40c19e <DES_bs_crypt_25._omp_fn.0+921>:    vpxor  0x20(%rsi),%ymm3,%ymm4
   0x40c1a3 <DES_bs_crypt_25._omp_fn.0+926>:    vmovdqa %ymm4,0xf0(%rsp)
   0x40c1ac <DES_bs_crypt_25._omp_fn.0+935>:    mov    0x7810(%rdx),%r10
   0x40c1b3 <DES_bs_crypt_25._omp_fn.0+942>:    vmovdqa (%r10),%ymm5

(gdb) p/x $rsp
$1 = 0x186d620

(gdb) p/x $rsp+0x110
$11 = 0x186d730 <== the address seems to be aligned just fine

(gdb) x/16bx 0x186d730
0x186d730:      0x30    0x8f    0xa1    0x03    0x00    0x00    0x00    0x00
0x186d738:      0x04    0x00    0x00    0x00    0x00    0x00    0x00    0x00

(gdb) x/16i $rip-32
   0x40c162 <DES_bs_crypt_25._omp_fn.0+861>:    mov    %ebx,%edx
   0x40c164 <DES_bs_crypt_25._omp_fn.0+863>:    mov    %r11d,0x6c(%rsp)
   0x40c169 <DES_bs_crypt_25._omp_fn.0+868>:    add    (%rax),%rdx
   0x40c16c <DES_bs_crypt_25._omp_fn.0+871>:    mov    0x7800(%rdx),%rcx
   0x40c173 <DES_bs_crypt_25._omp_fn.0+878>:    vmovdqa (%rcx),%ymm1 <= the same instruction executed just fine earlier!
   0x40c177 <DES_bs_crypt_25._omp_fn.0+882>:    lea    0x89c0(%rdx),%rcx
   0x40c17e <DES_bs_crypt_25._omp_fn.0+889>:    vpxor  (%rsi),%ymm1,%ymm2
=> 0x40c182 <DES_bs_crypt_25._omp_fn.0+893>:    vmovdqa %ymm2,0x110(%rsp)
   0x40c18b <DES_bs_crypt_25._omp_fn.0+902>:    mov    0x7808(%rdx),%r9
   0x40c192 <DES_bs_crypt_25._omp_fn.0+909>:    vmovdqa (%r9),%ymm3

I don't know what is causing this crash!

@solardiz @magnumripper I need your help in debugging this descrypt crash which occurs in MinGW AVX2 builds.

kholia avatar Sep 02 '17 12:09 kholia

> $11 = 0x186d730 <== the address seems to be aligned just fine

No, it is not. AVX2 needs 32-byte alignment, so ...20 or ...40 would be OK, but ...30 is not.
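To make the arithmetic concrete, here is a minimal sketch in C that computes how far an address is from the previous alignment boundary. The helper name `misalignment` is made up for illustration and is not part of JtR; the address is the one printed by gdb above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of bytes by which addr overshoots the
 * previous multiple of `alignment` (0 means aligned).
 * `alignment` must be a power of two. */
size_t misalignment(uintptr_t addr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size_t)(addr & (uintptr_t)(alignment - 1));
}
```

With this helper, `misalignment(0x186d730, 32)` is 16 while `misalignment(0x186d730, 16)` is 0: the address is fine for a 16-byte SSE store but not for a 32-byte aligned AVX2 store such as `vmovdqa` with a `%ymm` register.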

solardiz avatar Sep 02 '17 13:09 solardiz

I see, my math was quite off!

Can this misalignment explain why the AVX2 build is crashing?

Is there something we can do to ensure correct alignment of the data buffer?

kholia avatar Sep 02 '17 13:09 kholia

> Can this misalignment explain why the AVX2 build is crashing?

Of course, this is why it is crashing.

> Is there something we can do to ensure correct alignment of the data buffer?

It's not "the data buffer" - it's something else gcc put on the stack, maybe spilling a register.

You can try -mpreferred-stack-boundary=5, but I doubt it'd help because the stack is properly aligned as it is, but this gcc build looks broken in that it spills(?) onto an unaligned stack location even when the stack is aligned. Yet maybe gcc reuses this same setting elsewhere, so it might help.

solardiz avatar Sep 02 '17 13:09 solardiz

It seems that this GCC option does not work with MinGW:

$ cat config.log
...
configure:4223: checking whether the C compiler works
configure:4245: x86_64-w64-mingw32-gcc -mavx2 -mpreferred-stack-boundary=5 conftest.c  >&5
conftest.c:1:0: error: -mpreferred-stack-boundary is not supported for this target

kholia avatar Sep 02 '17 14:09 kholia

The AVX2 john binary works fine under Wine for descrypt, bsdicrypt and md5crypt (surprising) but it crashes for LM.

$ wine john.exe --test
...
Benchmarking: LM [DES 256/256 AVX2-16]... (4xOMP) 
wine: Unhandled page fault at address 0x409798 (thread 007b), starting debugger...
Register dump:
 rip:0000000000409798 rsp:000000000033d3c0 rbp:0000000001ac2e60 eflags:00010206 (  R- --  I   - -P- )
 rax:0000000001ac3500 rbx:0000000000000000 rcx:0000000001ac2d60 rdx:0000000001ac2b40
 rsi:000000000090dfc0 rdi:0000000001abc500  r8:0000000001ac2da0  r9:0000000001ac2920 r10:0000000001ac29e0
 r11:0000000001ac3400 r12:000000000033d4b0 r13:000000000033d490 r14:0000000001ac36c0 r15:0000000001ac35c0
Stack dump:
0x000000000033d3c0:  0000000000000000 0000000000000000
0x000000000033d3d0:  0000000000000000 0000000000000000
0x000000000033d3e0:  000000000177ff70 00000000018810f0
0x000000000033d3f0:  000000000177ff80 000000000033d3e4
0x000000000033d400:  0000000064946450 000000000033d3f0
0x000000000033d410:  000000000033d470 0000000001abdd00
0x000000000033d420:  0000000000009d00 000000000090dfc0
0x000000000033d430:  0000000064946940 000000006360efc5
0x000000000033d440:  0000000000000000 0000000000000000
0x000000000033d450:  2020002020000000 0000000000000000
0x000000000033d460:  0000000000000001 0000000000000000
0x000000000033d470:  000000000177ff70 000000006360f0d3
Backtrace:
=>0 0x0000000000409798 in john (+0x9798) (0x0000000001ac2e60)
0x0000000000409798: ldsl	0x000000000000007f(%rbp),%edi <-- I see nothing suspicious here

I am out of debugging ideas.

Do our previously published Windows builds have support for AVX2? I am guessing no.

kholia avatar Sep 02 '17 14:09 kholia

What crap version of gcc is this anyway? Can't you enforce use of something better?

magnumripper avatar Sep 02 '17 14:09 magnumripper

MinGW GCC version is mingw64-gcc-6.3.0-1.fc25.x86_64. I can try using MinGW GCC 7.2 which should be installable on our CircleCI build bot.

Update: I think in order to test MinGW GCC 7.2, we need to merge https://github.com/magnumripper/JohnTheRipper/pull/2715 first.

I don't have this version of GCC on my local machine, and my internet connectivity is not good at the moment.

kholia avatar Sep 02 '17 14:09 kholia

I'm amazed such a modern version of gcc can do something that silly.

magnumripper avatar Sep 02 '17 15:09 magnumripper

> The AVX2 john binary works fine under Wine for descrypt, bsdicrypt and md5crypt (surprising) but it crashes for LM.

This suggests that stack alignment is not enforced beyond 16 bytes, either. So when both the stack is misaligned and the offsets are, the sum ends up aligned. Maybe gcc is actually being sort-of correctly smart in not aligning stack offsets if it knows the stack itself might not be aligned anyway. So basically no AVX2 support by that build of gcc.
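A small sketch of that "two misalignments cancel out" effect, in C. The function name is hypothetical. The first stack pointer value and the `0x110` offset come from the gdb session above; the second stack pointer value is an assumed example of a stack that is only 16-byte aligned, not something observed in this thread.

```c
#include <stdint.h>

/* Hypothetical helper: would a 32-byte aligned store to offset(%rsp)
 * hit a 32-byte boundary? */
int spill_slot_is_aligned(uintptr_t rsp, uintptr_t offset)
{
    return ((rsp + offset) % 32) == 0;
}

/* rsp = 0x186d620 (32-byte aligned, as seen in gdb), offset 0x110:
 *   0x186d620 + 0x110 = 0x186d730, which is 16 bytes off -> fault.
 * rsp = 0x186d610 (assumed, only 16-byte aligned), same offset:
 *   0x186d610 + 0x110 = 0x186d720, which is 32-byte aligned -> works. */
```

So the same binary can crash or run depending on the incoming stack pointer, which would fit the different behavior seen on Windows 10 and under Wine.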

> Do our previously published Windows builds have support for AVX2? I am guessing no.

IIRC, Jim played with this before, and was putting explicit alignment attributes on local variables inside OpenMP regions for that reason. However, with the stack itself possibly misaligned, it is unclear if that would reliably help. Also, it wouldn't apply to implicit spilling. I guess part of the issue is (not) having incoming stack alignment on callbacks from Windows (such as on new thread spawning). This probably goes beyond gcc.
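For explicit buffers (not compiler spills), one approach that does not depend on the incoming stack alignment is to over-allocate and round the pointer up by hand. The sketch below shows that technique in C; it is not the alignment-attribute approach described above and not code from JtR, and the names `SIMD_ALIGN`, `align_up` and `scratch_is_aligned` are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define SIMD_ALIGN 32 /* bytes required by aligned AVX2 loads/stores */

/* Round p up to the next SIMD_ALIGN boundary (returns p if already aligned). */
void *align_up(void *p)
{
    uintptr_t a = (uintptr_t)p;
    return (void *)((a + (SIMD_ALIGN - 1)) & ~(uintptr_t)(SIMD_ALIGN - 1));
}

/* Hypothetical per-thread worker: carve a 64-byte aligned scratch area
 * out of an over-allocated local buffer and report whether it is usable. */
int scratch_is_aligned(void)
{
    unsigned char raw[64 + SIMD_ALIGN - 1];
    unsigned char *scratch = align_up(raw);

    scratch[0] = 0; /* touch the buffer so it is really used */
    return ((uintptr_t)scratch % SIMD_ALIGN) == 0 &&
           scratch >= raw &&
           scratch + 64 <= raw + sizeof(raw);
}
```

This costs up to 31 wasted bytes per buffer, and as noted above it cannot help when the compiler itself spills a `%ymm` register to a misaligned stack slot.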

solardiz avatar Sep 02 '17 15:09 solardiz

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54412 is relevant to this topic. The problem seems complex, and also is unfixed so far.

kholia avatar Sep 02 '17 15:09 kholia

https://github.com/tpoechtrager/wclang (clang based cross-compiler) - this can be a possible workaround for this problem.

kholia avatar Sep 02 '17 16:09 kholia

> Thanks to @claudioandre and @magnumripper, our automated MinGW builds work well now. The only remaining problem with them is that they are not fully optimized (only SSE2, no SSE4, no AVX2 IIRC).

What about --fork? I vaguely recall some of our users got --fork working under Windows in some Cygwin builds, so perhaps we should get it working in our builds too, and perhaps there's no chance for that if we build with MinGW?

solardiz avatar Sep 02 '17 16:09 solardiz

Yes, MinGW does not support fork stuff IIRC.

Even if we use Cygwin, the AVX2 alignment problem with GCC + Windows combination will remain, correct?

I am not sure if Cygwin builds have any advantages over MinGW builds. I don't have much experience with Cygwin to know this.

kholia avatar Sep 03 '17 07:09 kholia

I don't quite understand how/why gcc has bugs like this on Windows only?! Perhaps it's because of the different function calling convention. Perhaps a 32-bit build would work better?

magnumripper avatar Sep 03 '17 08:09 magnumripper

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54412 has some details on this but I don't understand them 100%.

Yes, a 32-bit AVX2 build should work. GCC folks fixed the "alignment bug" for 32-bit AVX2 binaries. I am confirming this now.

kholia avatar Sep 03 '17 08:09 kholia

A 32-bit AVX2 build works fine. However, a 32-bit build needs 32-bit DLL files (e.g. libwinpthread-1.dll) and a 64-bit build needs 64-bit DLL files. This means that we can't mix a 32-bit AVX2 build with 64-bit AVX + SSE2 builds.

I wonder about the performance difference between 32-bit and 64-bit MinGW builds.

kholia avatar Sep 03 '17 09:09 kholia

A .bat file can perhaps be used to work around this issue, but it would be an ugly solution.

kholia avatar Sep 03 '17 09:09 kholia

> I wonder about the performance difference between 32-bit and 64-bit MinGW builds.

In general it shouldn't be a big deal. For most of our code I believe the main difference is register pressure.

magnumripper avatar Sep 03 '17 09:09 magnumripper

I am OK with distributing 32-bit MinGW builds with AVX2 > AVX > SSE2 fallback chain.
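As an illustration of the AVX2 > AVX > SSE2 ordering only, here is a sketch of a launcher-style selection in C. This is an assumption about how the choice could be made, not how the fallback chain from the wiki page is implemented. The file names `john-avx2.exe` and `john-sse2.exe` are hypothetical (only `john-avx.exe` appears in the build info earlier in this thread), and `__builtin_cpu_supports` is a GCC/Clang builtin available on x86 targets.

```c
/* Pick the most capable binary the CPU can run.
 * File names are hypothetical placeholders. */
const char *pick_binary(int has_avx2, int has_avx)
{
    if (has_avx2)
        return "john-avx2.exe";
    if (has_avx)
        return "john-avx.exe";
    return "john-sse2.exe"; /* baseline assumed present on every target CPU */
}

/* Query the running CPU where the compiler offers a builtin for it. */
const char *pick_binary_for_this_cpu(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return pick_binary(__builtin_cpu_supports("avx2"),
                       __builtin_cpu_supports("avx"));
#else
    /* No x86 feature detection available: fall back to the baseline name. */
    return pick_binary(0, 0);
#endif
}
```

A real launcher would also have to check that the operating system saves the YMM registers, which this sketch leaves out.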

kholia avatar Sep 03 '17 12:09 kholia

> In general it shouldn't be a big deal. For most of our code I believe the main difference is register pressure.

Register pressure can be a big deal. AVX2 on 32-bit is probably still faster than AVX on 64-bit, but almost certainly not to the same extent as AVX2 on 64-bit would be.

Having some builds is better than none, so if we can only have 32-bit let's have those. But in the long run we definitely do need 64-bit Windows builds as well. Also for large file support, which we've been too lazy to implement for 32-bit, but which we get without effort for 64-bit.

And I'd like us to have Cygwin builds too, and to see if we can get --fork working there.

solardiz avatar Sep 08 '17 20:09 solardiz

> Also for large file support, which we've been too lazy to implement for 32-bit

IIRC @jfoug (perhaps me too, I can't recall) invested some time in supporting large file support. I believe it's been mostly or fully there for years, for both win32 and Linux.

magnumripper avatar Sep 08 '17 22:09 magnumripper

Oh, you're right - looks like jumbo does have large file support. Thanks!

solardiz avatar Sep 08 '17 22:09 solardiz

> Even if we use Cygwin, the AVX2 alignment problem with GCC + Windows combination will remain, correct?

It seems that 64-bit Cygwin toolchain does not suffer from this problem. I haven't actually confirmed this though.

kholia avatar Dec 05 '17 12:12 kholia

> It seems that 64-bit Cygwin toolchain does not suffer from this problem.

It does not. Nor does the 32 bit, btw ;) I use both forms of cygwin in my testing.

Fork works under cygwin. Fork would build under mingw, but we NEVER got it working. The stack alignment issue hit cygwin at one time (long ago, before I started using it); there was a sprinkling of comments about it still out there. But that has long since gone away. The only caveat was that in some OMP regions there were still alignment issues, but those were worked around under cygwin.

We have had AVX2 support for a long time for both 32 and 64 bit builds on cygwin. As for performance, there is a very noticeable improvement going from 32 to 64 bit in cygwin. 10 to 30% is usual, and the average is probably in the 15% range. There are many formats where the speed change is pretty much nothing. There are some where it is huge. One other big key is the speed of oSSL. The oSSL builds for 64 bit are quite a bit faster. They use MMX/SSE for 64 bit registers, etc., speeding up a lot of stuff, while I am not sure that is done in the 32 bit builds. So yes, there is a noticeable difference. I personally used 32 bit for a long time, because not all items were supported on cygwin64. Also at that time, I was still using MinGW, and even though the 64 bit mingw was there, getting it to work was like spotting a unicorn. But then I switched over to cygwin, still built in 32 bit, but did a bit of work on 64 bit, and switched over (very easy to install, and simply works), patched a couple of things in john, and have never really looked back.

jfoug avatar Dec 05 '17 13:12 jfoug

Large file support should be there for all systems supporting it. Hell, it is even there for that dinosaur sparc 32 bit system, lol (AND for Vc 32 bit). So it has been there a while. It was a combination of ./configure probing for many possible usable functions, and then packing that knowledge into a single spot in the code (common, misc, I do not remember), and then the large file stuff simply works.
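A rough sketch in C of the "probe in ./configure, then wrap in one place" pattern described above. The macro names `HAVE_FSEEKO64`, `large_fseek` and `large_ftell` and the helper `file_size` are assumptions made up for this example, not the names used in the JtR source.

```c
#include <stdint.h>
#include <stdio.h>

/* HAVE_FSEEKO64 stands in for a result that ./configure would define
 * after probing; the name is hypothetical. */
#if defined(HAVE_FSEEKO64)
#define large_fseek(f, off, whence) fseeko64((f), (off), (whence))
#define large_ftell(f) ((int64_t)ftello64(f))
#else
/* Fallback: plain stdio, limited to what fits in a long. */
#define large_fseek(f, off, whence) fseek((f), (long)(off), (whence))
#define large_ftell(f) ((int64_t)ftell(f))
#endif

/* The rest of the code only uses the wrappers, never the raw functions.
 * Returns the file size in bytes, or -1 on error. */
int64_t file_size(FILE *f)
{
    if (large_fseek(f, 0, SEEK_END) != 0)
        return -1;
    return large_ftell(f);
}
```

The build info earlier in this thread ("fseek(): fseeko64") shows which variant the MinGW build ended up with.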

Yes, for some older systems like cygnus (i.e. DOS), it probably would not be there, but anything newer than a TRS80, and it should work fine.

jfoug avatar Dec 05 '17 13:12 jfoug