ocaml-stdint icon indicating copy to clipboard operation
ocaml-stdint copied to clipboard

Segfaults with 128-bit integers on Windows

Open tahina-pro opened this issue 4 years ago • 17 comments

With my colleague @msprotz, we observed segfaults related to 128-bit integers in some Windows OCaml programs using ocaml-stdint.

It turns out that even the test executable produced by make test/stdint_test segfaults, with no test running, as we show below.

We are working on Windows 10 version 1909 (10.0.18363.720) with Cygwin 3.1.4-1, mingw64-x86_64-gcc 9.2.0, and OCaml 4.09.0 installed via OCaml and OPaM for Windows. If you want to quickly reproduce our config, you can pull a complete Docker image pulling the latest ocaml-stdint master (0785788f425bd06635608fe111a851ec86e095bf) and building ocaml-stdint and its test suite. You can also rebuild the Docker image yourself using this Dockerfile.txt.

Then, below is what we are observing:

ContainerAdministrator@cff3016f7f7e /cygdrive/c/test/ocaml-stdint
$ gdb tests/stdint_test.exe
GNU gdb (GDB) (Cygwin 8.2.1-1) 8.2.1

(gdb) run
Starting program: /cygdrive/c/test/ocaml-stdint/tests/stdint_test.exe
[New Thread 5000.0x33c4]
[New Thread 5000.0x32b0]
[New Thread 5000.0x5110]
[New Thread 5000.0x5710]

Thread 1 received signal SIGSEGV, Segmentation fault.
copy_uint128 (i=<optimized out>) at uint128_stubs.c:188
188       Uint128_val(res) = i;
(gdb) backtrace
#0  copy_uint128 (i=<optimized out>) at uint128_stubs.c:188
#1  0x0000000000494e65 in uint128_of_int (v=<optimized out>) at uint128_conv.c:34
#2  0x000000000042f2e6 in camlStdint__entry ()
#3  0x0000000000000001 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

As a workaround, my colleague @msprotz proposed to recompile ocaml-stdint with gcc optimizations disabled, which we did in tahina-pro/ocaml-stdint@99ac1fa234db8d57c4a43837ae71fda51d8d1470, and there we no longer observe these segfaults.

tahina-pro avatar Mar 18 '20 20:03 tahina-pro

Do you know if your version of mingw supports the __int128_t type? Windows docker image cannot run on Linux so I cannot repro your exact issue, but I also have a segfault when running the tests without HAVE_INT128, so maybe that's related?

rixed avatar Mar 23 '20 06:03 rixed

Yes, in a container running from the Docker image, I can successfully compile and run a __int128_t function that computes 2^(2^n):

docker run -i -t tahina/20200318ocamlstdint

ContainerAdministrator@52e7a61f5bdc /cygdrive/c/test/ocaml-stdint
$ cat <<EOF >a.c
#include <stdint.h>
#include <assert.h>
__int128_t two_pow_two_pow_minus_one(int count) {
  __int128_t res = 2;
  for (int i = 0; i < count; ++i) {
    res *= res;
  }
  assert (res > 0);
  return res - 1;
}
EOF

$ x86_64-w64-mingw32-gcc.exe -Wall -c a.c

$ cat <<EOF >test.c
#include <inttypes.h>
#include <stdio.h>
__int128_t two_pow_two_pow_minus_one(int count);
int main(void) {
  uint64_t res = (uint64_t) (two_pow_two_pow_minus_one(6));
  printf("%" PRIu64 "\n", res);
  return 0;
}
EOF

$ x86_64-w64-mingw32-gcc.exe -Wall -c test.c

$ x86_64-w64-mingw32-gcc.exe -Wall -o test.exe a.o test.o

$ ./test.exe
18446744073709551615

$ echo $?
0

tahina-pro avatar Mar 23 '20 19:03 tahina-pro

If it helps - using an older 4.08.0 installation, also using fdopen's opam distribution, I don't get a segfault. On 4.10.0 I do get it.

hcarty avatar May 21 '20 16:05 hcarty

Tried compiling the above test C progs with -O3 and they compiled/ran without error. My prior test w/ an ifdef check also worked.

rseymour avatar May 21 '20 17:05 rseymour

I should mention that the error started happening after we upgraded GCC (specifically: mingw) to version 9. We were previously on version 7 and there was no trouble.

msprotz avatar May 21 '20 21:05 msprotz

Thread 1 received signal SIGSEGV, Segmentation fault.
0x000000000044e87d in copy_int128 (i=<optimized out>) at int128_stubs.c:146
146       Int128_val(res) = i;

This gdb output seems to be saying that GCC optimized out the variable i in the copy_int128() function. I tried throwing a volatile next to it in the args to the function, but it didn't seem to stop it. Feels like gcc at -O2 shouldn't optimize out the right hand side of an assignment, but who knows.

rseymour avatar May 21 '20 22:05 rseymour

I found perl folks wrap code with a lot of guards to protect it from new GCC... https://github.com/perl11/cperl/commit/c0b59c82b053a3434cf3c20d36d1a3406421bc3d (this was a failed(?) commit to add even more -O0 style guards to the existing ones)

rseymour avatar May 26 '20 17:05 rseymour

@rseymour do you think we should such guards as well?

rgrinberg avatar Oct 03 '20 22:10 rgrinberg

We didn't end up going with them as the other fix worked, but I feel like it's the "right" way to do it until GCC fixes itself.

rseymour avatar Oct 07 '20 01:10 rseymour

Just coming back to this issue after a long while...

Are we positive that this is a GCC bug? I can't tell from the previous discussion what the "misbehavior" might be. The fact that we observe the issue at O2 and above doesn't necessarily reflect a GCC issue; it could be that the bindings code does something illegal per the C standard, and that GCC only becomes "smart" at O2.

Maybe the issue is this: when HAVE_INT128 is absent, some unspecified issue observable only at -O2 and above causes a segfault.

Then there are two fixes:

  • make HAVE_INT128 enabled on windows/mingw
  • fix the underlying issue.

Is this an accurate summary?

Would you be open to the first fix?

FYI, we're still running our entire infrastructure with the workaround outlined in this bug: we compile the test program, see if it segfaults, and if it does, recompile ocaml-stdint with -O0 to prevent the error from happening.

Thanks!

Jonathan

msprotz avatar Oct 26 '20 15:10 msprotz

I just discovered I was being bitten by this too - making HAVE_INT128 is definitely not helping with gcc 9.2.0 on Cygwin 3.1.7 in a Docker container on Windows 10 1910 (10.0.18363.1139).

A brief glance suggests this should work (a very brief glance suggests mingw-w64 doesn't provide the macro at all):

diff --git a/lib/int128.h b/lib/int128.h
index 194e1ae..1db352d 100644
--- a/lib/int128.h
+++ b/lib/int128.h
@@ -1,7 +1,7 @@
 #ifndef OCAML_INT128_H
 #define OCAML_INT128_H

-#if defined(__SIZEOF_INT128__)
+#if defined(__SIZEOF_INT128__) || defined(__MINGW32__)
 #define HAVE_INT128
 typedef __int128_t int128;
 #else
diff --git a/lib/uint128.h b/lib/uint128.h
index 3f97dab..864ea40 100644
--- a/lib/uint128.h
+++ b/lib/uint128.h
@@ -1,7 +1,7 @@
 #ifndef OCAML_UINT128_H
 #define OCAML_UINT128_H

-#if defined(__SIZEOF_INT128__)
+#if defined(__SIZEOF_INT128__) || defined(__MINGW32__)
 #define HAVE_UINT128
 typedef __uint128_t uint128;
 #else

dra27 avatar Oct 26 '20 17:10 dra27

This and forcing -O0 is working fine, for an unoptimized value of "fine"!

dra27 avatar Oct 26 '20 17:10 dra27

Your solution seems fine to me @dra27 -- do I understand you correctly that we need both HAVE_INT128 and -O0? Definitely something fishy going on here but I agree that as long as the issue is mitigated it's "fine" for us too.

Thanks for chiming in! Delightful to see you jump in on a Windows-only obscure optimization-related bug ;-).

msprotz avatar Oct 26 '20 17:10 msprotz

I haven't tested if -O0 on its own is sufficient, no (it looks like it probably will be). I was going on the assumption that intrinsic support for __int128_t is better, so it seemed better to have that by default, but then I also discovered that it was still segfaulting...

A glorious coincidence to have jumped in within hours of you - I just happen to be hacking cap'n proto on mingw-w64 at the moment! 🙂

dra27 avatar Oct 26 '20 21:10 dra27

May I suggest adding a reference to this issue in a comment in the patch itself? I think actual comments are better than relying on the commit message in instances like these.

rixed avatar Oct 27 '20 09:10 rixed

I should have had a more in-depth glance, __SIZEOF_INT128__, is provided.

dra27 avatar Oct 27 '20 13:10 dra27

OK, I've dug a bit further. The problem is alignment, and it's not Windows-specific, I expect it's just not been seen on Linux or elsewhere yet. Here's the relevant assembly snippet for copy_int128:

       leaq    int128_ops(%rip), %rcx
       movq    280(%rax), %rsi
       call    caml_alloc_custom
       movaps  %xmm6, 8(%rax)

however, this assumes that 8(%rax) is 16-byte aligned and there's absolutely no guarantee of that, as OCaml only guarantees 8-byte alignment on a 64-bit platform. If you "do something" with the 128 bit int, then gcc stops using xmm6 to hold it (I found that by coincidence, since a printf of the two 64-bit components of it "fixes" it!).

Unfortunately, this cast is invalid:

#define Int128_val(v) (*((int128 *)Data_custom_val(v)))

For whatever reason, in 4.07.1 it appears that the allocator much more readily allocated 16-byte aligned custom values (I think this is simply an image layout coincidence).

dra27 avatar Oct 27 '20 14:10 dra27