Arraymancer

segfault with pca without -d:danger or -d:release

Open brentp opened this issue 4 years ago • 5 comments

I am getting a segfault with pca, but only when built without release and without danger.

with gdb, I see:

Thread 16 "somalier" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff9c7b1700 (LWP 4913)]
nimFrame (s=0x7fff9c7b0690) at /home/brentp/.cache/nim/somalier_d/@m..@s..@[email protected]@spkgs@sarraymancer-@hhead@stensor@soperators_blas_l1.nim.c:359
359			(*s).calldepth = (NI16)((*framePtr__HRfVMH3jYeBJz6Q6X9b6Ptw).calldepth + ((NI16) 1));
(gdb) bt
#0  nimFrame (s=0x7fff9c7b0690) at /home/brentp/.cache/nim/somalier_d/@m..@s..@[email protected]@spkgs@sarraymancer-@hhead@stensor@soperators_blas_l1.nim.c:359
#1  check_size__A1o8pjA8sSUzNxmn3BamlAp_checks (a=a@entry=0x7ffffffdf2c0, b=b@entry=0x7fff9c7b0cd0)
    at /home/brentp/.cache/nim/somalier_d/@m..@s..@[email protected]@spkgs@sarraymancer-@hhead@stensor@soperators_blas_l1.nim.c:1717
#2  0x00005555556a2045 in pluseq___bwzvgAiJVLEdRKerBiTtXA._omp_fn.0 ()
    at /home/brentp/.cache/nim/somalier_d/@m..@s..@[email protected]@spkgs@sarraymancer-@hhead@stensor@soperators_blas_l1.nim.c:1893
#3  0x00007ffff7e32e96 in GOMP_parallel () from /lib/x86_64-linux-gnu/libgomp.so.1
#4  0x00005555556a8f7b in pluseq___bwzvgAiJVLEdRKerBiTtXA (a=0x7ffffffdf2c0, b=b@entry=0x7fff9c7b0cd0)
    at /home/brentp/.cache/nim/somalier_d/@m..@s..@[email protected]@spkgs@sarraymancer-@hhead@stensor@soperators_blas_l1.nim.c:1848
#5  0x00005555556de762 in sum__Y49asUKPCVhBx9cvcl0dU9blA._omp_fn.0 () at /home/brentp/.cache/nim/somalier_d/@m..@s..@[email protected]@spkgs@sarraymancer-@hhead@[email protected]:1184
#6  0x00007ffff7e3c31e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
#7  0x00007ffff7e07669 in start_thread (arg=<optimized out>) at pthread_create.c:479
#8  0x00007ffff7d2f323 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

If that's not helpful, I can try to put together a way to reproduce it (from somalier).

brentp Apr 28 '20 21:04

Looking at the stack trace, we have this:

(*s).calldepth = (NI16)((*framePtr__HRfVMH3jYeBJz6Q6X9b6Ptw).calldepth + ((NI16) 1));

NI16 is int16 and can only hold integers up to 32767. It is probably crashing on an overflow error. How big was the tensor? I'm surprised that the calldepth could get that high; there shouldn't be any recursion in the sum functions mentioned: https://github.com/mratsim/Arraymancer/blob/fe896870f8a67f961a930f832af72354f32c3da2/src/tensor/aggregate.nim#L27-L35
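
For reference, a quick sketch of the int16 limits in Nim. It only illustrates the range of the type behind NI16; it says nothing about what calldepth actually was in this crash, and the variable name is made up for the example.

```nim
# int16 is the Nim type behind NI16 in the generated C code.
let maxDepth = high(int16)   # largest value an int16 can hold
let minDepth = low(int16)    # smallest value an int16 can hold

echo maxDepth   # 32767
echo minDepth   # -32768
```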

nimFrame calls are not inserted in release mode, hence the crash doesn't appear there. It should also disappear with --stackTrace:off (with push/pop pragmas probably being the proper fix) and maybe with --overflowChecks:off.
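
A minimal sketch of what the push/pop approach could look like, assuming the goal is just to stop the compiler from emitting stack-frame bookkeeping for a hot-path proc. The proc `addInPlace` is hypothetical and is not Arraymancer's code; only the `{.push stackTrace: off.}` / `{.pop.}` pragmas are the point here.

```nim
# Disable stack trace frames for every proc declared until the matching pop.
{.push stackTrace: off.}

# Hypothetical in-place add, for illustration only.
proc addInPlace(a: var seq[float32], b: seq[float32]) =
  # No nimFrame bookkeeping is generated for this proc, even in debug builds.
  for i in 0 ..< a.len:
    a[i] += b[i]

# Restore the previous setting for the rest of the module.
{.pop.}

var x = @[1'f32, 2'f32, 3'f32]
addInPlace(x, @[10'f32, 20'f32, 30'f32])
echo x
```

Unlike passing --stackTrace:off on the command line, this keeps stack traces for the rest of the program and only switches them off for the procs inside the push/pop region.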

mratsim Apr 28 '20 23:04

with --stackTrace:off I get:

Error: unhandled exception: /home/brentp/.nimble/pkgs/arraymancer-0.6.0/tensor/selectors.nim(218, 26) `dstSlice`gensym34935440[axis].a == size`gensym34935437`  [AssertionDefect]

brentp Apr 29 '20 02:04

that's occurring in the code that's using the new fancy indexing, so I assume that's corrupting memory and then the error is appearing later (?).

brentp Apr 29 '20 02:04

that assertion error is reproducible with:

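import arraymancer
# 2504 x 17384 float32 tensor with random values between 0 and 0.5
var T = randomTensor(2504, 17384, 0.5'f32)
# Boolean mask over the columns of T
var sel = randomTensor(T.shape[1], 1'f32).asType(bool)
sel[100..200] = false
# Assigning the masked selection back to T triggers the assertion
T = T[_, sel]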

brentp Apr 29 '20 02:04

It seems the issue is with reassigning a tensor to itself; this doesn't trigger the assertion:

var T = randomTensor(2504, 17384, 0.5'f32)
var sel = randomTensor(T.shape[1], 1'f32).asType(bool)
sel[100..200] = false
# Binding the result to a new name instead of T avoids the assertion
let U = T[_, sel]

It might even solve your original bug. I'm not sure how to prevent that, though.

mratsim May 20 '20 22:05