SIGSEGV in beam after a pause

Open rickpayne opened this issue 4 years ago • 3 comments

I've been converting your rustler_tests into an Erlang version. I've noticed that I get a SIGSEGV if I run some tests in the shell, pause for a while, and then try again.

The code is at https://github.com/rickpayne/rustler_test. The Rust code is basically taken from your rustler_tests directory.

Erlang/OTP 22 [erts-10.6.4] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1]

Eshell V10.6.4  (abort with ^G)
(rustler_test@clifford)1> rustler_test:atom_
atom_equals_ok/1  atom_str_error/0  atom_to_string/1  
(rustler_test@clifford)1> rustler_test:atom_equals_ok(wibble).
false
(rustler_test@clifford)2> rustler_test:atom_equals_ok(wibble).
false
(rustler_test@clifford)3> rustler_test:atom_equals_ok(wibble).
false
(rustler_test@clifford)4> rustler_test:echo_u8(256).          
0
(rustler_test@clifford)5> rustler_test:echo_u8(257).
1
(rustler_test@clifford)6> rustler_test:echo_u8(100).
100
(rustler_test@clifford)7> rustler_test:echo_u8(wibble).
** exception error: bad argument
     in function  rustler_test:echo_u8/1
        called as rustler_test:echo_u8(wibble)
(rustler_test@clifford)8> rustler_test:echo_u8(wibble).
<crash>

Note that there was a pause of a couple of minutes between line 6 and line 7. The backtrace looks like this:

Thread 12 "8_scheduler" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fcf3e1b6700 (LWP 5776)]
0x0000000000000000 in ?? ()
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00007fcec78814ed in <alloc::vec::Vec<T> as alloc::vec::SpecExtend<T,I>>::from_iter ()
   from /home/rickp/src/rust/rustler_test/_build/default/rel/rustler_test/lib/rustler_test-0.1.0/priv/librustler_test.so
#2  0x00007fcec788a6b9 in <rustler_test::test_primitives::echo_u8 as rustler::nif::Nif>::RAW_FUNC::nif_func ()
   from /home/rickp/src/rust/rustler_test/_build/default/rel/rustler_test/lib/rustler_test-0.1.0/priv/librustler_test.so
#3  0x0000560bdbd87235 in process_main (x_reg_array=0x10, f_reg_array=0x8)
    at x86_64-unknown-linux-gnu/opt/smp/beam_cold.h:119
#4  0x0000560bdbd8dc6b in sched_thread_func (vesdp=0x7fcf3fb0c700) at beam/erl_process.c:8498
#5  0x0000560bdbfc893e in thr_wrapper (vtwd=0x7ffcefb81110) at pthread/ethread.c:118
#6  0x00007fcf8289b669 in start_thread (arg=<optimised out>) at pthread_create.c:479
#7  0x00007fcf827c3323 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

This is happening quite a lot, so if there is more debug information I can provide, let me know...

rickpayne avatar May 01 '20 05:05 rickpayne

I can't reproduce this. Could you try to make an escript with some sleeps in it (receive after 10000 -> ok end) to reliably reproduce it on your system?
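
A minimal sketch of what such an escript might look like, for illustration only. It assumes the compiled module is reachable under `_build/default/lib/rustler_test/ebin` (that path is a guess based on the rebar3 layout in the backtrace, so adjust it as needed) and that a `badarg` followed by a pause is enough to trigger the crash, which has not been confirmed. The two-minute delay mirrors the "couple of minutes" mentioned in the report.

```erlang
#!/usr/bin/env escript
%%! -pa _build/default/lib/rustler_test/ebin

%% Hypothetical reproduction script: call the NIF with a bad argument,
%% sleep, then call it again. The code path above is an assumption.
main(_Args) ->
    %% The first call is expected to raise badarg; catch it so the script continues.
    First = (catch rustler_test:echo_u8(wibble)),
    io:format("first call: ~p~n", [First]),

    %% Pause for two minutes, using the receive/after idiom suggested above.
    receive after 120000 -> ok end,

    %% If the problem reproduces, the VM should segfault during this call.
    Second = (catch rustler_test:echo_u8(wibble)),
    io:format("second call: ~p~n", [Second]).
```

If the second call returns normally, varying the delay or calling a different NIF first (for example `result_to_int`) may help narrow down which step matters.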

filmor avatar May 01 '20 15:05 filmor

Also, could you print out nif_env, argc and argv from frame #2?

hansihe avatar May 01 '20 15:05 hansihe

Maybe it only happens if I've had an error (badarg) generated previously? I build rustler_test, then start the app (_build/default/rel/rustler_test/bin/rustler_test console) and try things in there whilst I'm writing the eunit tests.

I will try to put together an escript that reproduces it. Another crash this morning:

Thread 11 "7_scheduler" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f38fb3f9700 (LWP 1799)]
0x0000000000000000 in ?? ()
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00007f38f8ae8f15 in <rustler_test::test_primitives::result_to_int as rustler::nif::Nif>::RAW_FUNC::nif_func (nif_env=0x7f38fb3f8d80, argc=1, argv=0x7f38fe6585c0) at native/src/test_primitives.rs:21
#2  0x000055c2e307d235 in process_main (x_reg_array=0x7f38fb3f8c48, f_reg_array=0x7f38fb3f8d80)
    at x86_64-unknown-linux-gnu/opt/smp/beam_cold.h:119
#3  0x000055c2e3083c6b in sched_thread_func (vesdp=0x7f38fca42380) at beam/erl_process.c:8498
#4  0x000055c2e32be93e in thr_wrapper (vtwd=0x7ffcd2da8ca0) at pthread/ethread.c:118
#5  0x00007f393f7e0669 in start_thread (arg=<optimised out>) at pthread_create.c:479
#6  0x00007f393f708323 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) up
#1  0x00007f38f8ae8f15 in <rustler_test::test_primitives::result_to_int as rustler::nif::Nif>::RAW_FUNC::nif_func (nif_env=0x7f38fb3f8d80, argc=1, argv=0x7f38fe6585c0) at native/src/test_primitives.rs:21
21	#[rustler::nif]
(gdb) 
#2  0x000055c2e307d235 in process_main (x_reg_array=0x7f38fb3f8c48, f_reg_array=0x7f38fb3f8d80)
    at x86_64-unknown-linux-gnu/opt/smp/beam_cold.h:119
119	     nif_bif_result = (*fp)(&env, bif_nif_arity, reg);
(gdb) p env
$1 = {mod_nif = 0x7f38fa9885e8, proc = 0x7f38fd9407a8, hp = 0x7f38fa12a800, hp_end = 0x7f38fa12a858, 
  heap_frag = 0x0, fpe_was_unmasked = 0, tmp_obj_list = 0x0, exception_thrown = 0, tracee = 0x0, exiting = 0}
(gdb) p bif_nif_arity
$2 = 1
(gdb) p reg
$3 = (Eterm *) 0x7f38fe6585c0
(gdb) p *reg
$4 = 139882690422666
(gdb) p fp
$5 = (NifF *) 0x7f38f8ae8ef0 <<rustler_test::test_primitives::result_to_int as rustler::nif::Nif>::RAW_FUNC::nif_func>
(gdb) 

rickpayne avatar May 01 '20 21:05 rickpayne