Crashing BEAM with SIGBUS without use of unsafe code

Open elbow-jason opened this issue 5 years ago • 7 comments

I implemented Decoder for a type, and decoding it crashes the VM with a SIGBUS.

I made a minimal reproducible example in this repo: https://github.com/elbow-jason/why_sig_bus

elbow-jason avatar Dec 30 '19 09:12 elbow-jason

After studying this for a little bit, I think I've found the crux of the problem.

Inside the decode function of Args there is:

if let Ok(Args::One(arg)) = term.decode() {
    return Ok(Args::One(arg));
}

This is self-referential: term.decode() is inferred to produce an Args, so Args::decode calls itself in order to decode a term into an Args.

Changing it to:

if let Ok((arg,)) = term.decode() {
    return Ok(Args::One(arg));
}

Does not cause a SIGBUS.
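To make the difference between the two snippets concrete, here is a minimal sketch that does not depend on rustler. The Term enum, the Decoder trait and the Term::decode helper below are simplified stand-ins written for illustration (assumptions, not rustler's real definitions); they only mimic the way the target type of term.decode() is chosen by type inference.

```rust
// Hypothetical stand-in for an Erlang term; not rustler's `Term`.
#[derive(Debug, PartialEq)]
enum Term {
    Int(i64),
    Tuple(Vec<Term>),
}

// Hypothetical stand-in for rustler's `Decoder` trait.
trait Decoder: Sized {
    fn decode(term: &Term) -> Result<Self, String>;
}

impl Term {
    // The target type `T` is picked by inference at the call site.
    fn decode<T: Decoder>(&self) -> Result<T, String> {
        T::decode(self)
    }
}

impl Decoder for i64 {
    fn decode(term: &Term) -> Result<Self, String> {
        match term {
            Term::Int(n) => Ok(*n),
            _ => Err("not an integer".to_string()),
        }
    }
}

impl<A: Decoder> Decoder for (A,) {
    fn decode(term: &Term) -> Result<Self, String> {
        match term {
            Term::Tuple(items) if items.len() == 1 => Ok((A::decode(&items[0])?,)),
            _ => Err("not a one-element tuple".to_string()),
        }
    }
}

#[derive(Debug, PartialEq)]
enum Args {
    One(i64),
}

impl Decoder for Args {
    fn decode(term: &Term) -> Result<Self, String> {
        // Writing `if let Ok(Args::One(arg)) = term.decode()` here would make
        // inference choose T = Args, so this function would call itself
        // without ever making progress.
        // Naming the target type explicitly decodes into the tuple instead.
        if let Ok((arg,)) = term.decode::<(i64,)>() {
            return Ok(Args::One(arg));
        }
        Err("unsupported arguments".to_string())
    }
}
```

With these stand-ins, decoding Term::Tuple(vec![Term::Int(1)]) into Args yields Args::One(1), and a bare Term::Int(1) is rejected with an error instead of recursing. Spelling out the target type with a turbofish is one way to make an accidental self-call easier to spot in review.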

elbow-jason avatar Dec 30 '19 09:12 elbow-jason

The question now is: How do we keep others from having this issue?

elbow-jason avatar Dec 30 '19 09:12 elbow-jason

@elbow-jason Thanks for finding this! Some additional info when running this with a debug build of ERTS and valgrind (rustc 1.39.0 (4560ea788 2019-11-04), Elixir 1.9.4, Erlang/OTP 22 [erts-10.6]):

/tmp/why_sig_bus(master ✔) "$ERL_TOP/bin/cerl" -valgrind --track-origins=yes \
  -pa $(find /usr/lib -name "ebin" -type d | grep elixir) \
  -elixir ansi_enabled true -noshell -s elixir start_cli \
  -- -extra /usr/bin/mix run -e "WhySigBus.decode_args(1)"
==109032== Memcheck, a memory error detector
==109032== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==109032== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==109032== Command: /home/mo/tools/erlang/otp/bin/x86_64-unknown-linux-gnu/beam.valgrind.smp -S 4:4 -SDcpu 4:4 -- -root /home/mo/tools/erlang/otp -progname /home/mo/tools/erlang/otp/bin/cerl\ -valgrind -- -home /home/mo -- -kernel shell_history enabled -- --track-origins=yes -pa /usr/lib/elixir/lib/ex_unit/ebin /usr/lib/elixir/lib/elixir/ebin /usr/lib/elixir/lib/iex/ebin /usr/lib/elixir/lib/eex/ebin /usr/lib/elixir/lib/logger/ebin /usr/lib/elixir/lib/mix/ebin -elixir ansi_enabled true -noshell -s elixir start_cli -- -- -extra /usr/bin/mix run -e WhySigBus.decode_args(1)
==109032==
==109032== Warning: set address range perms: large range [0x5058000, 0x45058000) (noaccess)
Compiling NIF crate :whysigbus_native (native/whysigbus_native)...
    Finished release [optimized] target(s) in 0.01s
==109032==
==109032== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==109032==  Bad permissions for mapped region at address 0x46D9AFF8
==109032==    at 0x49D26300: rustler::types::tuple::<impl rustler::types::Decoder for ()>::decode (in /tmp/why_sig_bus/priv/native/libwhysigbus_native.so)
==109032==
==109032== Process terminating with default action of signal 11 (SIGSEGV)
==109032==  Bad permissions for mapped region at address 0x46D9AFF0
==109032==    at 0x482E120: _vgnU_freeres (vg_preloaded.c:59)
==109032==
==109032== HEAP SUMMARY:
==109032==     in use at exit: 21,582,795 bytes in 29,565 blocks
==109032==   total heap usage: 212,546 allocs, 182,981 frees, 166,718,244 bytes allocated
==109032==
==109032== LEAK SUMMARY:
==109032==    definitely lost: 0 bytes in 0 blocks
==109032==    indirectly lost: 0 bytes in 0 blocks
==109032==      possibly lost: 875,424 bytes in 4,194 blocks
==109032==    still reachable: 20,560,426 bytes in 25,342 blocks
==109032==         suppressed: 146,945 bytes in 29 blocks
==109032== Rerun with --leak-check=full to see details of leaked memory
==109032==
==109032== For lists of detected and suppressed errors, rerun with: -s
==109032== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
[1]    109032 segmentation fault (core dumped)  "$ERL_TOP/bin/cerl" -valgrind --track-origins=yes -pa  -elixir ansi_enabled

evnu avatar Jan 06 '20 09:01 evnu

GDB on the core dump indeed shows endless recursion:

(gdb) bt
#0  0x00007f69fbf454ab in <whysigbus_native::Args as rustler::types::Decoder>::decode () from /tmp/why_sig_bus/_build/dev/lib/why_sig_bus/priv/native/libwhysigbus_native.so
#1  0x00007f69fbf4551f in <whysigbus_native::Args as rustler::types::Decoder>::decode () from /tmp/why_sig_bus/_build/dev/lib/why_sig_bus/priv/native/libwhysigbus_native.so
#2  0x00007f69fbf4551f in <whysigbus_native::Args as rustler::types::Decoder>::decode () from /tmp/why_sig_bus/_build/dev/lib/why_sig_bus/priv/native/libwhysigbus_native.so
#3  0x00007f69fbf4551f in <whysigbus_native::Args as rustler::types::Decoder>::decode () from /tmp/why_sig_bus/_build/dev/lib/why_sig_bus/priv/native/libwhysigbus_native.so
#4  0x00007f69fbf4551f in <whysigbus_native::Args as rustler::types::Decoder>::decode () from /tmp/why_sig_bus/_build/dev/lib/why_sig_bus/priv/native/libwhysigbus_native.so
#5  0x00007f69fbf4551f in <whysigbus_native::Args as rustler::types::Decoder>::decode () from /tmp/why_sig_bus/_build/dev/lib/why_sig_bus/priv/native/libwhysigbus_native.so
#6  0x00007f69fbf4551f in <whysigbus_native::Args as rustler::types::Decoder>::decode () from /tmp/why_sig_bus/_build/dev/lib/why_sig_bus/priv/native/libwhysigbus_native.so
#7  0x00007f69fbf4551f in <whysigbus_native::Args as rustler::types::Decoder>::decode () from /tmp/why_sig_bus/_build/dev/lib/why_sig_bus/priv/native/libwhysigbus_native.so

...

#9388 0x00007f69fbf4551f in <whysigbus_native::Args as rustler::types::Decoder>::decode () from /tmp/why_sig_bus/_build/dev/lib/why_sig_bus/priv/native/libwhysigbus_native.so
#9389 0x00007f69fbf463c2 in std::panicking::try::do_call () from /tmp/why_sig_bus/_build/dev/lib/why_sig_bus/priv/native/libwhysigbus_native.so
#9390 0x00007f69fbf8437a in __rust_maybe_catch_panic () at src/libpanic_unwind/lib.rs:80
#9391 0x00007f69fbf45a30 in <whysigbus_native::decode_args as rustler::nif::Nif>::RAW_FUNC::nif_func () from /tmp/why_sig_bus/_build/dev/lib/why_sig_bus/priv/native/libwhysigbus_native.so
#9392 0x0000557cd69bae34 in process_main ()
#9393 0x0000557cd69c66ec in ?? ()
#9394 0x0000557cd6c31bd0 in ?? ()
#9395 0x00007f6a43a604cf in start_thread () from /usr/lib/libpthread.so.0
#9396 0x00007f6a4398f2d3 in clone () from /usr/lib/libc.so.6

evnu avatar Jan 06 '20 09:01 evnu

For simple cases, rustc warns about endless recursion (the unconditional_recursion lint). Apparently, it cannot detect the recursion in this case. See this playground example for a simple case which is detectable.

EDIT: When running the playground example, the overflow is detected properly:

thread 'main' has overflowed its stack
fatal runtime error: stack overflow

I assume that we do not see such a message because the NIF does not set up the stack itself. See here for a discussion regarding this. The calling environment is responsible for setting up the running thread and its stack.
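The gap between a recursion the compiler flags and one it misses can be sketched as follows. This is an illustration under assumptions, not the playground example linked above: the Decode trait, decode_any, Looping and Fine are hypothetical names, and the claim that the second case compiles without a warning reflects my understanding that the lint only looks at direct self-calls.

```rust
// Hypothetical trait and helper, mirroring the shape of a generic decode call.
trait Decode: Sized {
    fn decode(input: u32) -> Self;
}

fn decode_any<T: Decode>(input: u32) -> T {
    T::decode(input)
}

// Direct self-call on every path: rustc reports `unconditional_recursion`.
// Never call this; it would overflow the stack.
#[allow(dead_code)]
fn direct() -> u32 {
    direct()
}

// The same kind of loop, but routed through the generic helper. To my
// knowledge no warning is emitted, because `decode` does not call itself
// directly. Never decode into this type; it would overflow the stack.
#[allow(dead_code)]
struct Looping;

impl Decode for Looping {
    fn decode(input: u32) -> Self {
        // Inference picks T = Looping, so this re-enters `Looping::decode`.
        decode_any(input)
    }
}

// A terminating implementation for comparison.
struct Fine(u32);

impl Decode for Fine {
    fn decode(input: u32) -> Self {
        Fine(input)
    }
}
```

Only decode_any::<Fine>(7) is safe to run here; it returns Fine(7). The other two definitions exist purely to compare the compiler's diagnostics.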

evnu avatar Jan 06 '20 10:01 evnu

With some digging, I found the place where ERTS sets a guard page for spawned threads (ethread.c):

#ifdef ETHR_STACK_GUARD_SIZE
    (void) pthread_attr_setguardsize(&attr, ETHR_STACK_GUARD_SIZE);
#endif

In your example, we should run into this guard. I believe that we cannot do anything further, as we cannot know at compile time if a recursive call is finite. But it would probably be helpful to at least have some indication of this in the README. Maybe a "Pitfalls" section would be good?

evnu avatar Jan 06 '20 12:01 evnu

This could fit well as an example of safety caveats at https://rustler-web.onrender.com/docs.

hansihe avatar Sep 08 '21 13:09 hansihe