
wasm_runtime_detect_native_stack_overflow works incorrectly under ASAN

Open vchigrin opened this issue 3 months ago • 4 comments

The wasm_runtime_detect_native_stack_overflow function compares the address of a local variable with the stack boundary here: https://github.com/bytecodealliance/wasm-micro-runtime/blob/a6a9f1f45d9f7ebf044ceac71bcf7a9ea2f90f23/core/iwasm/common/wasm_runtime_common.c#L7898

But under ASAN, local variables can be placed on a "fake stack" (https://github.com/google/sanitizers/wiki/AddressSanitizerUseAfterReturn#algorithm), so this comparison often produces wrong results and reports "native stack overflow".

I have no minimal reproducible example for WAMR itself, but here is how the underlying pthread stack API behaves under Linux.

Test file

#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>

int main(int argc, char* argv[]) {
  int dummy;
  pthread_t self;
  pthread_attr_t attr;
  size_t stack_size;
  void* addr;

  self = pthread_self();
  if (pthread_getattr_np(self, &attr) != 0) {
    printf("Failed get attr\n");
    return 1;
  }
  pthread_attr_getstack(&attr, &addr, &stack_size);
  printf("Stack %p size %zX; Current frame %p\n", addr, stack_size, &dummy);
  pthread_attr_destroy(&attr);
  return 0;
}

Normal run

clang-18 test.c && ./a.out
Stack 0x7ffcc529e000 size 7FF000; Current frame 0x7ffcc5a9ce4c

Here 0x7ffcc529e000 is less than 0x7ffcc5a9ce4c, and the current frame lies inside the reported stack range, so everything works as expected.

ASAN run

clang-18 -fsanitize=address test.c && ./a.out
Stack 0x7fff01009000 size 7FE000; Current frame 0x7fa947d00020

Here 0x7fff01009000 is greater than 0x7fa947d00020, so a check that assumes locals live above the stack base may wrongly decide that the stack has already overflowed.
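The range check implied by the two runs above can be sketched as a small helper. This is only an illustration, not WAMR code: addr_in_thread_stack is a hypothetical name, and the sketch assumes Linux with glibc, where pthread_getattr_np is available.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of WAMR): returns 1 if p lies inside the
 * calling thread's stack as reported by pthread, 0 if it does not,
 * and -1 if the stack attributes cannot be queried. */
static int addr_in_thread_stack(const void *p)
{
    pthread_attr_t attr;
    void *lowest = NULL; /* lowest address of the stack region */
    size_t size = 0;
    int rc;

    assert(p != NULL);
    if (pthread_getattr_np(pthread_self(), &attr) != 0)
        return -1;
    rc = pthread_attr_getstack(&attr, &lowest, &size);
    pthread_attr_destroy(&attr);
    if (rc != 0)
        return -1;

    uintptr_t a = (uintptr_t)p;
    uintptr_t base = (uintptr_t)lowest;
    /* In range means base <= a < base + size. */
    return (a >= base && a - base < size) ? 1 : 0;
}
```

In a normal build, passing the address of a local variable should return 1. In a run like the ASAN one above, where the local was placed on the fake stack, the same call would return 0.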

Your environment

  • Linux

I suggest disabling all logic in wasm_runtime_detect_native_stack_overflow when __has_feature(address_sanitizer) is true.

If this is OK, I can provide a patch.

vchigrin avatar Sep 19 '25 17:09 vchigrin

That's strange, we do have ASAN CI enabled and don't see any issue yet. Would you mind trying this sample: native-stack-overflow to see whether it will error?

PS: I think if we do need to modify the code, we should use __SANITIZE_ADDRESS__ to disable the logic; __has_feature(address_sanitizer) is only for clang if I remember correctly.
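A compile-time guard that covers both macros mentioned in this thread could look roughly like the sketch below. EXAMPLE_ASAN_ENABLED and stack_check_enabled are hypothetical names used only for illustration; nothing here is taken from the WAMR sources.

```c
/* EXAMPLE_ASAN_ENABLED is a hypothetical macro name, not a WAMR one. */
#if defined(__SANITIZE_ADDRESS__)
/* Defined by GCC when building with -fsanitize=address. */
#define EXAMPLE_ASAN_ENABLED 1
#elif defined(__has_feature)
/* Nested so that compilers without __has_feature never parse the call. */
#if __has_feature(address_sanitizer)
#define EXAMPLE_ASAN_ENABLED 1
#endif
#endif

#ifndef EXAMPLE_ASAN_ENABLED
#define EXAMPLE_ASAN_ENABLED 0
#endif

/* Hypothetical helper: reports whether an address-based stack check
 * would be kept in this build. */
static int stack_check_enabled(void)
{
    return EXAMPLE_ASAN_ENABLED ? 0 : 1;
}
```

With a guard like this, the address comparison could be compiled out only in ASAN builds, while other builds keep the existing behavior.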

TianlongLiang avatar Sep 22 '25 03:09 TianlongLiang

I tested native-stack-overflow on my machine at commit a6a9f1f45d9f7ebf044ceac71bcf7a9ea2f90f23. Without sanitizers it does not work:

./run.sh
====== Interpreter test1
 stack size   | fail?  | leak?  | exception
---------------------------------------------------------------------------
unhandled SIGSEGV, si_addr: (nil)
Aborted (core dumped)

With ASAN it produces the following output:

./run.sh
====== Interpreter test1
 stack size   | fail?  | leak?  | exception
---------------------------------------------------------------------------
    0 -    16 | failed |     ok | Exception: native stack overflow
   16 - 24576 | failed |     ok | Exception: invalid exec env

====== Interpreter WAMR_DISABLE_HW_BOUND_CHECK=1 test1
 stack size   | fail?  | leak?  | exception
---------------------------------------------------------------------------
    0 - 24576 | failed |     ok | Exception: native stack overflow

====== AOT test1
 stack size   | fail?  | leak?  | exception
---------------------------------------------------------------------------
    0 -    16 | failed |     ok | Exception: native stack overflow
   16 - 24576 | failed |     ok | Exception: invalid exec env

====== AOT w/ signature test1
 stack size   | fail?  | leak?  | exception
---------------------------------------------------------------------------
    0 -    16 | failed |     ok | Exception: native stack overflow
   16 - 24576 | failed |     ok | Exception: invalid exec env

====== AOT WAMR_DISABLE_HW_BOUND_CHECK=1 test1
 stack size   | fail?  | leak?  | exception
---------------------------------------------------------------------------
    0 - 24576 | failed |     ok | Exception: native stack overflow

====== AOT w/ signature WAMR_DISABLE_HW_BOUND_CHECK=1 test1
 stack size   | fail?  | leak?  | exception
---------------------------------------------------------------------------
    0 - 24576 | failed |     ok | Exception: native stack overflow

I checked what happens under GDB for the first test case, and it seems to fail on the very first stack check. Here is the stack trace:

(gdb) bt
#0  0x0000555555574688 in wasm_set_exception_local (exception=0x5555555fe9c0 "native stack overflow", module_inst=0x516000000080)
    at /home/vchigrin/projects/wasm-micro-runtime/core/iwasm/common/wasm_runtime_common.c:3076
#1  wasm_set_exception (module_inst=0x516000000080, exception=exception@entry=0x5555555fe9c0 "native stack overflow")
    at /home/vchigrin/projects/wasm-micro-runtime/core/iwasm/common/wasm_runtime_common.c:3099
#2  0x0000555555574a3d in wasm_runtime_set_exception (module_inst_comm=<optimized out>,
    exception=exception@entry=0x5555555fe9c0 "native stack overflow")
    at /home/vchigrin/projects/wasm-micro-runtime/core/iwasm/common/wasm_runtime_common.c:3192
#3  0x000055555557a0e7 in wasm_runtime_detect_native_stack_overflow (exec_env=exec_env@entry=0x524000002100)
    at /home/vchigrin/projects/wasm-micro-runtime/core/iwasm/common/wasm_runtime_common.c:7899
#4  0x000055555557a80f in call_wasm_with_hw_bound_check (module_inst=module_inst@entry=0x516000000080,
    exec_env=exec_env@entry=0x524000002100, function=function@entry=0x513000000130, argc=argc@entry=2,
    argv=argv@entry=0x7ffff5700030) at /home/vchigrin/projects/wasm-micro-runtime/core/iwasm/interpreter/wasm_runtime.c:3609
#5  0x000055555557bd39 in wasm_call_function (exec_env=exec_env@entry=0x524000002100, function=function@entry=0x513000000130,
    argc=argc@entry=2, argv=argv@entry=0x7ffff5700030)
    at /home/vchigrin/projects/wasm-micro-runtime/core/iwasm/interpreter/wasm_runtime.c:3689
#6  0x00005555555745f1 in wasm_runtime_call_wasm (exec_env=exec_env@entry=0x524000002100, function=0x513000000130,
    argc=argc@entry=2, argv=argv@entry=0x7ffff5700030)
    at /home/vchigrin/projects/wasm-micro-runtime/core/iwasm/common/wasm_runtime_common.c:2666
#7  0x00005555555702cf in main (argc=<optimized out>, argv=<optimized out>)
    at /home/vchigrin/projects/wasm-micro-runtime/samples/native-stack-overflow/src/main.c:161

vchigrin avatar Sep 22 '25 11:09 vchigrin

Several observations:

  • The native-stack-overflow samples work well with and without -DWAMR_BUILD_SANITIZER=asan on my local Ubuntu 22.04 LTS environment.
  • AddressSanitizerUseAfterReturn is off by default. The "fake stack" should not be involved unless another configuration enables it.
  • Both GCC and Clang have some kind of ignore-list feature, IIUC. For example, see https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html#index-finstrument-functions-exclude-function-list. It might be better to try those first.
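One way to exclude a single function from ASAN instrumentation, in the spirit of the last point, is the no_sanitize_address function attribute that both GCC and Clang accept. The sketch below is an assumption-laden illustration, not a tested fix for WAMR: EXAMPLE_NO_ASAN and below_boundary are hypothetical names, and whether an uninstrumented function always keeps its locals on the real stack should be verified against the compiler in use.

```c
#include <stdint.h>

/* EXAMPLE_NO_ASAN is a hypothetical macro name used for illustration. */
#if defined(__GNUC__) || defined(__clang__)
#define EXAMPLE_NO_ASAN __attribute__((no_sanitize_address, noinline))
#else
#define EXAMPLE_NO_ASAN
#endif

/* Hypothetical check modeled on the comparison described in this issue:
 * returns 1 if the current frame is below the given boundary address.
 * The function is marked as not instrumented, on the assumption that
 * its local then stays on the real stack. */
EXAMPLE_NO_ASAN
static int below_boundary(uintptr_t boundary)
{
    volatile char marker = 0; /* local whose address stands in for the frame */
    return (uintptr_t)&marker < boundary;
}
```

The noinline part is there so the comparison is not merged into an instrumented caller. Reading __builtin_frame_address(0) instead of taking the address of a local may be another option worth trying.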

lum1n0us avatar Sep 24 '25 01:09 lum1n0us

Tested native-stack-overflow on my machine on commit a6a9f1f45d9f7ebf044ceac71bcf7a9ea2f90f23. Without sanitizers it does not work

./run.sh
====== Interpreter test1
 stack size   | fail?  | leak?  | exception
---------------------------------------------------------------------------
unhandled SIGSEGV, si_addr: (nil)
Aborted (core dumped)

can you file a separate bug with a bit more details about this? i couldn't reproduce it. (macOS, x86-64, clang)

yamt avatar Sep 25 '25 06:09 yamt