SDK should support Undefined Behavior Sanitizer

Open whitequark opened this issue 5 months ago • 4 comments

Per https://github.com/llvm/llvm-project/issues/151015#issuecomment-3130203774.

Currently it results in a linker error:

$ /opt/wasi-sdk/bin/clang -fsanitize=undefined test.c
wasm-ld: error: /tmp/test-1cf416.o: undefined symbol: __ubsan_handle_pointer_overflow
wasm-ld: error: /tmp/test-1cf416.o: undefined symbol: __ubsan_handle_type_mismatch_v1
clang: error: linker command failed with exit code 1 (use -v to see invocation)

This is because sanitizer runtimes are turned off:

https://github.com/WebAssembly/wasi-sdk/blob/d05b57d2a21229fbc1d492d204057cf4751de697/cmake/wasi-sdk-sysroot.cmake#L82

I tried turning them on, and that causes a cascade of failures, not least a bunch of fuzzer headers appearing in include-all.c, some of which don't compile and some of which introduce new symbols. How should I proceed? I could add them to the find command with exclusions, but there are about 15 new headers.

whitequark, Jul 30 '25 09:07

I don't personally know much about building sanitizers and have always assumed that it requires platform-level intrinsics or OS-level integration, neither of which is likely to have been ported to wasm already. I glanced at ASAN in the past, saw lots of ifdefs for Emscripten, and assumed it would take significant effort to make it work outside of a JS context. I'm not sure how applicable it all is to UBSAN though.

In the end I'm personally happy to add whatever build system hacks we need here to get things building and working. I agree that having sanitizers would be quite valuable. My guess, though, is that a chunk of the changes will be needed in the upstream sources too, so they likely wouldn't all live in this project.

alexcrichton, Jul 30 '25 14:07

I don't think this is possible to make work in a fully functional manner without some serious re-engineering of how sanitizers function. Consider the handler from the "minimal" UBSan runtime:

#  define GET_CALLER_PC()                              \
    ((__sanitizer::uptr)__builtin_extract_return_addr( \
        __builtin_return_address(0)))

#define HANDLER_RECOVER(name, kind)                                            \
  INTERFACE void __ubsan_handle_##name##_minimal() {                           \
    __ubsan_report_error(kind, GET_CALLER_PC());                               \
  }

In order to tell you where the failure is, it uses __builtin_return_address(0). But this builtin doesn't exist on WASI!

SDValue WebAssemblyTargetLowering::LowerRETURNADDR(SDValue Op,
                                                   SelectionDAG &DAG) const {
  SDLoc DL(Op);

  if (!Subtarget->getTargetTriple().isOSEmscripten()) {
    fail(DL, DAG,
         "Non-Emscripten WebAssembly hasn't implemented "
         "__builtin_return_address");
    return SDValue();
  }

  if (verifyReturnAddressArgumentIsConstant(Op, DAG))
    return SDValue();

  unsigned Depth = Op.getConstantOperandVal(0);
  MakeLibCallOptions CallOptions;
  return makeLibCall(DAG, RTLIB::RETURN_ADDRESS, Op.getValueType(),
                     {DAG.getConstant(Depth, DL, MVT::i32)}, CallOptions, DL)
      .first;
}

On Emscripten, it does a libcall to emscripten_return_address, which is a custom hostcall that asks the JS engine to capture a backtrace, and is defined to return NULL if running standalone, which causes the report to be skipped.
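For readers unfamiliar with the builtin, here is a minimal sketch of what the GET_CALLER_PC() pattern does on a native target where GCC or Clang implement it. The names DEMO_GET_CALLER_PC and demo_handler are hypothetical and exist only for this illustration; they are not part of compiler-rt. Per the lowering code above, the same source would be rejected when compiling for a non-Emscripten WebAssembly target.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical copy of the GET_CALLER_PC() pattern quoted above.
   Expanded inside a function, it yields the address that function
   will return to, i.e. a location inside its caller. */
#define DEMO_GET_CALLER_PC() \
  ((uintptr_t)__builtin_extract_return_addr(__builtin_return_address(0)))

/* Hypothetical stand-in for a sanitizer handler. It is marked noinline so
   that "the caller" stays the code that invoked the handler rather than
   some outer frame the handler got inlined into. */
__attribute__((noinline))
uintptr_t demo_handler(const char *kind) {
  assert(kind != NULL);
  uintptr_t pc = DEMO_GET_CALLER_PC();
  fprintf(stderr, "demo: %s at pc 0x%jx\n", kind, (uintmax_t)pc);
  return pc;
}
```

On a native build the printed address can be fed to a symbolizer to recover a source location, which is the step that has no counterpart on WASI today.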

I did manage to get compiler-rt to build a minimal sanitizer runtime, and this is the best UX it can provide:

$ ${WASI_SDK}/clang -fno-sanitize-recover=all -fsanitize-minimal-runtime -fsanitize=undefined  test.c
$ wasmtime run a.out
Error: failed to run main module `a.out`

Caused by:
0: failed to invoke command default
1: error while executing at wasm backtrace:
0:    0x718 - a.out!abort
1:    0x5de - a.out!__ubsan_handle_add_overflow_minimal_abort
2:    0x641 - a.out!long long
3:    0x66e - a.out!__original_main
4:     0xf1 - a.out!_start
note: using the `WASMTIME_BACKTRACE_DETAILS=1` environment variable may show more debugging information
2: wasm trap: wasm `unreachable` instruction executed

If you enable debug information you get a reasonably accurate source location:

$ WASMTIME_BACKTRACE_DETAILS=1 wasmtime run a.out
Error: failed to run main module `a.out`

Caused by:
0: failed to invoke command default
1: error while executing at wasm backtrace:
0:    0x718 - abort
at wasisdk://v27.0/build/sysroot/wasi-libc-wasm32-wasi/libc-bottom-half/sources/abort.c:5:5
1:    0x5de - abort_with_message
at /home/whitequark/Projects/wasi-sdk/src/llvm-project/compiler-rt/lib/ubsan_minimal/ubsan_minimal_handlers.cpp:98:70
- __ubsan_handle_add_overflow_minimal_abort
at /home/whitequark/Projects/wasi-sdk/src/llvm-project/compiler-rt/lib/ubsan_minimal/ubsan_minimal_handlers.cpp:134:1
2:    0x641 - long long
at /home/whitequark/test.c:5:10
3:    0x66e - main
at /home/whitequark/test.c:12:3
4:     0xf1 - _start
at wasisdk://v27.0/build/sysroot/wasi-libc-wasm32-wasi/libc-bottom-half/crt/crt1-command.c:43:13
2: wasm trap: wasm `unreachable` instruction executed

There's no diagnostic and you have to get the cause of failure out of the backtrace. This isn't great, but given the location is accurate and the cause is at least available, it could be used in practice.
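As a rough illustration of the gap, the sketch below shows handlers shaped like HANDLER_RECOVER above but without GET_CALLER_PC(), so that they could at least name the failed check. This is not what the minimal runtime does and was not tried in this thread; every identifier prefixed with demo_ or DEMO_ is hypothetical, and the kind strings are illustrative.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical state: the kind of the most recent report, kept so that a
   caller can inspect what was reported. */
static const char *demo_last_kind = NULL;

/* Report only the kind of check that failed. No return address is needed,
   so nothing here depends on __builtin_return_address. */
static void demo_report(const char *kind) {
  assert(kind != NULL);
  demo_last_kind = kind;
  fprintf(stderr, "ubsan(demo): %s\n", kind);
}

/* Returns 1 if the most recent report had the given kind, 0 otherwise. */
int demo_last_kind_is(const char *kind) {
  return demo_last_kind != NULL && strcmp(demo_last_kind, kind) == 0;
}

/* Same shape as HANDLER_RECOVER above, minus GET_CALLER_PC(). The demo_
   prefix keeps these from colliding with real runtime symbols. */
#define DEMO_HANDLER_RECOVER(name, kind) \
  void demo_ubsan_handle_##name##_minimal(void) { demo_report(kind); }

DEMO_HANDLER_RECOVER(add_overflow, "add-overflow")
DEMO_HANDLER_RECOVER(type_mismatch, "type-mismatch")
```

With handlers like these, the message would carry the kind of check while the location would still have to come from the engine's backtrace.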


I also tried getting the full UBSan runtime to build, but this involves a very large amount of work on the interceptor mechanism; I would expect at least a month of full-time work to get it all upstreamed. I have no appetite for it.

whitequark, Jul 30 '25 15:07

> I'm not sure how applicable it all is to UBSAN though.

I had assumed at first that UBSan's job is quite a bit easier (after all, it's "just" printing some stuff in response to compiler-inserted branches, right?) but now I realize that I underestimated it greatly and the full UBSan does a lot more. The minimal UBSan is more or less that, but even that barely functions.

whitequark, Jul 30 '25 15:07

I have used a ported version of https://cvsweb.netbsd.org/bsdweb.cgi/src/common/lib/libc/misc/ubsan.c?rev=1.12, which at least reports SourceLocation etc.
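To illustrate why reporting a SourceLocation does not need a return address: handlers in the non-minimal ABI receive a pointer to compiler-emitted static data describing the check. The sketch below assumes that this data begins with a filename, line, and column, which is my understanding of the layout and should be checked against the runtime you port. All demo_ names are hypothetical and are not taken from the NetBSD file.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed layout of the source location the compiler emits per check. */
struct demo_source_location {
  const char *filename;
  uint32_t line;
  uint32_t column;
};

/* Assumed shape of the static data passed to an overflow handler. The type
   descriptor is left opaque because this sketch never reads it. */
struct demo_overflow_data {
  struct demo_source_location loc;
  const void *type;
};

/* Format "file:line:column: kind" into buf and return what snprintf
   returns: the length the full message needs, excluding the terminator. */
int demo_format_report(char *buf, size_t size,
                       const struct demo_overflow_data *data,
                       const char *kind) {
  assert(buf != NULL && data != NULL && kind != NULL);
  return snprintf(buf, size, "%s:%u:%u: %s", data->loc.filename,
                  (unsigned)data->loc.line, (unsigned)data->loc.column, kind);
}
```

A real handler would write the formatted line to stderr and then either return or abort, depending on whether recovery is enabled.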

yamt, Sep 01 '25 05:09