WebAssembly and Exception Handling (throw)
Hey all,
After getting clang-repl running in the browser, I worked on integrating it with JupyterLite. Xeus-cpp, a C++ Jupyter kernel, provides a way to do that. Here is a static link that can be used to try C++ completely in the browser (https://compiler-research.org/xeus-cpp/lab/index.html). An example notebook, xeus-cpp-lite-demo.ipynb, is provided to show what can be achieved.
Coming back to the issue. I see we can't run throw (or a try/catch block involving throw) while running clang-repl in the browser.
The debug logs tell me that this comes from dlopen
All this can be tried through the static link above.
For running clang-repl in the browser, this is the workflow taken:
code -> PTU -> LLVM IR -> wasm object -> wasm binary -> loaded on top of main module using dlopen
- So as it fails in the dlopen step, we know for sure that the LLVM IR is being produced and also a wasm binary is being produced (hopefully correctly)
Pasting them down below just for reference
i) LLVM IR (only relevant part)
@_ZTIi = external constant ptr
@llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 65535, ptr @_GLOBAL__sub_I_incr_module_2, ptr null }]
define internal void @__stmts__0() #0 {
%1 = call ptr @__cxa_allocate_exception(i32 4) #2
store i32 1, ptr %1, align 16
call void @__cxa_throw(ptr %1, ptr @_ZTIi, ptr null) #3
unreachable
}
declare ptr @__cxa_allocate_exception(i32) #0
declare void @__cxa_throw(ptr, ptr, ptr) #0
; Function Attrs: noinline
define internal void @_GLOBAL__sub_I_incr_module_2() #1 {
call void @__stmts__0()
ret void
}
ii) wasm module
(module $incr_module_2.wasm
(memory $env.memory (;0;) (import "env" "memory") 0)
(table $env.__indirect_function_table (;0;) (import "env" "__indirect_function_table") 0 funcref)
(global $__memory_base (;0;) (import "env" "__memory_base") i32)
(global $__table_base (;1;) (import "env" "__table_base") i32)
(func $__cxa_allocate_exception (;0;) (import "env" "__cxa_allocate_exception") (param i32) (result i32))
(func $__cxa_throw (;1;) (import "env" "__cxa_throw") (param i32 i32 i32))
(global $typeinfo for int (;2;) (import "GOT.mem" "_ZTIi") (mut i32))
(func $__wasm_call_ctors (;2;) (export "__wasm_call_ctors")
call $_GLOBAL__sub_I_incr_module_2
)
(func $__wasm_apply_data_relocs (;3;) (export "__wasm_apply_data_relocs")
)
(func $__stmts__0 (;4;)
(local $var0 i32)
i32.const 4
call $__cxa_allocate_exception
local.tee $var0
i32.const 1
i32.store
local.get $var0
global.get $typeinfo for int
i32.const 0
call $__cxa_throw
unreachable
)
(func $_GLOBAL__sub_I_incr_module_2 (;5;)
call $__stmts__0
)
)
I think this looks correct to me !
Now coming back to the dlopen step. The debugger in Chrome DevTools tells me that this is the last place where it ends up
https://github.com/emscripten-core/emscripten/blob/4c14f1f34adfcc06fca235452c5d47ddf612c1f2/src/library_dylink.js#L856-L859
Which means it is trying to execute this block I'd guess
(func $__wasm_call_ctors (;2;) (export "__wasm_call_ctors")
call $_GLOBAL__sub_I_incr_module_2
)
(func $__stmts__0 (;4;)
(local $var0 i32)
i32.const 4
call $__cxa_allocate_exception
local.tee $var0
i32.const 1
i32.store
local.get $var0
global.get $typeinfo for int
i32.const 0
call $__cxa_throw
unreachable
)
(func $_GLOBAL__sub_I_incr_module_2 (;5;)
call $__stmts__0
)
But it isn't able to. Now __wasm_call_ctors calls _GLOBAL__sub_I_incr_module_2, which simply calls __stmts__0 ... So I am guessing it's just not able to run __stmts__0, but I think even that is being framed correctly ?
cc @sbc100 @kripken
Here's what I thought might be going wrong.
- Just as a sanity check, I thought I should confirm the presence of the symbols in the final xcpp.wasm being built (the wasm binary out of xeus-cpp that acts as the main module)
(xeus-lite-host) anutosh491@Anutoshs-MacBook-Air build % wasm-objdump -x xcpp.wasm | grep __cxa
- func[1] sig=10 <__cxa_find_matching_catch_2> <- env.__cxa_find_matching_catch_2
.....
- func[9678] <__cxa_allocate_exception>
.....
- global[3] <__cxa_throw>
(xeus-lite-host) anutosh491@Anutoshs-MacBook-Air build % wasm-objdump -x xcpp.wasm | grep _ZTIi
- global[1701] i32 mutable=0 <_ZTIi> - init i32=409276
- global[1701] -> "_ZTIi"
I think we have everything
- I thought this might be a -fwasm-exceptions or -fexceptions thingy. I realized we build xeus-cpp with -fexceptions but llvm isn't using that (we obviously need to build llvm for wasm to get libclangInterpreter.a, which facilitates using clang-repl in the web). So I tried this too but it didn't help in any way. I still get the same result.
If y'all are interested in the configuration, this is what I used.
emcmake cmake -DCMAKE_BUILD_TYPE=MinSizeRel \
-DBUILD_SHARED_LIBS=OFF \
-DLLVM_HOST_TRIPLE=wasm32-unknown-emscripten \
-DLLVM_TARGETS_TO_BUILD="WebAssembly" \
-DLLVM_INCLUDE_BENCHMARKS=OFF \
-DLLVM_INCLUDE_EXAMPLES=OFF \
-DLLVM_INCLUDE_TESTS=OFF \
-DLLVM_ENABLE_LIBEDIT=OFF \
-DLLVM_ENABLE_PROJECTS="clang;lld" \
-DLLVM_ENABLE_THREADS=OFF \
-DCLANG_ENABLE_STATIC_ANALYZER=OFF \
-DCLANG_ENABLE_ARCMT=OFF \
-DCLANG_ENABLE_BOOTSTRAP=OFF \
-DLLVM_ENABLE_ZSTD=OFF \
-DLLVM_ENABLE_LIBXML2=OFF \
-DCMAKE_CXX_FLAGS="-Dwait4=__syscall_wait4 -fexceptions" \
../llvm
Apart from adding the -fexceptions flag here ... everything is what we already use for getting the static link to work !
Is there still an issue here?
You are correct that you need to make sure that -fwasm-exceptions is either used everywhere, or nowhere. You cannot mix code compiled with and without that flag. Did making that consistent fix your issue?
From your original backtrace it looks like the code is trying to load a DLL called "const char*", which is very odd. Can you stop in the debugger and see why that might be? Is the name of the file being loaded really "const char*"?
(BTW, when you file bugs like this it would be very helpful if you could copy and paste the text rather than attaching screenshots. Using text makes it much easier for us to search / copy / etc within the issue.)
Is there still an issue here?
Yes it is.
Did making that consistent fix your issue?
So I think we tried building the whole stack with -fexceptions (jupyterlite, xeus-cpp, llvm etc) and we haven't moved to -fwasm-exceptions yet as we thought using one of these for the whole toolchain would be enough !
Can you stop in the debugger and see why that might be? Is the name of file being loaded really "const char*"?
Is it ? So when we use clang-repl in the browser, every code block produces a file named incr_module_xx.wasm where xx is the code block number. So yeah, I don't think that's the file name here !
I think it might just be the exception ptr type or something (not sure). The below issue looks relevant here. https://github.com/emscripten-core/emscripten/issues/6330
EDIT: Also just questioning my breakdown here. The wasm module generated looks correct to me and I think it is the init() call that I referred above that doesn't work ! Maybe someone could confirm that for me ?
I think the fact that _dlopen_js is being called with the string "const char*" rather than the name of a DLL is really the clue. That looks really wrong.
Can you break at that callsite and see the string ptr value being passed to _dlopen_js? Presumably the user code passed a completely different string.. can you print the ptr value on C++ side too? It looks like dlopen is being called from side module with function names like $func917. I imagine somehow the DLL is confused about where its static data lives? Perhaps __memory_base was not correctly set when the DLL was loaded?
Can you try building your side modules with --profiling-funcs so you get useful function names instead of $func917?
Hey @sbc100 sorry took some time to get back
But this is the whole log when we try executing "throw 1;"
It points to the addModule function as expected where the dlopen is being called
https://github.com/llvm/llvm-project/blob/main/clang/lib/Interpreter/Wasm.cpp#L65
So this is what I see when I build xeus-cpp with Assertions=0 vs Assertions=1
- Assertions=0 (looks like some exception ptr)
- Assertions=1 (type being returned)
Also I see absolutely no difference in how dlopen is working for any cell that works vs the cell executing throw 1;
It's the same. The file name is also tmp/incr_module_2.wasm, which should be the case. Case 1 is some default cell, while case 2 is the one with throw
At this point there is so much we can already do (check the example notebook https://github.com/compiler-research/xeus-cpp/blob/main/notebooks/xeus-cpp-lite-demo.ipynb) that not being able to use throw 1; seems very weird.
I stepped through execution until the final failure, which comes up here
https://github.com/emscripten-core/emscripten/blob/f9ca632180d2dce786fd87c544c5d99b1d5fb834/src/lib/libdylink.js#L857-L860
As soon as the debugger hits init() I get the error message.
Nothing really seems fishy till the end. I see memory_base is 0 here (not sure if that shouldn't be the case, looks ok to me)
What is memorySize? memoryBase should only be zero if memorySize is also zero.
The memoryBase and tableBase are where the data segment and table segment for your DLL are stored. They will only be zero when your module has no memory segment or no table segment of its own.
You can see how much data and table space your module needs by looking at the dylink section of the module/DLL. It's always the first section in any wasm DLL:
$ ./emcc -sSIDE_MODULE test/hello_world.c
$ wasm-objdump -x a.out.wasm
a.out.wasm: file format wasm 0x1
Section Details:
Custom:
- name: "dylink.0"
- mem_size : 15
- mem_p2align : 0
- table_size : 0
- table_p2align: 0
...
Here you can see the hello world program, when compiled to DLL requires 15 bytes of memory and zero table slots.
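The relationship between the dylink metadata and the bases can be sketched roughly as follows. This is a simplified model, not Emscripten's actual loader: `DylinkInfo`, `align_up`, and `pick_memory_base` are hypothetical names, and the bump pointer stands in for whatever allocation the real runtime performs. It only illustrates why a module whose mem_size is 0 ends up with a memory base of 0.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the fields printed for the "dylink.0" section.
struct DylinkInfo {
  std::uint32_t mem_size;       // bytes of static data the module needs
  std::uint32_t mem_p2align;    // required alignment, as a power of two
  std::uint32_t table_size;     // number of table slots the module needs
  std::uint32_t table_p2align;  // table alignment, as a power of two
};

// Round `addr` up to the next multiple of 2^p2align.
inline std::uint32_t align_up(std::uint32_t addr, std::uint32_t p2align) {
  assert(p2align < 32);
  const std::uint32_t align = 1u << p2align;
  return (addr + align - 1) & ~(align - 1);
}

// Simplified model (an assumption, not the real loader): space is reserved
// from a bump pointer only when the module asks for some; otherwise the
// base stays 0 because there is nothing to place.
inline std::uint32_t pick_memory_base(const DylinkInfo& info,
                                      std::uint32_t& next_free) {
  if (info.mem_size == 0) return 0;
  const std::uint32_t base = align_up(next_free, info.mem_p2align);
  next_free = base + info.mem_size;
  return base;
}
```

Under this model, a module with mem_size 15 gets a nonzero base, while a module with mem_size 0 gets a base of 0 without consuming any memory.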
This is what metadata has after getDylinkMetadata(binary)
var loadWebAssemblyModule = (binary, flags, libName, localScope, handle) => {
var metadata = getDylinkMetadata(binary);
memoryAlign: 0
memorySize: 0
neededDynlibs: []
tableAlign: 0
tableSize: 0
tlsExports: Set(0) {size: 0}
weakImports: Set(0) {size: 0}
[[Prototype]]: Object
Binary shows Int8Array(809) if that's relevant. So yeah I guess it is the getDylinkMetadata call that doesn't go as expected ?
Also I see you've mentioned about using wasm-objdump but I am not sure how to put it to use at runtime. As in every cell block gives me a side module (code -> PTU -> llvm IR -> incr_module_xx.so file -> incr_module_xx.wasm file -> loaded on top of main module using dlopen)
So the most I can do is go through the incr_module_xx.wasm file, which comes out to be something like this
(module $incr_module_2.wasm
(memory $env.memory (;0;) (import "env" "memory") 0)
(table $env.__indirect_function_table (;0;) (import "env" "__indirect_function_table") 0 funcref)
(global $__stack_pointer (;0;) (import "env" "__stack_pointer") (mut i32))
(global $__memory_base (;1;) (import "env" "__memory_base") i32)
(global $__table_base (;2;) (import "env" "__table_base") i32)
(func $__cxa_allocate_exception (;0;) (import "env" "__cxa_allocate_exception") (param i32) (result i32))
(func $__cxa_throw (;1;) (import "env" "__cxa_throw") (param i32 i32 i32))
(global $typeinfo for int (;3;) (import "GOT.mem" "_ZTIi") (mut i32))
(global $__dso_handle (;4;) (export "__dso_handle") i32 (i32.const 0))
(func $__wasm_call_ctors (;2;) (export "__wasm_call_ctors")
call $_GLOBAL__sub_I_incr_module_2
)
(func $__stmts__0 (;3;)
(local $var0 i32)
i32.const 4
call $__cxa_allocate_exception
local.tee $var0
i32.const 1
i32.store
local.get $var0
global.get $typeinfo for int
i32.const 0
call $__cxa_throw
unreachable
)
(func $_GLOBAL__sub_I_incr_module_2 (;4;)
call $__stmts__0
)
)
P.S: Also just for you to confirm for yourself that the error is coming out of dlopen itself (and also to maybe play around and debug any questions you might have) I think you can try running throw 1; through our static link and add breakpoints in the source files to see what's happening https://compiler-research.org/xeus-cpp/lab/index.html
So incr_module_xx.wasm module looks like it actually doesn't have any data or table slots (I don't see any data segments or elem segments).
BTW you can run wasm-objdump on your incr_module_xx.wasm file to see the dylink section if you like. It is not showing up in the wat disassembly that you attached above.
Can you set a breakpoint on the "Error loading dynamic library" line and inspect the exception (e) that is being thrown? What does the stack trace for that exception look like?
So incr_module_xx.wasm module looks like it actually doesn't have any data or table slots (I don't see any data segments or elem segments).
This is what I see for int x = 10;
This seems to have a data segment at the end. So I don't think this is the case for every cell. It's just throw which might be at fault.
Okay so I set the debugger below and print e
try {
return loadDynamicLibrary(filename, combinedFlags, localScope, handle)
} catch (e) {
err(`Error in loading dynamic library ${filename}: ${e}`); //HERE
dlSetError(`Could not load dynamic lib: ${filename}\n${e}`);
return 0
}
I see this
e
int
at ___cxa_throw (http://127.0.0.1:8723/xeus/bin/xcpp.js:9:1250810)
at incr_module_3.wasm.__stmts__0 (wasm://wasm/incr_module_3.wasm-35041342:wasm-function[3]:0x139)
at incr_module_3.wasm._GLOBAL__sub_I_incr_module_3 (wasm://wasm/incr_module_3.wasm-35041342:wasm-function[4]:0x143)
at incr_module_3.wasm.__wasm_call_ctors (wasm://wasm/incr_module_3.wasm-35041342:wasm-function[2]:0x119)
at postInstantiation (http://127.0.0.1:8723/xeus/bin/xcpp.js:9:1241584)
at loadModule (http://127.0.0.1:8723/xeus/bin/xcpp.js:9:1242062)
at loadWebAssemblyModule (http://127.0.0.1:8723/xeus/bin/xcpp.js:9:1242363)
at getExports (http://127.0.0.1:8723/xeus/bin/xcpp.js:9:1245234)
at loadDynamicLibrary (http://127.0.0.1:8723/xeus/bin/xcpp.js:9:1245560)
at dlopenInternal (http://127.0.0.1:8723/xeus/bin/xcpp.js:9:1337847)
I think I have pointed this out before that eventually we end up at init() or wasm_call_ctors() and eventually it gets to
call $__cxa_throw from the whole wasm file (pasted above in https://github.com/emscripten-core/emscripten/issues/23442#issuecomment-2637431038)
Well what does that mean ? I guess the logic is correct as we would have liked it to be ? But then ..... throw is called and hence the dlopen step errors out I am guessing ! Confused as to what needs to be done here. Does this mean we can't load a DLL/module having a call to __cxa_throw using dlopen ?
For some context, xeus-cpp unlike xeus-cpp-lite uses clang-repl locally and not in the browser. So If you check out example notebook of what xeus-cpp can do locally, clang-repl can execute throw
So one of your static constructors is throwing an exception.
Are you actually trying to execute a C++ throw in your notebook? If so, wouldn't you expect the DLL to fail to load? Or is the problem that you want to somehow catch that exception yourself instead of having dlopen fail?
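To make that question concrete, here is a minimal C++ sketch of the situation as described in this thread: a loader runs a module's static constructors and reports a load failure if one of them lets an exception escape. `FakeModule` and `load_module` are hypothetical stand-ins, not Emscripten or clang-repl APIs; the real path goes through dlopen and __wasm_call_ctors.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a side module: a name plus its static constructors.
struct FakeModule {
  std::string name;
  std::vector<std::function<void()>> ctors;
};

// Hypothetical loader: runs every constructor (the role played by
// __wasm_call_ctors) and turns an escaping exception into a load failure.
inline bool load_module(const FakeModule& m, std::string& error) {
  assert(!m.name.empty());
  try {
    for (const auto& ctor : m.ctors) ctor();
  } catch (int value) {
    error = "Could not load " + m.name + ": uncaught int " + std::to_string(value);
    return false;
  } catch (...) {
    error = "Could not load " + m.name + ": uncaught exception";
    return false;
  }
  error.clear();
  return true;
}
```

In this model, a cell containing only `throw 1;` corresponds to a constructor that throws, so the load fails. A cell whose try/catch handles the exception inside the constructor would load normally, provided the module was compiled so that the catch is actually emitted.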
Hey @sbc100
Sorry for missing out on this but yeah the point is I should be able to exactly replicate what clang-repl does locally (or what xeus-cpp is doing here based on clang-repl)
So my point is that I won't expect an error out of dlopen or the module failing to load. Rather the module should be loaded on top of the main module and that should then give back any Error message or whatever we print through the console .
Does this mean the wasm being generated is wrong ? Cause the wasm binary calls _cxa_throw directly I suppose. Don't you think in this case we should be able to go exactly how clang-repl handles this ? I am a bit confused on how to proceed !
Are you saying that in this case the catch is not actually catching the throw 1? i.e. you are seeing the thrown value escape and not seeing the Error print message?
Okay @sbc100
I think this might be a new/separate problem at hand. But let's look into this
Case 1: We have throw 1; and this fails through dlopen failing to load as init calls __cxa_throw. And maybe there is nothing wrong here. Obviously I wouldn't like the kernel crashing for xeus-cpp-lite but fair enough as an exception was not caught hence now the jupyterlite instance is corrupted.
Case 2: But for a throw catch block
try{
throw 1;
} catch (...) {
0;
}
I still see the same happening
i) Now even in this case we obviously first parse and come up with an LLVM IR. Looking at the LLVM IR generated ..... I do see __cxa_begin_catch being referenced .... but not sure if it is put to use
; ModuleID = 'incr_module_3'
source_filename = "incr_module_3"
target datalayout = "e-m:e-p:32:32-p10:8:8-p20:8:8-i64:64-i128:128-f128:64-n32:64-S128-ni:1:10:20"
target triple = "wasm32-unknown-emscripten"
@_ZTIi = external constant ptr
@llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 65535, ptr @_GLOBAL__sub_I_incr_module_3, ptr null }]
define internal void @__stmts__0() #0 personality ptr @__gxx_personality_v0 {
entry:
%exn.slot = alloca ptr, align 4
%ehselector.slot = alloca i32, align 4
%exception = call ptr @__cxa_allocate_exception(i32 4) #2
store i32 1, ptr %exception, align 16
call void @__cxa_throw(ptr %exception, ptr @_ZTIi, ptr null) #3
br label %unreachable
unreachable: ; preds = %entry
unreachable
}
declare ptr @__cxa_allocate_exception(i32) #0
declare void @__cxa_throw(ptr, ptr, ptr) #0
declare i32 @__gxx_personality_v0(...) #0
declare ptr @__cxa_begin_catch(ptr) #0
declare void @__cxa_end_catch() #0
; Function Attrs: noinline
define internal void @_GLOBAL__sub_I_incr_module_3() #1 {
entry:
call void @__stmts__0()
ret void
}
attributes #0 = { "target-features"="-atomics,+bulk-memory,+bulk-memory-opt,+call-indirect-overlong,-exception-handling,-extended-const,-fp16,-multimemory,+multivalue,+mutable-globals,+nontrapping-fptoint,+reference-types,-relaxed-simd,+sign-ext,-simd128,-tail-call,-wide-arithmetic," }
attributes #1 = { noinline "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="-atomics,+bulk-memory,+bulk-memory-opt,+call-indirect-overlong,-exception-handling,-extended-const,-fp16,-multimemory,+multivalue,+mutable-globals,+nontrapping-fptoint,+reference-types,-relaxed-simd,+sign-ext,-simd128,-tail-call,-wide-arithmetic," }
attributes #2 = { nounwind }
attributes #3 = { noreturn }
!llvm.linker.options = !{}
!llvm.module.flags = !{!0, !1, !2, !3, !4, !5, !6, !7, !8}
!llvm.ident = !{!9}
!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{i32 1, !"wasm-feature-bulk-memory", i32 43}
!2 = !{i32 1, !"wasm-feature-bulk-memory-opt", i32 43}
!3 = !{i32 1, !"wasm-feature-call-indirect-overlong", i32 43}
!4 = !{i32 1, !"wasm-feature-multivalue", i32 43}
!5 = !{i32 1, !"wasm-feature-mutable-globals", i32 43}
!6 = !{i32 1, !"wasm-feature-nontrapping-fptoint", i32 43}
!7 = !{i32 1, !"wasm-feature-reference-types", i32 43}
!8 = !{i32 1, !"wasm-feature-sign-ext", i32 43}
ii) After this step we end up generating the wasm module which is obviously wrong
(module $incr_module_3.wasm
(memory $env.memory (;0;) (import "env" "memory") 0)
(table $env.__indirect_function_table (;0;) (import "env" "__indirect_function_table") 0 funcref)
(global $__stack_pointer (;0;) (import "env" "__stack_pointer") (mut i32))
(global $__memory_base (;1;) (import "env" "__memory_base") i32)
(global $__table_base (;2;) (import "env" "__table_base") i32)
(func $__cxa_allocate_exception (;0;) (import "env" "__cxa_allocate_exception") (param i32) (result i32))
(func $__cxa_throw (;1;) (import "env" "__cxa_throw") (param i32 i32 i32))
(global $typeinfo for int (;3;) (import "GOT.mem" "_ZTIi") (mut i32))
(func $__wasm_call_ctors (;2;) (export "__wasm_call_ctors")
call $_GLOBAL__sub_I_incr_module_3
)
(func $__wasm_apply_data_relocs (;3;) (export "__wasm_apply_data_relocs")
)
(func $__stmts__0 (;4;)
(local $var0 i32)
global.get $__stack_pointer
i32.const 16
i32.sub
global.set $__stack_pointer
i32.const 4
call $__cxa_allocate_exception
local.tee $var0
i32.const 1
i32.store
local.get $var0
global.get $typeinfo for int
i32.const 0
call $__cxa_throw
unreachable
)
(func $_GLOBAL__sub_I_incr_module_3 (;5;)
call $__stmts__0
)
)
My understanding related to a catch block is that we definitely should end up seeing a landingpad
Now my point is how is llvm IR generated through clang and clang-repl turning out to be different ?
For example if I put this in test.cpp and run the following
int main() {
try {
throw 1;
} catch (...) {
0;
}
}
- emcc test.cpp -std=c++20 -fexceptions -emit-llvm -S -o test.ll ..... I see this
; ModuleID = 'test.cpp'
source_filename = "test.cpp"
target datalayout = "e-m:e-p:32:32-p10:8:8-p20:8:8-i64:64-f128:64-n32:64-S128-ni:1:10:20"
target triple = "wasm32-unknown-emscripten"
@_ZTIi = external constant ptr
@__main_void = hidden alias i32 (), ptr @main
; Function Attrs: mustprogress noinline norecurse optnone
define hidden noundef i32 @main() #0 personality ptr @__gxx_personality_v0 {
%1 = alloca ptr, align 4
%2 = alloca i32, align 4
%3 = call ptr @__cxa_allocate_exception(i32 4) #1
store i32 1, ptr %3, align 16
invoke void @__cxa_throw(ptr %3, ptr @_ZTIi, ptr null) #2
to label %12 unwind label %4
4: ; preds = %0
%5 = landingpad { ptr, i32 }
catch ptr null
%6 = extractvalue { ptr, i32 } %5, 0
store ptr %6, ptr %1, align 4
%7 = extractvalue { ptr, i32 } %5, 1
store i32 %7, ptr %2, align 4
br label %8
8: ; preds = %4
%9 = load ptr, ptr %1, align 4
%10 = call ptr @__cxa_begin_catch(ptr %9) #1
call void @__cxa_end_catch()
br label %11
11: ; preds = %8
ret i32 0
12: ; preds = %0
unreachable
}
declare ptr @__cxa_allocate_exception(i32)
declare void @__cxa_throw(ptr, ptr, ptr)
declare i32 @__gxx_personality_v0(...)
declare ptr @__cxa_begin_catch(ptr)
declare void @__cxa_end_catch()
attributes #0 = { mustprogress noinline norecurse optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+multivalue,+mutable-globals,+reference-types,+sign-ext,-bulk-memory,-nontrapping-fptoint" }
attributes #1 = { nounwind }
attributes #2 = { noreturn }
!llvm.linker.options = !{}
!llvm.module.flags = !{!0}
!llvm.ident = !{!1}
!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{!"clang version 20.0.0git (https:/github.com/llvm/llvm-project 1d810ece2b2c8fab77720493864257f0ea3336a9)"}
- Followed by this we can generate a SIDE_MODULE as required.
emcc test.cpp -std=c++20 -sSIDE_MODULE=1 -fexceptions -sDISABLE_EXCEPTION_CATCHING=0 -o test.wasm
I see most of the important stuff being put to use
(import "env" "__cxa_allocate_exception" (func (;0;) (type 0)))
(import "env" "invoke_viii" (func (;1;) (type 3)))
(import "env" "__cxa_find_matching_catch_3" (func (;2;) (type 0)))
(import "env" "getTempRet0" (func (;3;) (type 1)))
(import "env" "__cxa_begin_catch" (func (;4;) (type 0)))
(import "env" "__cxa_end_catch" (func (;5;) (type 2)))
(import "env" "__stack_pointer" (global (;0;) (mut i32)))
(import "env" "__memory_base" (global (;1;) i32))
(import "env" "__table_base" (global (;2;) i32))
(import "GOT.mem" "__THREW__" (global (;3;) (mut i32)))
(import "GOT.mem" "_ZTIi" (global (;4;) (mut i32)))
(import "GOT.func" "__cxa_throw" (global (;5;) (mut i32)))
(import "env" "memory" (memory (;0;) 0))
(import "env" "__indirect_function_table" (table (;0;) 0 funcref))
This is weird as running clang-repl in the browser or locally, the PTU generation step is the same, it's only the execution step that differs
https://github.com/llvm/llvm-project/blob/66465c3b0ab1b32403ad5a1c3114174d87830f54/clang/lib/Interpreter/Interpreter.cpp#L646-L650
So technically we shouldn't be seeing a wrong LLVM IR leading to a wrong wasm module !
EDIT: I have a question.
This is weird as running clang-repl in the browser or locally, the PTU generation step is the same, it's only the execution step that differs
Is this possibly dependent on how we build LLVM (maybe with some sort of exceptions enabled or disabled). This is what I use to build llvm for wasm currently.
mkdir build
cd build
export CMAKE_PREFIX_PATH=$PREFIX
export CMAKE_SYSTEM_PREFIX_PATH=$PREFIX
# clear LDFLAGS flags because they contain sWASM_BIGINT
export LDFLAGS=""
# Configure step
emcmake cmake ${CMAKE_ARGS} -S ../llvm -B . \
-DCMAKE_BUILD_TYPE=MinSizeRel \
-DCMAKE_PREFIX_PATH=$PREFIX \
-DCMAKE_INSTALL_PREFIX=$PREFIX \
-DLLVM_HOST_TRIPLE=wasm32-unknown-emscripten \
-DLLVM_TARGETS_TO_BUILD="WebAssembly" \
-DLLVM_ENABLE_ASSERTIONS=ON \
-DLLVM_INCLUDE_BENCHMARKS=OFF \
-DLLVM_INCLUDE_EXAMPLES=OFF \
-DLLVM_INCLUDE_TESTS=OFF \
-DLLVM_ENABLE_LIBEDIT=OFF \
-DLLVM_ENABLE_PROJECTS="clang;lld" \
-DLLVM_ENABLE_THREADS=OFF \
-DLLVM_ENABLE_ZSTD=OFF \
-DLLVM_ENABLE_LIBXML2=OFF \
-DCLANG_ENABLE_STATIC_ANALYZER=OFF \
-DCLANG_ENABLE_ARCMT=OFF \
-DCLANG_ENABLE_BOOTSTRAP=OFF \
-DCMAKE_CXX_FLAGS="-Dwait4=__syscall_wait4"
# Build step
emmake make -j4
# Install step
emmake make install
# Copy all files with ".wasm" extension to $PREFIX/bin
cp $SRC_DIR/build/bin/*.wasm $PREFIX/bin
All of this is present as a part of the recipe for llvm on emscripten-forge (https://github.com/emscripten-forge/recipes/blob/main/recipes/recipes_emscripten/llvm/build.sh)
Hey @sbc100
Not sure you saw my ping above, hence tagging you to maybe help me out with the last 2-3 messages continuing our discussion after https://github.com/emscripten-core/emscripten/issues/23442#issuecomment-2657593011
What are the build flags you are using when building the side module in clang-repl? They must be somehow different from those used in emscripten. I'm guessing you are missing -fexceptions or -fwasm-exceptions perhaps?
You can add -v to the emcc command to see all the flags that get passed to clang and wasm-ld, in case that helps.
What are the build flags you are using when building the side module in clang-repl?
So these are the flags used for the side module (each cell gives us one that is loaded on top of the main module)
https://github.com/llvm/llvm-project/blob/85601fd78f4cbf0ce5df74c5926183035f859572/clang/lib/Interpreter/Wasm.cpp#L74-L84
They must be somehow different from those used in emscripten. I'm guessing you are missing -fexceptions or -fwasm-exceptions perhaps?
Wait, for the latest build I took care of this (basically I just updated the cxx_flags to include -fexceptions, adding it to the emcmake cmake ... command here)
And I still see this
So the current CXX_FLAGS being put to use are these
CXX_FLAGS = -Dwait4=__syscall_wait4 -fexceptions -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -ffunction-sections -fdata-sections -Os -DNDEBUG -std=c++17 -UNDEBUG
So yeah we know the roadmap here (code -> PTU -> llvm IR -> incr_module_xx.so file -> incr_module_xx.wasm file -> loaded on top of main module using dlopen)
My first concern is clang-repl and clang technically promise making use of the same llvm IR. I don't know why we don't see the correct LLVM IR (even before getting to the shared object in clang-repl) for this case.
What are the build flags you are using when building the side module in clang-repl?
So these are the flags used for the side module (each cell gives us one that is loaded on top of the main module)
https://github.com/llvm/llvm-project/blob/85601fd78f4cbf0ce5df74c5926183035f859572/clang/lib/Interpreter/Wasm.cpp#L74-L84
Those are the link flags. What are the compile-time flags used to build the object file being linked?
Hey @sbc100, yes I think stuff boils down to that
What are the compile-time flags used to build the object file being linked?
But not sure how to get hold of them :\
But everything happens in the addModule code
What I think happens is
- once we have the LLVM IR, we use this framework to create the shared object (which is later moved to a wasm binary using wasm-ld)
const llvm::Target *Target = llvm::TargetRegistry::lookupTarget(
PTU.TheModule->getTargetTriple(), ErrorString);
if (!Target) {
return llvm::make_error<llvm::StringError>("Failed to create Wasm Target: ",
llvm::inconvertibleErrorCode());
}
llvm::TargetOptions TO = llvm::TargetOptions();
llvm::TargetMachine *TargetMachine = Target->createTargetMachine(
PTU.TheModule->getTargetTriple(), "", "", TO, llvm::Reloc::Model::PIC_);
PTU.TheModule->setDataLayout(TargetMachine->createDataLayout());
std::string ObjectFileName = PTU.TheModule->getName().str() + ".o";
std::string BinaryFileName = PTU.TheModule->getName().str() + ".wasm";
std::error_code Error;
llvm::raw_fd_ostream ObjectFileOutput(llvm::StringRef(ObjectFileName), Error);
llvm::legacy::PassManager PM;
if (TargetMachine->addPassesToEmitFile(PM, ObjectFileOutput, nullptr,
llvm::CodeGenFileType::ObjectFile)) {
return llvm::make_error<llvm::StringError>(
"Wasm backend cannot produce object.", llvm::inconvertibleErrorCode());
}
if (!PM.run(*PTU.TheModule)) {
return llvm::make_error<llvm::StringError>("Failed to emit Wasm object.",
llvm::inconvertibleErrorCode());
}
- My understanding here is that
i) We create a Target (extracted from the target triple from our module)
ii) We create a TargetMachine which I guess uses llc on the llvm IR we have.
iii) Now by default we don't really set any TargetOptions .... but that being said, we can configure this to use wasm exceptions, inspired by https://github.com/llvm/llvm-project/blob/12f8ed58a039ff3a3365591203f76ae07a179215/llvm/include/llvm/MC/MCTargetOptions.h#L25
llvm::TargetOptions TO = llvm::TargetOptions();
TO.ExceptionModel = llvm::ExceptionHandling::Wasm;
iv) But that being said, I know of an error like this
LLVM ERROR: -exception-model=wasm only allowed with at least one of -wasm-enable-eh or -wasm-enable-sjlj
v) So although TO.ExceptionModel = llvm::ExceptionHandling::Wasm; would take care of -exception-model=wasm I am not sure how ..... but we need to pass -wasm-enable-eh and/or -mattr=+exception-handling ..... so that we possibly end up with something like this I suppose
; RUN: llc < %s ....... -wasm-enable-eh -exception-model=wasm -mattr=+exception-handling,bulk-memory
I am not sure how but I see the 3rd parameter of createTargetMachine allows us to pass some features https://github.com/llvm/llvm-project/blob/12f8ed58a039ff3a3365591203f76ae07a179215/llvm/include/llvm/MC/TargetRegistry.h#L456
So we currently pass nothing here. So maybe we can update the code in AddModule to have this
llvm::TargetOptions TO = llvm::TargetOptions();
TO.ExceptionModel = llvm::ExceptionHandling::Wasm;
llvm::TargetMachine *TargetMachine = Target->createTargetMachine(
PTU.TheModule->getTargetTriple(), "", "+wasm-enable-eh,+mattr=+exception-handling", TO, llvm::Reloc::Model::PIC_,
std::nullopt, llvm::CodeGenOptLevel::None, false);
Not sure it is this way you make use of these flags. I just know we can use -mllvm -wasm-enable-eh and -mexception-handling with emcc. Let me know if the above way is the correct way to put these to use.
Does it make sense to see it this way ?
I am guessing only the TargetMachine and the TargetOptions can play a role here. Cause after that its just the call to addPassesToEmitFile which I guess takes care of all the wasm related passes there are based on the optimization we use.
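One detail worth separating out in the proposal above: in LLVM, the features argument of createTargetMachine is a comma-separated list of subtarget features, each prefixed with + or -, the same form llc takes after -mattr=. So `+exception-handling` fits that slot, whereas `-wasm-enable-eh` is an llc command-line option rather than a subtarget feature, and `mattr=` itself is not part of the string. The sketch below only builds such a string. `FeatureToggle` and `build_feature_string` are hypothetical helpers, and whether passing this string is sufficient to fix the issue here is not established in this thread.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper type: one subtarget feature and whether to enable it.
struct FeatureToggle {
  std::string name;
  bool enabled;
};

// Builds a feature string of the form "+a,-b,+c", i.e. what follows
// -mattr= on an llc command line.
inline std::string build_feature_string(const std::vector<FeatureToggle>& features) {
  std::string out;
  for (const auto& f : features) {
    assert(!f.name.empty());
    if (!out.empty()) out += ',';   // separator between entries
    out += f.enabled ? '+' : '-';   // enable or disable the feature
    out += f.name;
  }
  return out;
}
```

For example, `build_feature_string({{"exception-handling", true}, {"bulk-memory", true}})` produces a string in the same form as the -mattr value in the llc RUN line quoted above, which could then be passed as the third argument in place of "".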
Apart from this, I am not sure whether how we build llvm (or the cxx_flags we pass there) makes a difference
I have been using this .... to build libclangInterpreter.a which is the only thing that I need to get clang-repl running in the browser.
# Configure step
emcmake cmake -S ../llvm -B . \
-DCMAKE_BUILD_TYPE=MinSizeRel \
-DCMAKE_PREFIX_PATH=$PREFIX \
-DLLVM_HOST_TRIPLE=wasm32-unknown-emscripten \
-DLLVM_TARGETS_TO_BUILD="WebAssembly" \
-DLLVM_ENABLE_ASSERTIONS=ON \
-DLLVM_INCLUDE_BENCHMARKS=OFF \
-DLLVM_INCLUDE_EXAMPLES=OFF \
-DLLVM_INCLUDE_TESTS=OFF \
-DLLVM_ENABLE_LIBEDIT=OFF \
-DLLVM_ENABLE_PROJECTS="clang;lld" \
-DLLVM_ENABLE_THREADS=OFF \
-DLLVM_ENABLE_ZSTD=OFF \
-DLLVM_ENABLE_LIBXML2=OFF \
-DCLANG_ENABLE_STATIC_ANALYZER=OFF \
-DCLANG_ENABLE_ARCMT=OFF \
-DCLANG_ENABLE_BOOTSTRAP=OFF \
-DCMAKE_CXX_FLAGS="-Dwait4=__syscall_wait4 -fexceptions -mexception-handling"
P.S.: I am not sure passing -mexception-handling here makes sense.
Let me know if you think some changes need to be introduced here. Apart from this, I need some help looking into the LLVM IR going into the shared object step.
The issue here (IIUC) is not how you build llvm, but how llvm is building the side module.
https://github.com/emscripten-core/emscripten/issues/23442#issuecomment-2671495343
I am not sure how but I see the 3rd parameter of createTargetMachine allows us to pass some features https://github.com/llvm/llvm-project/blob/12f8ed58a039ff3a3365591203f76ae07a179215/llvm/include/llvm/MC/TargetRegistry.h#L456
So we currently pass nothing here. So maybe we can update the code in AddModule to have this
llvm::TargetOptions TO = llvm::TargetOptions();
TO.ExceptionModel = llvm::ExceptionHandling::Wasm;
llvm::TargetMachine *TargetMachine = Target->createTargetMachine(
PTU.TheModule->getTargetTriple(), "", "+wasm-enable-eh,+mattr=+exception-handling", TO, llvm::Reloc::Model::PIC_,
std::nullopt, llvm::CodeGenOptLevel::None, false);
Not sure it is this way you make use of these flags. I just know we can use -mllvm -wasm-enable-eh and -mexception-handling with emcc. Let me know if the above way is the correct way to put these to use.
Does it make sense to see it this way ?
I'm not sure how you are supposed to inject flags into the AddModule code, but it sounds like you are on the right track, yes. You need the object file (module) to be built with exception handling support if you want to be able to catch exceptions.
I think @aheejin can help us out here, because I see some of their commits are relevant to the work on WasmEnableEH in llvm.
Hey @aheejin, we would appreciate some help here. The following is what we are trying to do.
- We are first trying to come up with a Target and a TargetMachine based on the target triple, which is wasm32-unknown-emscripten. So basically we want a WebAssembly-specific subclass of TargetMachine.
const llvm::Target *Target = llvm::TargetRegistry::lookupTarget(
PTU.TheModule->getTargetTriple(), ErrorString);
if (!Target) {
return llvm::make_error<llvm::StringError>("Failed to create Wasm Target: ",
llvm::inconvertibleErrorCode());
}
- We then want to enable exception handling support to catch exceptions. Hence we set up the TargetOptions and set the ExceptionModel to llvm::ExceptionHandling::Wasm
llvm::TargetOptions TO = llvm::TargetOptions();
TO.ExceptionModel = llvm::ExceptionHandling::Wasm;
llvm::TargetMachine *TargetMachine = Target->createTargetMachine(
PTU.TheModule->getTargetTriple(), "", "", TO, llvm::Reloc::Model::PIC_);
PTU.TheModule->setDataLayout(TargetMachine->createDataLayout());
- Now our concern is this https://github.com/llvm/llvm-project/blob/cc675c635bf0016111050531e75f8082d0ea120b/llvm/lib/Target/WebAssembly/WebAssemblyTargetMachine.cpp#L445-L449
As you can see, there is this check that we need to get past:
if ((!WasmEnableEH && !WasmEnableSjLj) &&
TM->Options.ExceptionModel == ExceptionHandling::Wasm)
report_fatal_error(
"-exception-model=wasm only allowed with at least one of "
"-wasm-enable-eh or -wasm-enable-sjlj");
Hence in this case, I want to use WasmEnableEH, which I think is directly linked to wasm-enable-eh.
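Read as a plain predicate, that check says the fatal error fires only when the Wasm exception model is requested and neither flag is set. The sketch below restates the quoted condition with plain bools and a stand-in enum, just to spell out the combinations; it is not the LLVM code, and ExceptionModel and wouldReportFatalError are made-up names.

```cpp
// Stand-in for llvm::ExceptionHandling, reduced to what the check needs.
enum class ExceptionModel { None, Wasm };

// Mirrors the quoted condition: returns true when the backend would reach
// report_fatal_error, i.e. the Wasm model is requested with neither flag set.
constexpr bool wouldReportFatalError(ExceptionModel Model, bool WasmEnableEH,
                                     bool WasmEnableSjLj) {
  return (!WasmEnableEH && !WasmEnableSjLj) && Model == ExceptionModel::Wasm;
}

// Setting only TO.ExceptionModel corresponds to (Wasm, false, false).
static_assert(wouldReportFatalError(ExceptionModel::Wasm, false, false), "");
// Enabling wasm-enable-eh gets past the check.
static_assert(!wouldReportFatalError(ExceptionModel::Wasm, true, false), "");
```

So setting TO.ExceptionModel alone lands in the failing case, and turning on wasm-enable-eh would be enough to get past this particular check.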
But I don't understand how to pass this flag, or make use of it, through our TargetMachine. This is the code we're interested in:
const llvm::Target *Target = llvm::TargetRegistry::lookupTarget(
PTU.TheModule->getTargetTriple(), ErrorString);
if (!Target) {
return llvm::make_error<llvm::StringError>("Failed to create Wasm Target: ",
llvm::inconvertibleErrorCode());
}
llvm::TargetOptions TO = llvm::TargetOptions();
TO.ExceptionModel = llvm::ExceptionHandling::Wasm;
llvm::TargetMachine *TargetMachine = Target->createTargetMachine(
PTU.TheModule->getTargetTriple(), "", "", TO, llvm::Reloc::Model::PIC_);
PTU.TheModule->setDataLayout(TargetMachine->createDataLayout());
Technically, we aren't sure how or where to inject the wasm-enable-eh flag through our TargetMachine. Could you let us know how the above can be updated to take care of this?
We would possibly also like to pass -mattr=+exception-handling, I suppose, so that we end up with the equivalent of -wasm-enable-eh -exception-model=wasm -mattr=+exception-handling
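One route that might work, since wasm-enable-eh looks like an LLVM command-line option rather than a TargetMachine setting: hand it to LLVM's option parser before the TargetMachine is created, which is roughly what -mllvm does under clang. Below is a sketch of preparing the argument vector for that. It assumes llvm::cl::ParseCommandLineOptions (from llvm/Support/CommandLine.h) can be called from AddModule. makeLLVMArgv and the "clang-repl" program name are hypothetical, and the LLVM call itself is left as a comment so the snippet compiles without LLVM.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: builds an argv-style array for LLVM's option parser.
// The returned pointers alias the strings in Flags, so Flags must outlive it.
std::vector<const char *> makeLLVMArgv(const std::vector<std::string> &Flags) {
  std::vector<const char *> Argv;
  Argv.push_back("clang-repl"); // argv[0]: placeholder program name
  for (const std::string &Flag : Flags) {
    assert(!Flag.empty() && Flag[0] == '-'); // expect "-name" style options
    Argv.push_back(Flag.c_str());
  }
  return Argv;
}

// Intended use before createTargetMachine (an assumption, untested here):
//   std::vector<std::string> Flags = {"-wasm-enable-eh"};
//   std::vector<const char *> Argv = makeLLVMArgv(Flags);
//   llvm::cl::ParseCommandLineOptions(static_cast<int>(Argv.size()),
//                                     Argv.data());
```

If this is the right direction, the parse step would probably need to run once rather than on every AddModule call, since these options generally reject being given more than once. The -mattr=+exception-handling part would still go through the feature string argument of createTargetMachine.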