wasm-micro-runtime
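An "out of bounds memory access" error always appears at random after a few runs
I use threads in my wasm app, and after a few runs an "out of bounds memory access" error always appears at random. In the wasm app I created thread A to act as an HTTP server. I then created another thread, thread B, with pthread_create to communicate with an external WebSocket server. Thread A has always run stably; since I recently added thread B, the out of bounds memory access error keeps appearing at random. Thread B calls APIs provided by the host native, which use WebSocket, and it also uses the jsoncpp library to parse JSON data.
Environment: Windows. The out of bounds memory access error always appears at random after data has been received normally a few times. Is this a problem with wasi threads? Calling wasm_runtime_module_malloc and wasm_runtime_module_free frequently makes the error more likely.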
CALL_STACK:
#00: 0xd48a7bf4 - $f3908
#01: 0x0007 - free
#02: 0x0036 - $f3898
#03: 0x0036 - $f274
#04: 0x003f - $f272
#05: 0x00cd - $f269
#06: 0x0064 - $f1068
#07: 0x005e - $f1064
#08: 0x0113 - $f1059
#09: 0x0045 - $f851
#10: 0x0038 - $f780
#11: 0x0151 - $f778
#12: 0x0036 - $f777
#13: 0x004d - $f839
#14: 0x0038 - $f1070
#15: 0x0036 - $f1067
#16: 0x003f - $f1063
#17: 0x00f2 - $f1059
#18: 0x008e - $f1059
#19: 0x008e - $f1059
#20: 0x008e - $f1059
#21: 0x0045 - $f851
#22: 0x0038 - $f780
#23: 0x0151 - $f778
#24: 0x0036 - $f777
#25: 0x0a9a - $f1986
#26: 0x08f5 - $f1977
#27: 0x6785 - $f1971
#28: 0x372c - $f1840
#29: 0x0625 - $f1827
#30: 0x1fd0 - $f1826
#31: 0x589d - $f1825
#32: 0x009e - $f1305
#00: 0xfff5a4de - free
Hi, is thread B created by thread A (e.g. the wasm app calls pthread_create) or created by the host native itself? And how do you compile your wasm application? Did you refer to this document?
Note that if you build with libc-WASI, there are two choices: (1) disable the malloc/free functions of libc wasi by removing dlmalloc.o from libc.a, or (2) use a higher version of wasi-sdk (larger than 20.0) and export the malloc/free functions in the wasm app by adding -Wl,--export=malloc -Wl,--export=free for /opt/wasi-sdk/bin/clang.
I am calling the init() method in wasm through wasm_runtime_call_wasm. Thread 1 and thread 2 are both created by the init() method.
wasi-sdk: I'm using wasi-sdk-22.0+m.
I compiled my program like this:
cmake .. -DWAMR_BUILD_LIB_PTHREAD=1 -DWAMR_BUILD_LIB_WASI_THREADS=1 -DWAMR_BUILD_PLATFORM=windows -DWAMR_BUILD_MULTI_MODULE=1 -DWAMR_BUILD_DUMP_CALL_STACK=1
cmake --build . --config Release
CMakeLists.txt: `add_executable (HttpServer.wasm
  ${JSONCPP_SOURCES}
  "HttpServer.cpp"
  "Common.cpp"
  "thirdparty/llhttp/llhttp.c"
  "thirdparty/llhttp/http.c"
  "thirdparty/llhttp/api.c"
  "thirdparty/llhttp/WebSocket.cpp"
  "thirdparty/llhttp/base64/base64.cpp"
  "thirdparty/llhttp/sha1/sha1.cc"
  "tsdb_sample.c"
  #"kvdb_basic_sample.c"
  "kvdb_type_blob_sample.c"
  "kvdb_type_string_sample.c"
  "CAppProcessManage.cpp"
  "CMessageManage.cpp"
  "CMMap.cpp"
  ${FLASHDB_SRC}
)
target_compile_options(HttpServer.wasm PRIVATE -pthread )#-g
TARGET_LINK_LIBRARIES(HttpServer.wasm #cjson pthread
)
target_link_options(HttpServer.wasm PRIVATE
  LINKER:--export=__heap_base
  LINKER:--export=__data_end
  LINKER:--export=malloc
  LINKER:--export=free
  LINKER:--export=init
  LINKER:--export=onMessage
  LINKER:--export=onDataRecv
  LINKER:--export=OnDestroy
  LINKER:--export=on_connect
  LINKER:--shared-memory
  # linear memory includes three parts: data area, auxiliary stack area and heap area.
  LINKER:--initial-memory=45875200,--max-memory=65536000 #900*65536=40,632,320
  LINKER:-zstack-size=13107200 # aux stack 819200 1638400
  LINKER:--no-check-features
  LINKER:--allow-undefined
)
`
$ cmake -G "Unix Makefiles" -DWASI_SDK_PREFIX=E:/WorkSpace/DownLoads/wasi-sdk-21.0.m-mingw/wasi-sdk-21.0+m -DCMAKE_TOOLCHAIN_FILE=E:/WorkSpace/DownLoads/wasi-sdk-21.0.m-mingw/wasi-sdk-21.0+m/share/cmake/wasi-sdk.cmake -DCMAKE_SYSROOT=E:/WorkSpace/DownLoads/wasi-sdk-21.0.m-mingw/wasi-sdk-21.0+m/share/wasi-sysroot ..
$ make
It doesn't completely stop working, but crashes after a few or a dozen times. I did some tests, and it's very likely that wasm_runtime_module_malloc and wasm_runtime_module_free are the cause.
I am a little confused: do you use the same exec_env/module_inst to call the init() function, and do the two threads belong to the same wasm instance (e.g. use the same shared linear memory)?
Could you track which line in C source code causes the exception first? Please refer to https://github.com/bytecodealliance/wasm-micro-runtime/tree/main/samples/debug-tools.
Yes, I use the same exec_env/module_inst to call the init() function. These two threads belong to the same wasm instance. Can't I create two threads in init()? I'm trying to use debug-tools.
Just curious why not let init() create two threads one time. Does the init() call pthread_create() to create the thread? And how to pass different thread callbacks for thread A and thread B, I guess you pass a flag to init() and init() passes different thread callback to thread A and B according to the flag? So there are three threads eventually.
My code is as follows:
`
void* Thread_Tcp_loop(void* Param)
{
    ...
}
static void* Thread_Test(void* Param)
{
    ...
}
extern "C" int init(char* strBuf, int bufLen)
{
    ...
    pthread_t tid;
    int ret = pthread_create(&tid, NULL, &Thread_Tcp_loop, (void*)iCurID);
    if (ret) {
        printf("failed to spawn thread: %s", strerror(ret));
    }
    pthread_detach(tid);

    pthread_t tid1;
    int ret1 = pthread_create(&tid1, NULL, &Thread_Test, (void*)iCurID);
    if (ret1) {
        printf("failed to spawn thread: %s", strerror(ret1));
    }
    pthread_detach(tid1);
    ...
    return 1;
}
`
Hi, so you create two threads in init(), then you should only call init() one time in the host native? And could you remove pthread_detach(tid); and pthread_detach(tid1);? A little confused why detach the threads here.
Another question: could you wait for some time after
ret = pthread_create(&tid, NULL, &Thread_Tcp_loop, (void*)iCurID);
if (ret) {
printf("failed to spawn thread: %s", strerror(ret));
}
e.g., add usleep(..) or pthread_cond_wait to wait until the loop actually launches?
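The pthread_cond_wait variant of this suggestion can be sketched as follows. This is only an illustration under assumptions: start_gate_t, worker_main and spawn_and_wait_started are hypothetical names, and worker_main stands in for Thread_Tcp_loop, whose body is not shown in this thread.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical startup handshake shared by the creator and the new thread. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            started;
} start_gate_t;

/* Stand-in for Thread_Tcp_loop: report readiness, then do the real work. */
static void *worker_main(void *param)
{
    start_gate_t *gate = (start_gate_t *)param;

    pthread_mutex_lock(&gate->lock);
    gate->started = true;
    pthread_cond_signal(&gate->cond);
    pthread_mutex_unlock(&gate->lock);

    /* ... the long-running loop would go here ... */
    return NULL;
}

/* Create the worker and block until it reports that it is running.
 * Returns 0 on success, or the pthread_create error code. */
static int spawn_and_wait_started(start_gate_t *gate, pthread_t *tid)
{
    int ret;

    pthread_mutex_init(&gate->lock, NULL);
    pthread_cond_init(&gate->cond, NULL);
    gate->started = false;

    ret = pthread_create(tid, NULL, worker_main, gate);
    if (ret != 0)
        return ret;

    pthread_mutex_lock(&gate->lock);
    while (!gate->started) /* the loop guards against spurious wakeups */
        pthread_cond_wait(&gate->cond, &gate->lock);
    pthread_mutex_unlock(&gate->lock);
    return 0;
}
```

The gate must stay alive until the handshake has finished, so it is safer to keep it in static storage than on the stack of a function that returns right away. Compared with a fixed usleep(..), the creator waits exactly as long as the new thread needs to start.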
Yes, there are two threads in init(), and init() is called only once.
Because I need the init() function to return immediately, I call pthread_detach.
After calling init(), the host program enters a message receiving loop. Once it receives a message from another WASM app, it calls wasm_runtime_module_malloc to pass parameters to the onMessage function inside the wasm.
Actually, I added sleep():
And in both of these two threads, the loop does not exit.
OK, seems it isn't caused by pthread_detach, but it may be better to put the first pthread_detach after NativeApp_Sleep(10) and put the second pthread_detach after NativeApi_sleep(100).
I simply use wasm_runtime_module_malloc/free, and there is still a chance of crashing, but it doesn't crash if malloc is not executed.
Not sure whether it is caused by wasi-sdk (its libc bytecode of the malloc function) or by wamr. Could you try compiling the wasm app with wasi-sdk-20+threads?
Another way you can try is to remove the dlmalloc.o from wasi-sdk's libc.a, so wasm_runtime_module_malloc will allocate memory from wamr's app heap instead of libc's malloc, please refer to pthread_library.md:
And if you can, it would be better to dump the call stack of the wasm app.
I have upgraded wasi-sdk from 21 to 24 (wasi-sdk-24.0-x86_64-windows).
After following the steps above, libc.a became smaller, the wasm compiled successfully, but there was an error during execution.
Hi, is it caused by wasm_runtime_module_malloc returning 0? Do you pass --heap-size=n to iwasm? Or, if you are not using iwasm, could you pass host_managed_heap_size with a value larger than 0 to wasm_runtime_instantiate:
https://github.com/bytecodealliance/wasm-micro-runtime/blob/36d438051ec9955a7a88c79960625dda71638967/core/iwasm/include/wasm_export.h#L696-L700
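Yes, this is exactly how I do it.
I am now fairly sure the problem is thread-related. If I call wasm_runtime_module_malloc/free 6000 times before calling init() in the wasm app, nothing crashes. But if I run the loop after calling init(), with init() starting only one thread, it always crashes as soon as that thread receives data during the 6000 iterations. It looks as if the stack has been corrupted.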
for (int i = 0; i < 6000; i++)
{
printf("wasm_runtime_module_malloc time %d\n", i);
//allocate memory inside wasm
char* bufferTemp = NULL;
uint64_t wasmBuffer = wasm_runtime_module_malloc(wasm_module_inst, 100, (void**)&bufferTemp);
//wasmBufferName = wasm_runtime_module_dup_data(wasm_module_inst, "dasdf", 100);
if (wasmBuffer != 0)
{
strncpy(bufferTemp, "{\"key\":\"value1\"}", 100);
uint32 argv[2];
argv[0] = wasmBuffer; /* pass the buffer address for WASM space */
argv[1] = 100;
if (!wasm_runtime_call_wasm(exec_env, onDataRecv, 2, argv))
{
const char* errInfo = wasm_runtime_get_exception(wasm_module_inst);
DLOG << "Native wasm_runtime_call_wasm err msg:" << errInfo;
}
printf("wasm_runtime_module_free\n");
wasm_runtime_module_free(wasm_module_inst, wasmBuffer);
}
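}//no crash in 6000 iterations
...
wasm_runtime_call_wasm(exec_env, init, 2, argv);//starts one thread inside, used to receive socket data
...
If the loop is placed here, it always crashes once **thread A**, created inside init(), receives data.
Thread A calls NativeApiSocket to create a socket and uses accept to listen for new connections; accept is currently in blocking mode. When a new connection comes in, onConnect(socketId) inside wasm is called with the socketId argument, and onConnect creates another thread that calls the native NativeApiRecv to receive data for that socketId. I set the thread limit to 4, and the number of threads has not exceeded it.
Does that mean I can't use threads in wasm? Or is there another way to solve this problem?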
What does onDataRecv do? Will wasm_runtime_call_wasm(exec_env, onDataRecv, 2, argv) send data to another thread, with that thread accessing it after this thread frees the data (module_free(wasm_module_inst, wasmBuffer))?
BTW, do you test it with dlmalloc.o removed from libc.a now? And could you upload the wasm file?
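1. onDataRecv parses the received argument with jsoncpp, does some string comparisons, and then calls NativeApiSend(socketId, string) to send data. It does not pass the data to any other thread along the way, and I even deliberately copy the argument into a local variable: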
extern "C" void onDataRecv(char* message, int msgLen/*int fdId*//*,char* strFd*/)
{
printf("onDataRecv\n");
char strMsgTemp[1024] = { 0 };
strncpy(strMsgTemp, message, msgLen);
...
Json::Value root;
Json::Reader reader(Json::Features::strictMode());
if (!reader.parse(strMsg.c_str(), root))
{
ELOG("parse error strMsg:%s", /*strMsg.c_str()*/strMsgTemp);
return;
}
printf("key:%s\n", root["key"].asCString());
...
NativeApi_TcpServerSendRaw();//send to the client; simply calls send().
}
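So wasm_runtime_module_free(wasm_module_inst, wasmBuffer) should only run after this onDataRecv has finished.
2. I have already tried removing dlmalloc.o, but a new error appeared, as I posted above.
I then restored the original libc.a.
3. My wasm file: HttpServer.zip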
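One detail in the onDataRecv snippet above is worth a second look: strncpy(strMsgTemp, message, msgLen) writes past the 1024-byte buffer if msgLen is larger than 1024, and leaves the buffer without a terminating NUL if msgLen is exactly 1024. Nothing in this thread shows that this is what triggers the crash, since the host passes 100 here. A bounded copy could look like the sketch below, where copy_message is a hypothetical helper name.

```c
#include <stddef.h>
#include <string.h>

/* Copy at most dst_size - 1 bytes of src into dst and always NUL-terminate.
 * Copying stops early at the first NUL found within the first src_len bytes.
 * Returns the number of bytes copied, not counting the terminator. */
static size_t copy_message(char *dst, size_t dst_size,
                           const char *src, size_t src_len)
{
    const char *nul;
    size_t len;

    if (dst == NULL || dst_size == 0)
        return 0;
    if (src == NULL) {
        dst[0] = '\0';
        return 0;
    }

    /* Length of the message inside the declared src_len bytes. */
    nul = memchr(src, '\0', src_len);
    len = nul != NULL ? (size_t)(nul - src) : src_len;

    /* Clamp to the destination capacity, leaving room for the NUL. */
    if (len > dst_size - 1)
        len = dst_size - 1;

    memcpy(dst, src, len);
    dst[len] = '\0';
    return len;
}
```

Inside onDataRecv this would replace the strncpy call with copy_message(strMsgTemp, sizeof(strMsgTemp), message, (size_t)msgLen).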
Seems that __wasm_call_ctors isn't exported and executed to initialize the c++ class/struct related globals. Could you add -Wl,--export=__wasm_call_ctors, and then call it before calling the init() function, somewhat like:
wasm_function_inst_t call_ctors_func = wasm_runtime_lookup_function(module_inst, "__wasm_call_ctors");
wasm_runtime_call_wasm(exec_env, call_ctors_func, 0, NULL);
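I have added __wasm_call_ctors, and the call wasm_runtime_call_wasm(exec_env, call_ctors_func, 0, NULL); succeeds. Test setup:
1. The host still loops 6000 times over malloc/free and wasm_runtime_call_wasm(exec_env, onDataRecv, 2, argv).
2. Thread A in wasm creates a socket TCP server that receives external data in OnReceiveData, and an external client sends data to it rapidly, many times.
There are three cases:
1. When nothing is sent from outside and only the 6000-iteration loop runs, it never crashes.
2. When onDataRecv (called by the host) and OnReceiveData (in thread B, created by thread A) only call printf and return, it does not crash.
3. When onDataRecv or OnReceiveData does JSON parsing, it always crashes.
Could it be that JSON parsing calls malloc/free, and malloc is not atomic and not thread-safe, so that when two threads allocate memory at the same time an out-of-bounds access happens and causes the crash? Is that possible?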
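The hypothesis in this question can be illustrated with a toy allocator. The sketch below is not wasi-libc's dlmalloc, and this thread does not establish whether the libc.a in use was built with locking; it only shows why an allocator with shared bookkeeping needs a lock once two threads allocate at the same time. All names (bump_alloc_unsafe, bump_alloc_locked, and so on) are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

#define ARENA_SIZE        65536
#define BLOCK_SIZE        16
#define ALLOCS_PER_THREAD 1000

static unsigned char arena[ARENA_SIZE];
static size_t arena_used; /* shared allocator bookkeeping */
static pthread_mutex_t arena_lock = PTHREAD_MUTEX_INITIALIZER;

/* Toy bump allocator. The read-modify-write of arena_used is a data race
 * if two threads run it concurrently: both can read the same offset and
 * receive overlapping blocks. */
static void *bump_alloc_unsafe(size_t size)
{
    size_t offset = arena_used;

    if (size > ARENA_SIZE - offset)
        return NULL;
    arena_used = offset + size;
    return &arena[offset];
}

/* Same allocator, with every call serialized by a mutex. */
static void *bump_alloc_locked(size_t size)
{
    void *p;

    pthread_mutex_lock(&arena_lock);
    p = bump_alloc_unsafe(size);
    pthread_mutex_unlock(&arena_lock);
    return p;
}

static void *alloc_worker(void *param)
{
    (void)param;
    for (int i = 0; i < ALLOCS_PER_THREAD; i++) {
        unsigned char *p = bump_alloc_locked(BLOCK_SIZE);
        if (p != NULL)
            p[0] = 1; /* touch the block, as a real user would */
    }
    return NULL;
}

/* Run two allocating threads and return the bytes handed out in total. */
static size_t run_two_allocating_threads(void)
{
    pthread_t a, b;

    arena_used = 0;
    if (pthread_create(&a, NULL, alloc_worker, NULL) != 0)
        return 0;
    if (pthread_create(&b, NULL, alloc_worker, NULL) != 0) {
        pthread_join(a, NULL);
        return 0;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return arena_used;
}
```

With the lock in place, the total always equals 2 * ALLOCS_PER_THREAD * BLOCK_SIZE. If alloc_worker called bump_alloc_unsafe directly, updates could be lost and two threads could be handed the same block, which is the kind of heap corruption the question describes.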
I am not sure whether there is still an issue in the latest wasi-sdk, but maybe you can try building wasi-libc yourself:
git clone https://github.com/WebAssembly/wasi-libc
cd wasi-libc
make -j AR=/opt/wasi-sdk/bin/llvm-ar NM=/opt/wasi-sdk/bin/llvm-nm CC=/opt/wasi-sdk/bin/clang THREAD_MODEL=posix
# the sysroot folder will be generated
And then compile the wasm app with the sysroot:
/opt/wasi-sdk/bin/clang ... --sysroot=<path/to/wasi-libc/sysroot>
Refer to the nightly CI which tests some wasi-threads cases:
https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/.github/workflows/nightly_run.yml#L692-L699
https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/.github/workflows/nightly_run.yml#L743-L751
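I tried again removing dlmalloc.o from libc.a, and copied pthread.h to \share\wasi-sysroot\include\wasm32-wasi
I commented out these 2 lines
and it no longer crashes. I then enabled the second thread in init(), but it reports this: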
target_link_options(HttpServer.wasm PRIVATE
LINKER:--export=__heap_base
LINKER:--export=__data_end
# LINKER:--export=malloc
# LINKER:--export=free
LINKER:--export=__wasm_call_ctors
LINKER:--export=init
LINKER:--export=onMessage
LINKER:--export=onDataRecv
LINKER:--export=OnDestroy
LINKER:--export=on_connect
LINKER:--shared-memory
LINKER:--initial-memory=45875200,--max-memory=65536000 #900*65536=40,632,320
LINKER:-zstack-size=10485760
LINKER:--no-check-features
LINKER:--allow-undefined
)
Is one of the sizes set inappropriately?
Could you please try removing -DWAMR_BUILD_LIB_WASI_THREADS=1 (or set it to 0) and rebuild wamr first? And enlarge -zstack-size=10485760 in target_link_options and call wasm_runtime_set_max_thread_num(n) with a larger n if needed.
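The sizes in the link options above are easier to sanity-check in units of WebAssembly pages (64 KiB each). The sketch below does that arithmetic using only values quoted in this thread; the helper names are made up for illustration, and it says nothing about how the runtime divides the aux stack between threads.

```c
#include <stdbool.h>
#include <stdint.h>

#define WASM_PAGE_SIZE 65536u /* one WebAssembly page is 64 KiB */

/* --initial-memory and --max-memory are expected to be page multiples. */
static bool is_page_aligned(uint64_t bytes)
{
    return bytes % WASM_PAGE_SIZE == 0;
}

static uint64_t bytes_to_pages(uint64_t bytes)
{
    return bytes / WASM_PAGE_SIZE;
}

/* The aux stack (-zstack-size) lives inside linear memory, so it has to fit
 * in the initial memory together with the data area and the heap.
 * Returns the bytes left for data and heap, or 0 if the stack does not fit. */
static uint64_t bytes_left_after_stack(uint64_t initial_memory,
                                       uint64_t stack_size)
{
    return stack_size < initial_memory ? initial_memory - stack_size : 0;
}
```

With these helpers, 45875200 is 700 pages, 65536000 is 1000 pages, and a 10485760-byte aux stack takes 160 of the initial pages, leaving 35389440 bytes for data and heap. The `#900*65536=40,632,320` comment in the options therefore does not describe the values actually passed.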
I added -DWAMR_BUILD_LIB_WASI_THREADS=0, -zstack-size=20971520, wasm_runtime_create_exec_env(wasm_module_inst, 5 * 1024 * 1024); and wasm_runtime_set_max_thread_num(4);. It is running normally now. Thank you very much!
So I am now effectively using WAMR's built-in pthread rather than the pthread in wasi, right?
You're welcome, it is great that it works!
Yes, you use WAMR lib-pthread and not wasi-threads, since the wasm app imports pthread_create and other pthread_xxx APIs but does not import thread-spawn. Refer to the links below for more details:
https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/doc/pthread_impls.md
https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/doc/pthread_library.md
https://bytecodealliance.github.io/wamr.dev/blog/introduction-to-wamr-wasi-threads/
It runs fine now, but I get these warnings. They shouldn't affect anything, right? Should I let them keep showing, or is there a way to hide them?
It is because these import functions are not linked. If you want, you can implement the related native API wrappers and register them to the runtime with wasm_runtime_register_natives. But it doesn't matter if they are not called; when they are actually called during execution, the runtime will throw an exception like failed to call unlinked import function xxx.
Thanks very much!