wasm-micro-runtime

Error appears at random after a few runs: out of bounds memory access

Open kamylee opened this issue 1 year ago • 25 comments

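I use threads in my wasm app, and after a few runs the error out of bounds memory access always shows up at random. In the wasm app I created thread A to run an HTTP server, and another thread B (via pthread_create) to communicate with an external WebSocket server. Thread A had always been stable; since I recently added thread B, the out of bounds memory access error keeps appearing at random. Thread B calls APIs provided by the host native side, which use WebSocket, and it also uses the jsoncpp library to parse JSON data.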

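Environment: Windows. The out of bounds memory access error always appears at random after data has been received normally a few times. Is this a problem with WASI threads? Calling wasm_runtime_module_malloc and wasm_runtime_module_free frequently makes the error more likely.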

CALL_STACK:
#00: 0xd48a7bf4 - $f3908
#01: 0x0007 - free
#02: 0x0036 - $f3898
#03: 0x0036 - $f274
#04: 0x003f - $f272
#05: 0x00cd - $f269
#06: 0x0064 - $f1068
#07: 0x005e - $f1064
#08: 0x0113 - $f1059
#09: 0x0045 - $f851
#10: 0x0038 - $f780
#11: 0x0151 - $f778
#12: 0x0036 - $f777
#13: 0x004d - $f839
#14: 0x0038 - $f1070
#15: 0x0036 - $f1067
#16: 0x003f - $f1063
#17: 0x00f2 - $f1059
#18: 0x008e - $f1059
#19: 0x008e - $f1059
#20: 0x008e - $f1059
#21: 0x0045 - $f851
#22: 0x0038 - $f780
#23: 0x0151 - $f778
#24: 0x0036 - $f777
#25: 0x0a9a - $f1986
#26: 0x08f5 - $f1977
#27: 0x6785 - $f1971
#28: 0x372c - $f1840
#29: 0x0625 - $f1827
#30: 0x1fd0 - $f1826
#31: 0x589d - $f1825
#32: 0x009e - $f1305

#00: 0xfff5a4de - free

kamylee avatar Oct 09 '24 10:10 kamylee

Hi, is thread B created by thread A (e.g. the wasm app calls pthread_create) or by the host native code itself? And how do you compile your wasm application? Did you refer to this document?

Note that if you build with libc-wasi, there are two options: (1) disable the malloc/free functions of wasi-libc by removing dlmalloc.o from libc.a, or (2) use a newer version of wasi-sdk (later than 20.0) and export the malloc/free functions from the wasm app by passing -Wl,--export=malloc -Wl,--export=free to /opt/wasi-sdk/bin/clang.

wenyongh avatar Oct 11 '24 01:10 wenyongh

I call the init() function in wasm through wasm_runtime_call_wasm. Thread 1 and thread 2 are both created inside init().

wasi-sdk: I'm using wasi-sdk-22.0+m.

I compiled my program like this:

cmake .. -DWAMR_BUILD_LIB_PTHREAD=1 -DWAMR_BUILD_LIB_WASI_THREADS=1 -DWAMR_BUILD_PLATFORM=windows -DWAMR_BUILD_MULTI_MODULE=1 -DWAMR_BUILD_DUMP_CALL_STACK=1
cmake --build . --config Release

CMakeLists.txt:
add_executable (HttpServer.wasm
	${JSONCPP_SOURCES}
	"HttpServer.cpp"
	"Common.cpp"
	"thirdparty/llhttp/llhttp.c"
	"thirdparty/llhttp/http.c"
	"thirdparty/llhttp/api.c"
	"thirdparty/llhttp/WebSocket.cpp"
	"thirdparty/llhttp/base64/base64.cpp"
	"thirdparty/llhttp/sha1/sha1.cc"
	"tsdb_sample.c"
	#"kvdb_basic_sample.c"
	"kvdb_type_blob_sample.c"
	"kvdb_type_string_sample.c"
	"CAppProcessManage.cpp"
	"CMessageManage.cpp"
	"CMMap.cpp"
	${FLASHDB_SRC}
)

target_compile_options(HttpServer.wasm PRIVATE -pthread )#-g

TARGET_LINK_LIBRARIES(HttpServer.wasm #cjson pthread
)

target_link_options(HttpServer.wasm PRIVATE
	LINKER:--export=__heap_base
	LINKER:--export=__data_end
	LINKER:--export=malloc
	LINKER:--export=free
	LINKER:--export=init
	LINKER:--export=onMessage
	LINKER:--export=onDataRecv
	LINKER:--export=OnDestroy
	LINKER:--export=on_connect
	LINKER:--shared-memory
	# linear memory includes three parts: data area, auxiliary stack area and heap area.
	LINKER:--initial-memory=45875200,--max-memory=65536000 #900*65536=40,632,320
	LINKER:-zstack-size=13107200 # aux stack 819200 1638400
	LINKER:--no-check-features
	LINKER:--allow-undefined
)

$ cmake -G "Unix Makefiles" -DWASI_SDK_PREFIX=E:/WorkSpace/DownLoads/wasi-sdk-21.0.m-mingw/wasi-sdk-21.0+m -DCMAKE_TOOLCHAIN_FILE=E:/WorkSpace/DownLoads/wasi-sdk-21.0.m-mingw/wasi-sdk-21.0+m/share/cmake/wasi-sdk.cmake -DCMAKE_SYSROOT=E:/WorkSpace/DownLoads/wasi-sdk-21.0.m-mingw/wasi-sdk-21.0+m/share/wasi-sysroot ..
$ make

image

It doesn't completely stop working, but crashes after a few or a dozen times. I did some tests, and it's very likely that wasm_runtime_module_malloc and wasm_runtime_module_free are the cause.

kamylee avatar Oct 11 '24 03:10 kamylee

I am a little confused: do you use the same exec_env/module_inst to call the init() function, and do the two threads belong to the same wasm instance (e.g. use the same shared linear memory)?

Could you track which line in C source code causes the exception first? Please refer to https://github.com/bytecodealliance/wasm-micro-runtime/tree/main/samples/debug-tools.

wenyongh avatar Oct 11 '24 04:10 wenyongh

Yes, I use the same exec_env/module_inst to call the init() function. The two threads belong to the same wasm instance. Can't I create two threads in init()? I'm trying out debug-tools.

kamylee avatar Oct 11 '24 06:10 kamylee

Just curious, why not let init() create both threads in one call? Does init() call pthread_create() to create the thread? And how do you pass different thread callbacks for thread A and thread B? I guess you pass a flag to init(), and init() picks the callback for thread A or thread B according to the flag? If so, there are three threads in the end.

wenyongh avatar Oct 11 '24 06:10 wenyongh

My code is as follows:
void* Thread_Tcp_loop(void* Param) { ... }
static void* Thread_Test(void* Param) { ... }
extern "C" int init(char* strBuf, int bufLen)
{
	...
	pthread_t tid;
	int ret = pthread_create(&tid, NULL, &Thread_Tcp_loop, (void*)iCurID);
	if (ret) {
		printf("failed to spawn thread: %s", strerror(ret));
	}
	pthread_detach(tid);

	pthread_t tid1;
	int ret1 = pthread_create(&tid1, NULL, &Thread_Test, (void*)iCurID);
	if (ret1) {
		printf("failed to spawn thread: %s", strerror(ret1));
	}
	pthread_detach(tid1);
	...
	return 1;
}

kamylee avatar Oct 11 '24 07:10 kamylee

Hi, so you create two threads in init(); then the host native side should call init() only once? And could you remove pthread_detach(tid); and pthread_detach(tid1);? I am a little confused about why the threads are detached here.

Also, could you wait for some time after

ret = pthread_create(&tid, NULL, &Thread_Tcp_loop, (void*)iCurID);
if (ret) {
printf("failed to spawn thread: %s", strerror(ret));
}

e.g., add usleep(..) or pthread_cond_wait to wait until the loop actually launches?
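
The pthread_cond_wait variant of that suggestion could look roughly like the sketch below. It is only an illustration: start_gate_t, gate, worker and spawn_and_wait are hypothetical names, and worker stands in for Thread_Tcp_loop, whose real body is not shown in this thread.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical helper: records whether the worker loop has started. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            started;
} start_gate_t;

static start_gate_t gate = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false
};

/* Stand-in for Thread_Tcp_loop: reports readiness before entering its loop. */
static void *worker(void *param)
{
    (void)param;
    pthread_mutex_lock(&gate.lock);
    gate.started = true;
    pthread_cond_signal(&gate.cond);
    pthread_mutex_unlock(&gate.lock);
    /* ... the real receive loop would run here ... */
    return NULL;
}

/* Creates the worker, blocks until it has reported that it is running,
 * and only then detaches it. Returns 0 or a pthread error code. */
int spawn_and_wait(void)
{
    pthread_t tid;
    int ret = pthread_create(&tid, NULL, worker, NULL);
    if (ret != 0)
        return ret;

    pthread_mutex_lock(&gate.lock);
    while (!gate.started)   /* the loop guards against spurious wakeups */
        pthread_cond_wait(&gate.cond, &gate.lock);
    pthread_mutex_unlock(&gate.lock);

    return pthread_detach(tid);
}
```

Compared with a fixed usleep(), the caller resumes as soon as the worker has actually started, and no delay has to be guessed.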

wenyongh avatar Oct 11 '24 08:10 wenyongh

Yes, there are two threads in init(), and init() is called only once.

I call pthread_detach because I need the init() function to return immediately.
After calling init(), the host program enters a message receiving loop. Once it receives a message from another wasm app, it calls wasm_runtime_module_malloc to pass parameters to the onMessage function inside the wasm.

Actually, I did add sleep(): image image

And in both of these two threads, the loop does not exit.

kamylee avatar Oct 11 '24 08:10 kamylee

OK, seems it isn't caused by pthread_detach, but it may be better to put the first pthread_detach after NativeApp_Sleep(10) and put the second pthread_detach after NativeApi_sleep(100).

wenyongh avatar Oct 11 '24 08:10 wenyongh

image I simply use wasm_runtime_module_malloc/free, and there is still a chance of crashing, but it doesn't crash if malloc is not executed.

kamylee avatar Oct 11 '24 09:10 kamylee

Not sure whether it is caused by wasi-sdk (its libc bytecode of the malloc function) or by WAMR. Could you try compiling the wasm app with wasi-sdk-20+threads?

Another way you can try is to remove the dlmalloc.o from wasi-sdk's libc.a, so wasm_runtime_module_malloc will allocate memory from wamr's app heap instead of libc's malloc, please refer to pthread_library.md: image

And if you can, it would be better to dump the call stack of the wasm app.

wenyongh avatar Oct 11 '24 11:10 wenyongh

I have upgraded wasi-sdk from 21 to 24 (wasi-sdk-24.0-x86_64-windows). image After following the steps above, libc.a became smaller and the wasm compiled successfully, but there was an error during execution. image

kamylee avatar Oct 12 '24 07:10 kamylee

Hi, is it caused by wasm_runtime_module_malloc returning 0? Do you pass --heap-size=n to iwasm? Or, if you are not using iwasm, could you pass host_managed_heap_size with a value larger than 0 to wasm_runtime_instantiate: https://github.com/bytecodealliance/wasm-micro-runtime/blob/36d438051ec9955a7a88c79960625dda71638967/core/iwasm/include/wasm_export.h#L696-L700

wenyongh avatar Oct 12 '24 07:10 wenyongh

Yes, this is exactly how I do it. image

kamylee avatar Oct 12 '24 08:10 kamylee

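I am fairly sure this is thread-related. If I call wasm_runtime_module_malloc/free 6000 times before calling init() inside the wasm, nothing crashes. But if the loop runs after init() is called, with init() starting only one thread, then whenever that thread receives data during the 6000 iterations it always crashes. It looks as if the stack has been corrupted.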

for (int i = 0; i < 6000; i++)
{
	printf("wasm_runtime_module_malloc time %d\n", i);
	// allocate memory inside wasm
	char* bufferTemp = NULL;
	uint64_t wasmBuffer = wasm_runtime_module_malloc(wasm_module_inst, 100, (void**)&bufferTemp);
	//wasmBufferName = wasm_runtime_module_dup_data(wasm_module_inst, "dasdf", 100);
	if (wasmBuffer != 0)
	{
		strncpy(bufferTemp, "{\"key\":\"value1\"}", 100);
		uint32 argv[2];
		argv[0] = wasmBuffer;     /* pass the buffer address for WASM space */
		argv[1] = 100;
		if (!wasm_runtime_call_wasm(exec_env, onDataRecv, 2, argv))
		{
			const char* errInfo = wasm_runtime_get_exception(wasm_module_inst);
			DLOG << "Native wasm_runtime_call_wasm err msg:" << errInfo;
		}
		printf("wasm_runtime_module_free\n");
		wasm_runtime_module_free(wasm_module_inst, wasmBuffer);
	}
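}// no crash in 6000 iterations
...
wasm_runtime_call_wasm(exec_env, init, 2, argv);// starts 1 thread inside, used to receive socket data
...
If the loop is placed here instead, it always crashes once **thread A** (created inside init()) receives data.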

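Thread A calls NativeApiSocket to create a socket and uses accept to listen for new connections; accept is currently in blocking mode. When a new connection arrives, onConnect(socketId) inside the wasm is called with the socketId, and onConnect creates another thread that calls the native NativeApiRecv to receive data for that socketId. I set the thread limit to 4, and the number of threads does not exceed it.

So can I not use threads in wasm at all? Or is there another way to solve this problem?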

kamylee avatar Oct 14 '24 01:10 kamylee

What does onDataRecv do? Will wasm_runtime_call_wasm(exec_env, onDataRecv, 2, argv) send the data to another thread, which then accesses it after this thread has freed it (wasm_runtime_module_free(wasm_module_inst, wasmBuffer))?

BTW, did you test it with dlmalloc.o removed from libc.a? And could you upload the wasm file?

wenyongh avatar Oct 14 '24 02:10 wenyongh

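1. onDataRecv parses the received argument with jsoncpp, does some string comparisons, and then calls NativeApiSend(socketId, string) to send data. The data is not passed to any other thread along the way, and I even deliberately copy the argument into a local variable: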

extern "C" void onDataRecv(char* message, int msgLen/*int fdId*//*,char* strFd*/)
{
	printf("onDataRecv\n");
	char strMsgTemp[1024] = { 0 };
	strncpy(strMsgTemp, message, msgLen);
	...
	Json::Value root;
	Json::Reader reader(Json::Features::strictMode());
	if (!reader.parse(strMsg.c_str(), root))
	{
		ELOG("parse error strMsg:%s", /*strMsg.c_str()*/strMsgTemp);
		return;
	}  
	printf("key:%s\n", root["key"].asCString());
	...
	NativeApi_TcpServerSendRaw();// send to the client; this simply calls send().
}

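So wasm_runtime_module_free(wasm_module_inst, wasmBuffer) should only run after onDataRecv has returned.

2. I already tried removing dlmalloc.o, but that produced a new error, which I posted above. image I have since restored the original libc.a.

3. My wasm file: HttpServer.zip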
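
One detail in the onDataRecv snippet is worth guarding regardless of the crash discussed here: strncpy(strMsgTemp, message, msgLen) writes msgLen bytes into a 1024-byte buffer, so a msgLen above 1024 overflows it and a msgLen of exactly 1024 leaves no terminator. With msgLen = 100, as in the host loop, the copy is fine, so this is not claimed to be the cause. A bounded copy could be sketched as follows; copy_message is a hypothetical helper, not part of the original code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies at most dst_size - 1 bytes from src into dst
 * and always terminates dst. Returns the number of bytes copied, not
 * counting the terminator. */
size_t copy_message(char *dst, size_t dst_size, const char *src, size_t src_len)
{
    if (dst == NULL || dst_size == 0)
        return 0;
    if (src == NULL) {
        dst[0] = '\0';
        return 0;
    }
    size_t n = src_len < dst_size - 1 ? src_len : dst_size - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```

Inside onDataRecv this would be called as copy_message(strMsgTemp, sizeof strMsgTemp, message, (size_t)msgLen), after checking that msgLen is not negative.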

kamylee avatar Oct 14 '24 03:10 kamylee

Seems that __wasm_call_ctors isn't exported and executed to initialize the c++ class/struct related globals. Could you add -Wl,--export=__wasm_call_ctors, and then call it before calling the init() function, somewhat like:

wasm_function_inst_t call_ctors_func = wasm_runtime_lookup_function(module_inst, "__wasm_call_ctors");
wasm_runtime_call_wasm(exec_env, call_ctors_func, 0, NULL);

wenyongh avatar Oct 14 '24 03:10 wenyongh

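I have added __wasm_call_ctors, and the call wasm_runtime_call_wasm(exec_env, call_ctors_func, 0, NULL); succeeds. Test setup:
1. The host still loops 6000 times over malloc/free and wasm_runtime_call_wasm(exec_env, onDataRecv, 2, argv).
2. Thread A in the wasm creates a socket TCP server that receives external data in OnReceiveData, and an external client sends data to it rapidly, many times.

There are three cases:
1. When no external data is sent and only the 6000-iteration loop runs, it never crashes.
2. When onDataRecv (called by the host) and OnReceiveData (in thread B, created by thread A) only call printf and return immediately, it does not crash.
3. When onDataRecv or OnReceiveData does JSON parsing, it always crashes.
Could it be that the JSON code calls malloc/free, and malloc is not atomic and not thread-safe, so that when the two threads allocate memory at the same time and one goes out of bounds, it crashes? Is that possible?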
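
One way to probe the "malloc is not thread-safe" hypothesis in isolation, without jsoncpp or sockets, is a small stress test built with the same toolchain and flags as the app. The sketch below is only an assumption about how such a check could look; stress_arg_t, stress_worker and run_malloc_stress are hypothetical names.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define ITERATIONS 6000   /* mirrors the 6000-iteration host loop */
#define BLOCK_SIZE 100

typedef struct {
    unsigned char tag;    /* byte pattern owned by this thread */
    int errors;           /* failed allocations or corrupted blocks */
} stress_arg_t;

/* Allocates a block, fills it with the thread's tag, verifies the tag,
 * then frees the block. A racy allocator tends to hand overlapping
 * blocks to both threads, which shows up as a wrong tag. */
static void *stress_worker(void *param)
{
    stress_arg_t *arg = param;
    for (int i = 0; i < ITERATIONS; i++) {
        unsigned char *p = malloc(BLOCK_SIZE);
        if (p == NULL) {
            arg->errors++;
            continue;
        }
        memset(p, arg->tag, BLOCK_SIZE);
        volatile unsigned char *v = p;  /* keep the check from being optimized out */
        for (int j = 0; j < BLOCK_SIZE; j++) {
            if (v[j] != arg->tag) {
                arg->errors++;
                break;
            }
        }
        free(p);
    }
    return NULL;
}

/* Runs two workers concurrently. Returns the total error count,
 * or -1 if a thread could not be created. */
int run_malloc_stress(void)
{
    stress_arg_t args[2] = { { 0xAA, 0 }, { 0x55, 0 } };
    pthread_t tids[2];
    int started = 0;

    for (int i = 0; i < 2; i++) {
        if (pthread_create(&tids[i], NULL, stress_worker, &args[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    return started == 2 ? args[0].errors + args[1].errors : -1;
}
```

A non-zero result, or a trap inside malloc/free, would point at the allocator and thread setup rather than at the JSON code. A clean run does not prove the allocator is safe.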

kamylee avatar Oct 14 '24 07:10 kamylee

I am not sure whether there is still an issue in the latest wasi-sdk, but maybe you can try building wasi-libc yourself:

git clone https://github.com/WebAssembly/wasi-libc
cd wasi-libc
make -j AR=/opt/wasi-sdk/bin/llvm-ar NM=/opt/wasi-sdk/bin/llvm-nm CC=/opt/wasi-sdk/bin/clang THREAD_MODEL=posix
# the sysroot folder will be generated

And then compile the wasm app with the sysroot:

/opt/wasi-sdk/bin/clang ... --sysroot=<path/to/wasi-libc/sysroot>

Refer to the nightly CI which tests some wasi-threads cases: https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/.github/workflows/nightly_run.yml#L692-L699 https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/.github/workflows/nightly_run.yml#L743-L751

wenyongh avatar Oct 14 '24 08:10 wenyongh

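I tried again to remove dlmalloc.o from libc.a, and copied pthread.h to \share\wasi-sysroot\include\wasm32-wasi image and commented out these 2 lines image

After that it no longer crashed, so I enabled the second thread in init(), but then got this message: image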

image

target_link_options(HttpServer.wasm PRIVATE 
	LINKER:--export=__heap_base
	LINKER:--export=__data_end
#	LINKER:--export=malloc
#	LINKER:--export=free
	LINKER:--export=__wasm_call_ctors
	LINKER:--export=init
	LINKER:--export=onMessage
	LINKER:--export=onDataRecv	
	LINKER:--export=OnDestroy
	LINKER:--export=on_connect
	LINKER:--shared-memory
	LINKER:--initial-memory=45875200,--max-memory=65536000 #900*65536=40,632,320 
	LINKER:-zstack-size=10485760 
	LINKER:--no-check-features
	LINKER:--allow-undefined
)

Is one of the sizes set inappropriately?
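
As a quick sanity check of the numbers in the link options above, the byte values can be converted to 64 KiB wasm pages. The sketch below only does that arithmetic; layout_t and describe_layout are hypothetical names, and since the actual error is visible only in the screenshot, this does not diagnose it.

```c
#include <stdint.h>

#define WASM_PAGE_SIZE 65536u   /* one WebAssembly page is 64 KiB */

typedef struct {
    uint32_t initial_pages;        /* --initial-memory in pages */
    uint32_t max_pages;            /* --max-memory in pages */
    int      page_aligned;         /* 1 if both sizes are multiples of 64 KiB */
    uint32_t initial_minus_stack;  /* bytes left for data and heap at startup */
} layout_t;

/* Summarizes the memory sizes passed to the linker. */
layout_t describe_layout(uint32_t initial_memory, uint32_t max_memory,
                         uint32_t stack_size)
{
    layout_t l;
    l.initial_pages = initial_memory / WASM_PAGE_SIZE;
    l.max_pages = max_memory / WASM_PAGE_SIZE;
    l.page_aligned = (initial_memory % WASM_PAGE_SIZE == 0)
                     && (max_memory % WASM_PAGE_SIZE == 0);
    l.initial_minus_stack =
        initial_memory > stack_size ? initial_memory - stack_size : 0;
    return l;
}
```

For the values in this block, describe_layout(45875200, 65536000, 10485760) gives 700 initial pages, 1000 maximum pages, page-aligned sizes, and 35389440 bytes of initial memory left once the 10 MiB auxiliary stack is subtracted. The inline comment 900*65536=40,632,320 therefore does not match the configured initial memory, which is 700 pages.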

kamylee avatar Oct 14 '24 09:10 kamylee

Could you please try removing -DWAMR_BUILD_LIB_WASI_THREADS=1 (or set it to 0) and rebuild wamr first? And enlarge -zstack-size=10485760 in target_link_options and call wasm_runtime_set_max_thread_num(n) with a larger n if needed.

wenyongh avatar Oct 14 '24 10:10 wenyongh

I set -DWAMR_BUILD_LIB_WASI_THREADS=0 and -zstack-size=20971520, and I call wasm_runtime_create_exec_env(wasm_module_inst, 5 * 1024 * 1024); and wasm_runtime_set_max_thread_num(4); It now runs normally. Thank you very much!

So I am now effectively using WAMR's built-in pthread rather than the pthread from WASI, right?

kamylee avatar Oct 15 '24 02:10 kamylee

You're welcome, it is great that it works! Yes, you are using WAMR's lib-pthread rather than wasi-threads, since the wasm app imports pthread_create and other pthread_xxx APIs but does not import thread-spawn. Refer to the links below for more details:

https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/doc/pthread_impls.md https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/doc/pthread_library.md https://bytecodealliance.github.io/wamr.dev/blog/introduction-to-wamr-wasi-threads/

wenyongh avatar Oct 15 '24 03:10 wenyongh

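image It runs fine now, but I get these warnings. They shouldn't affect anything, right? Should I just let them keep showing, or is there a way to suppress them?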

kamylee avatar Oct 18 '24 00:10 kamylee

This is because these import functions are not linked. If you want, you can implement the related native API wrappers and register them with the runtime using wasm_runtime_register_natives. It doesn't matter as long as they are not called; if one is actually called during execution, the runtime throws an exception like failed to call unlinked import function xxx.

wenyongh avatar Oct 23 '24 08:10 wenyongh

Thanks very much!

kamylee avatar Oct 25 '24 07:10 kamylee