wasmtime icon indicating copy to clipboard operation
wasmtime copied to clipboard

Wasmtime hangs when doing file I/O in a multi threaded app

Open dbaeumer opened this issue 1 year ago • 3 comments

Executing a wasm application on wasmtime that has multiple threads where on thread does file I/O results in a hang on the first file I/O operation.

Disclaimer: I am not a Rust expert. But since the application runs correctly as a OS process I assume that the Rust code is OK. I am also not sure if this is a Rust toolchain problem or a wasmtime problem.

Test Case

Clone https://github.com/dbaeumer/wasm-threads.git

Steps to Reproduce

  • npm run build:wasm to build the wasm file
  • npm run run:wasmtime to execute the wasm file in wasmtime

Expected Results

The application doesn't hang on the fs::File::open call

I expect the same result as executing:

npm run build:os npm run run:os

Actual Results

The application does hang.

Versions and Environment

Rust:

Default host: x86_64-unknown-linux-gnu
rustup home:  /xxxx/.rustup

installed targets for active toolchain
--------------------------------------

wasm32-unknown-unknown
wasm32-wasi
wasm32-wasi-preview1-threads
x86_64-unknown-linux-gnu

active toolchain
----------------

stable-x86_64-unknown-linux-gnu (default)
rustc 1.77.2 (25ef9e3d8 2024-04-09)

Wasmtime: wasmtime-cli 19.0.2 (040ec9209 2024-04-11)

Operating system:

Distributor ID: Ubuntu Description: Ubuntu 22.04.4 LTS Release: 22.04 Codename: jammy

dbaeumer avatar Apr 17 '24 09:04 dbaeumer

Using gdb/logging it looks like the main thread is blocked in memory.atomic.wait32(addr=0x104198, expected=0x80000002, timeout=None). Given that I'm not sure that the issue is in Wasmtime and this looks like a wasi-libc issue perhaps? (e.g. something perhaps related to locking in the initialization of files and such.

alexcrichton avatar Apr 17 '24 14:04 alexcrichton

@abrown this seems relevant to your interests :)

tschneidereit avatar Apr 17 '24 14:04 tschneidereit

I ran @dbaeumer's code today on the latest version of Wasmtime (@dbaeumer, thanks for the great reproducer). I believe the hang is due to incorrect locking in wasi-libc. This is not a Wasmtime issue, but I'll explain what I figured out here:

When running with WASMTIME_LOG=wasmtime_runtime::threads=trace, I see one memory.atomic.wait that is not corresponded elsewhere:

2024-04-26T19:11:26.438492Z TRACE wasmtime_runtime::threads::shared_memory: memory.atomic.wait32(addr=0x103f2c, expected=2147483650, timeout=None)

The stack trace at this instruction looks like:

__wait
__lock
internal_register_preopened_fd_unlocked
__wasilibc_populate_preopens
__wasilibc_find_abspath
__wasilibc_find_relpath_alloc
__wasilibc_find_relpath
std::sys::pal::wasi::fs::open_parent::{{closure}}::h2a891633185b03b0
std::sys::pal::wasi::fs::File::open::h57333908bc41373d
std::fs::OpenOptions::_open::h0ba6508e7f3dab94
std::fs::OpenOptions::open::h917f7ccf95a22000
std::fs::File::open::h2bd1932e28feed10
wasm_threads::main::hd67815b1e283a1b0
core::ops::function::FnOnce::call_once::h0dba6a9e804e86ee
std::sys_common::backtrace::__rust_begin_short_backtrace::h884974562e00ebc8
std::rt::lang_start::{{closure}}::h79de5d68ac140761
std::rt::lang_start_internal::h301b28581f6245a7
std::rt::lang_start::h8217158c4515bb59
__main_void
_start

This points to a problem in internal_register_preopened_fd_unlocked. From the file name, it would appear that no locking should happen here because its parent, __wasilibc_populate_preopens, has already acquired the lock. But, in fact, resize attempts to reacquire it despite it being held in the parent.

Looking at Git blame, I added the additional locking very early on in the wasi-threads implementation effort (#311) and it looks like the mistake comes from not understanding how these preopens work correctly. I believe we can safely remove the lock in resize since it is not used elsewhere but I would like to reproduce end-to-end with the Rust toolchain; @alexcrichton, can you remind me how to use cargo with a modified version of wasi-libc?


EDIT: scratch the Git blame analysis; looking at this more, I think this was actually introduced in a refactoring in #408. @sunfishcode, it probably wasn't clear that resize retakes the lock? But it should be fine to remove it, right?

abrown avatar Apr 26 '24 21:04 abrown

@alexcrichton, can you remind me how to use cargo with a modified version of wasi-libc?

The "easiest" way is perhaps to:

  • Check-out wasi-sdk
  • Update/edit the wasi-libc submodule with your changes
  • Build wasi-sdk
  • Set WASI_SDK_PATH and build the Rust target

You can probably also finagle this with various sysroot flags and Rust configuration as well and get away with only building wasi-libc if you'd like to.

alexcrichton avatar May 02 '24 06:05 alexcrichton

Ok, here's how I verified that WebAssembly/wasi-libc#491 actually fixes this issue:

  • built wasi-libc with threads support: make THREAD_MODEL=posix
  • in each of the sysroot/* directories, I had to "rename" the target to what Rust believes it to be: ln -s wasm32-wasi-threads wasm32-wasip1-threads
  • with the rust repository cloned, configure config.toml to point to the modified wasi-libc:
    lld = true
    [target.wasm32-wasip1-threads]
    wasi-root = ".../wasi-libc/sysroot"
    
  • build rustc: ./x.py build --stage 1 --target wasm32-wasip1-threads
  • inform rustup of the new compiler: rustup toolchain link stage1 build/x86_64-unknown-linux-gnu/stage1
  • with @dbaeumer's repository cloned, build the reproducer with the modified rustc and wasi-libc: rustc +stage1 -v --target=wasm32-wasip1-threads src/main.rs
  • run the compiled reproducer with any recent Wasmtime:
    $ wasmtime --dir .::. --wasi threads=y --wasm threads=y main.wasm
    Before file open
    After file open
    hi number 1 from the spawned thread!
    hi number 2 from the spawned thread!
    hi number 3 from the spawned thread!
    hi number 4 from the spawned thread!
    hi number 5 from the spawned thread!
    hi number 6 from the spawned thread!
    hi number 7 from the spawned thread!
    hi number 8 from the spawned thread!
    hi number 9 from the spawned thread!
    Thread finished
    

Having run these steps, I'm a bit more confident that WebAssembly/wasi-libc#491 does not miss anything and, once that merges, I propose we close this.

abrown avatar May 02 '24 21:05 abrown