zig
zig copied to clipboard
WASI std lib tests fail when the host is Windows (mostly `fs`-related)
Zig Version
0.10.0-dev.4281+ebeadf9d9
Steps to Reproduce
These tests pass when the host is Linux, but fail when the host is Windows. 6 tests are failing within std/fs/test.zig
and 10 are failing across the entire std lib test suite. Not sure exactly what the other 4 are. I'm testing with wasmtime-v1.0.1-x86_64-windows
.
Worth noting especially that Dir.deleteTree
and Dir.deleteTreeMinStackSize
are consistently failing with error.AccessDenied
, so all the fs tests that use testing.TmpDir
are failing to clean up their directories too.
It's unclear to me exactly where solutions should happen here: are these Zig bugs, wasmtime bugs, or problems with the WASI spec?
Expected Behavior
All tests pass
Actual Behavior
When running just std/fs/test.zig
:
>zig test lib/std/fs/test.zig --zig-lib-dir lib --main-pkg-path lib/std --test-cmd wasmtime --test-cmd --dir=. --test-cmd --allow-unknown-exports --test-cmd-bin -target wasm32-wasi
expected error.IsDir, found error.AccessDenied
expected error.PathAlreadyExists, found error.AccessDenied
expected error.IsDir, found error.AccessDenied
23 passed; 17 skipped; 6 failed.
test failureError: failed to run main module `C:\Users\Ryan\Programming\Zig\zig\zig-cache\o\b745ba0bfad688b8844d343fe774832d\test.wasm`
Caused by:
0: failed to invoke command default
1: wasm trap: wasm `unreachable` instruction executed
wasm backtrace:
0: 0xc725 - os.abort
at lib/lib\std/os.zig:500:9
1: 0x1314 - builtin.default_panic
at lib/lib\std/builtin.zig:784:25
2: 0x3c6b5 - test_runner.main
at lib/lib/test_runner.zig:13:30
3: 0x3c639 - start.callMain
at lib/lib\std/start.zig:568:22
- _start
at lib/lib\std/start.zig:241:42
error: the following test command failed with exit code 3:
wasmtime --dir=. --allow-unknown-exports C:\Users\Ryan\Programming\Zig\zig\zig-cache\o\b745ba0bfad688b8844d343fe774832d\test.wasm
When running all std lib tests:
>zig test lib/std/std.zig --zig-lib-dir lib --main-pkg-path lib/std --test-cmd wasmtime --test-cmd --dir=. --test-cmd --allow-unknown-exports --test-cmd-bin -target wasm32-wasi
expected error.IsDir, found error.AccessDenied
expected error.IsDir, found error.AccessDenied
expected error.IsDir, found error.AccessDenied
expected error.PathAlreadyExists, found error.AccessDenied
expected error.IsDir, found error.AccessDenied
2062 passed; 142 skipped; 10 failed.
test failureError: failed to run main module `C:\Users\Ryan\Programming\Zig\zig\zig-cache\o\888da39961c5da85532636b351c586fb\test.wasm`
Caused by:
0: failed to invoke command default
1: wasm trap: wasm `unreachable` instruction executed
wasm backtrace:
0: 0x900f - os.abort
at lib/lib\std/os.zig:500:9
1: 0x7bd7 - builtin.default_panic
at lib/lib\std/builtin.zig:784:25
2: 0x7234 - test_runner.main
at lib/lib/test_runner.zig:13:30
3: 0x71b8 - start.callMain
at lib/lib\std/start.zig:568:22
- _start
at lib/lib\std/start.zig:241:42
error: the following test command failed with exit code 3:
wasmtime --dir=. --allow-unknown-exports C:\Users\Ryan\Programming\Zig\zig\zig-cache\o\888da39961c5da85532636b351c586fb\test.wasm
Most of the bugs in the test file are caused by testing for the removal of files that are actually a directory.
when wasmtime is called by path_unlink_file
it goes through wasm-common
unlink_file -> cap-std -> remove_file_or_symlink ->
https://github.com/bytecodealliance/cap-std/blob/2636733c047cda304bc53123dff9ca56a953510e/cap-fs-ext/src/dir_ext.rs
calls remove_file from cap-std https://docs.rs/cap-std/latest/src/cap_std/fs/dir.rs.html#346-348 , which then calls cap-primitives's remove_file , which calls remove_file_unchecked
[Note: I might be slightly off with the trace, but I believe the final function call is correct]
this ends up calling rust's fs::remove_file
which calls Kernel32!DeleteFile
according to rust's std.
Now, because DeleteFile
returns error 5 (Access/Permission Denied) and there is no IsDirectory in
Kernel32 (although there is an NTSTATUS error for NT functions), the error gets picked up by wasmtime as that.
One way to fix this is probably by calling NtDeleteFile
on wasmtime side instead of DeleteFile
.
Another option is to use other API functions (such as GetFileAttributes or fd_fdstat_get in WASI) to check the attribute of the file and return the appropriate error (See last 2 paragraph).
Since this is caused by an "intended" behavior from the Windows API, I am not sure whether this should be handled on Zig's side or left for the WASM runtime/engine.
Similarly, the open and create functions also acts the same way because they call CreateFile
.
The only exception for windows is rename, because rename in wasmtime uses rust's fs function, which
uses MoveFileEx
with the replace flag, which is the same as Zig's behavior for files. MoveFileEx
, however, will also fail the same way for directories (https://github.com/bytecodealliance/wasmtime/blob/1321c234e53b0f56ec9ab29d89bdde16af733130/crates/test-programs/wasi-tests/src/bin/path_rename.rs#L63)
(* also see: https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-movefileexa)
Another thing I found during debugging this issue is that there are some tests that are marked to not execute on windows. Since wasmtime will eventually call windows functions, they will eventually fail for the same reason.
I added a fix on my fork here and will comment for clarity, but I am not sure if I should submit a PR since the fix includes adding fd_fdstat_get
check on every call. This will cause POSIX to do unnecessary checks.
wasmtime is treating functions called in windows environment as windows functions rather than a POSIX functions. Of course, this depends on how the snapshot specs should be interpreted. The specs specify that these functions should be "similar" to those found in POSIX's but does not specify any further(?). Therefore, should they also inherit the same errors as POSIX for consistency or should they inherit errors from their runtime environment?
(side note for people who wants to debug and trace unreported errors: I added a skip on every test and when error went down by one, it meant that I was in the test that caused it, then added print statement in random places to see where it's exactly located. Not sure if there is a better way since try calls would fail with no proper trace)
Thanks for the detailed writeup! It seems like wasmtime and Zig are running into similar problems with regard to a single cross-platform API for filesystem stuff.
This is an unresolved problem with the Zig standard library (see https://github.com/ziglang/zig/issues/6364 for the rename issue you're talking about, and https://github.com/ziglang/zig/issues/6600 for a broader perspective), and it sounds like this might be the case with WASI/wasmtime as well.
The extra stat
calls on Zig's side seems unfortunate, since, as I understand it, ironing out these sorts of cross-platform differences would be something that a WASI runtime would have an interest in doing (basically for the same reasons as Zig's std library). If so, then in theory both wasmtime and Zig may eventually converge in terms of their cross-platform API (that is, the stuff wasmtime would need to do to have consistent cross-platform behavior would be similar/the same to what Zig would have to do).
Still unsure how to proceed, since I'm unclear on exactly what wasmtime (or the WASI spec) has to say about these sorts of differences or if there's a plan to address them in the future.