wasi-sdk
wasi-sdk copied to clipboard
Compiling libz-ng using wasi-sdk produces an invalid binary
The blogpost-uncompress example of zlib-rs includes the C libz-ng library. Without libz-ng, zlib-rs compiles fine for wasm.
Reproduction
$ git clone https://github.com/trifectatechfoundation/zlib-rs
$ cd zlib-rs
$ git apply <<EOF
diff --git a/test-libz-rs-sys/Cargo.toml b/test-libz-rs-sys/Cargo.toml
index 47d22c1..ee4ddf6 100644
--- a/test-libz-rs-sys/Cargo.toml
+++ b/test-libz-rs-sys/Cargo.toml
@@ -25,5 +25,5 @@ libz-sys.workspace = true
quickcheck.workspace = true
crc32fast = "1.3.2"
-libloading.workspace = true
-dynamic-libz-sys.workspace = true
+#libloading.workspace = true
+#dynamic-libz-sys.workspace = true
EOF
$ CC=/path/to/wasi-sdk-24.0-x86_64-linux/bin/clang cargo build -p test-libz-rs-sys --example blogpost-uncompress --release --target wasm32-wasip1
$ wasmtime compile target/wasm32-wasip1/release/examples/blogpost-uncompress.wasm
Error: WebAssembly translation error
Caused by:
Invalid input WebAssembly code at offset 108104: type mismatch: values remaining on stack at end of block
$ node
> const wasmBuffer = fs.readFileSync('target/wasm32-wasip1/release/examples/blogpost-uncompress.wasm');
undefined
> let mod = await WebAssembly.compile(wasmBuffer)
Uncaught:
CompileError: WebAssembly.compile(): Compiling function #191:"adler32_stub" failed: expected 1 elements on the stack for fallthru, found 3 @+104593
Given that both Wasmtime and node consider this an invalid Wasm, I suspect that the toolchain really is producing an invalid Wasm, rather than that both validators have the same bug.
So I suspect this is ultimately a bug in either LLVM or wasm-ld.
I found out a bit more with:
$ wasm-tools validate target/wasm32-wasip1/release/examples/blogpost-uncompress.wasm
error: /home/alex/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libz-sys-1.1.20/src/zlib-ng/functable.c:300:5 function `adler32_stub` failed to validate
Caused by:
0: func 191 failed to validate
1: type mismatch: values remaining on stack at end of block (at offset 0x192ee)
which points to this location. Which ended up not being that illuminating.
Looking at the disassembly I see:
(;@191c2 ;) (func $adler32_stub (;191;) (type 2) (param i32 i32 i32) (result i32)
(;@191c3 ;) (local i32 i32 i32)
(;@191c5 ;) global.get $__stack_pointer
(;@191cb ;) i32.const 16
(;@191cd ;) i32.sub
(;@191ce ;) local.tee 3
(;@191d0 ;) global.set $__stack_pointer
(;@191d6 ;) global.get $GOT.data.internal.__memory_base
(;@191dc ;) local.set 4
(;@191de ;) local.get 3
(;@191e0 ;) i32.const 15
(;@191e2 ;) i32.add
(;@191e3 ;) call $cpu_check_features
(;@191e9 ;) local.get 4
(;@191eb ;) i32.const 7453020
(;@191f1 ;) i32.add
(;@191f2 ;) local.tee 4
(;@191f4 ;) global.get $GOT.func.internal.adler32_c
(;@191fa ;) i32.store offset=4
(;@191fd ;) local.get 4
(;@191ff ;) global.get $GOT.data.internal.__table_base
(;@19205 ;) local.tee 5
(;@19207 ;) i32.const 40
(;@1920d ;) i32.add
(;@1920e ;) i32.store
(;@19211 ;) local.get 4
;; ...
(;@192d5 ;) local.get 1
(;@192d7 ;) local.get 2
(;@192d9 ;) call $_start
(;@192df ;) local.set 4
(;@192e1 ;) local.get 3
(;@192e3 ;) i32.const 16
(;@192e5 ;) i32.add
(;@192e6 ;) global.set $__stack_pointer
(;@192ec ;) local.get 4
)
which also isn't too illuminating. I'm seeing GOT.* things though which looks like this is compiled with -fPIC which might be the cause of the issue. Dynamic linking, which I don't think is intended here, might need more careful setup to align things? (Rust isn't trying to do that by default and probably is missing a linker flag or something like that)
Disabling -fPIC indeed fixes the issue. It seems that the cc crate by default passes it most targets, including wasm. Is there a reason mixing -fPIC and -fno-pic doesn't work? It works fine on native targets afaik.
I believe it's intended to work, but that's at least good for helping to narrow down the issue! I'm not familiar enough with -fPIC myself though to know how best to debug this further and how to reduce. Ideally there'd be just a standalone *.c file or two with clang flags to reproduce, and that'd probably be best for getting this fixed in LLVM
yes, this sounds like some kind llvm/lld issue with linking -fPIC code into static binaries. It should indeed be supported (although with some binary side and runtime overheard and would require wasm-opt or similar to then remove).
Can you open an llvm bug for this? If possible with a simple reproducer attached.
Pushed a reproducer which barely uses any rust (aside from to build zlib-ng) to https://github.com/bjorn3/wasi-sdk-issue-492 Turns out that opt-level=1 is necessary for the bug to show up.
One reduction via creduce is
struct table {
int (*f1)(char, int);
int (*f2)(void);
} functable;
int g(char, int);
void h() {
struct table t;
t.f1 = g;
__atomic_store(&functable.f1, &t.f1, 5);
__atomic_store(&functable.f2, &t.f2, 5);
}
void i(char buf, int j) {
h();
functable.f1(buf, j);
}
$ clang -fPIC -O1 src.c -c -o foo.o -g
$ wasm-tools validate foo.o
error: /home/alex/code/wasi-sdk-issue-492/build/src.c:16:3 function `i` failed to validate
Caused by:
0: func 2 failed to validate
1: type mismatch: expected a type but nothing on stack (at offset 0xf1)
I'm not sure if that's the exact same issue as the one here, but it looks similar at least.
Have you already opened an llvm issue or do you mind if I open one?
I haven't yet, please feel free!
Opened https://github.com/llvm/llvm-project/issues/111855
The latest version of the cc crate has stopped passing -fPIC on wasm.