wasi-sdk icon indicating copy to clipboard operation
wasi-sdk copied to clipboard

Compiling libz-ng using wasi-sdk produces an invalid binary

Open bjorn3 opened this issue 1 year ago • 11 comments

The blogpost-uncompress example of zlib-rs includes the C libz-ng library. Without libz-ng, zlib-rs compiles fine for wasm.

Reproduction

$ git clone https://github.com/trifectatechfoundation/zlib-rs
$ cd zlib-rs
$ git apply <<EOF
diff --git a/test-libz-rs-sys/Cargo.toml b/test-libz-rs-sys/Cargo.toml
index 47d22c1..ee4ddf6 100644
--- a/test-libz-rs-sys/Cargo.toml
+++ b/test-libz-rs-sys/Cargo.toml
@@ -25,5 +25,5 @@ libz-sys.workspace = true
 quickcheck.workspace = true
 crc32fast = "1.3.2"
 
-libloading.workspace = true
-dynamic-libz-sys.workspace = true
+#libloading.workspace = true
+#dynamic-libz-sys.workspace = true
EOF
$ CC=/path/to/wasi-sdk-24.0-x86_64-linux/bin/clang cargo build -p test-libz-rs-sys --example blogpost-uncompress --release --target wasm32-wasip1
$ wasmtime compile target/wasm32-wasip1/release/examples/blogpost-uncompress.wasm
Error: WebAssembly translation error

Caused by:
    Invalid input WebAssembly code at offset 108104: type mismatch: values remaining on stack at end of block
$ node
> const wasmBuffer = fs.readFileSync('target/wasm32-wasip1/release/examples/blogpost-uncompress.wasm');
undefined
> let mod = await WebAssembly.compile(wasmBuffer)
Uncaught:
CompileError: WebAssembly.compile(): Compiling function #191:"adler32_stub" failed: expected 1 elements on the stack for fallthru, found 3 @+104593

bjorn3 avatar Oct 01 '24 15:10 bjorn3

Given that both Wasmtime and node consider this an invalid Wasm, I suspect that the toolchain really is producing an invalid Wasm, rather than that both validators have the same bug.

So I suspect this is ultimately a bug in either LLVM or wasm-ld.

fitzgen avatar Oct 01 '24 15:10 fitzgen

I found out a bit more with:

$ wasm-tools validate target/wasm32-wasip1/release/examples/blogpost-uncompress.wasm
error: /home/alex/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libz-sys-1.1.20/src/zlib-ng/functable.c:300:5 function `adler32_stub` failed to validate

Caused by:
    0: func 191 failed to validate
    1: type mismatch: values remaining on stack at end of block (at offset 0x192ee)

which points to this location. Which ended up not being that illuminating.

Looking at the disassembly I see:

(;@191c2 ;)  (func $adler32_stub (;191;) (type 2) (param i32 i32 i32) (result i32)
(;@191c3 ;)    (local i32 i32 i32)
(;@191c5 ;)    global.get $__stack_pointer
(;@191cb ;)    i32.const 16
(;@191cd ;)    i32.sub
(;@191ce ;)    local.tee 3
(;@191d0 ;)    global.set $__stack_pointer
(;@191d6 ;)    global.get $GOT.data.internal.__memory_base
(;@191dc ;)    local.set 4
(;@191de ;)    local.get 3
(;@191e0 ;)    i32.const 15
(;@191e2 ;)    i32.add
(;@191e3 ;)    call $cpu_check_features
(;@191e9 ;)    local.get 4
(;@191eb ;)    i32.const 7453020
(;@191f1 ;)    i32.add
(;@191f2 ;)    local.tee 4
(;@191f4 ;)    global.get $GOT.func.internal.adler32_c
(;@191fa ;)    i32.store offset=4
(;@191fd ;)    local.get 4
(;@191ff ;)    global.get $GOT.data.internal.__table_base
(;@19205 ;)    local.tee 5
(;@19207 ;)    i32.const 40
(;@1920d ;)    i32.add
(;@1920e ;)    i32.store
(;@19211 ;)    local.get 4
;; ...
(;@192d5 ;)    local.get 1
(;@192d7 ;)    local.get 2
(;@192d9 ;)    call $_start
(;@192df ;)    local.set 4
(;@192e1 ;)    local.get 3
(;@192e3 ;)    i32.const 16
(;@192e5 ;)    i32.add
(;@192e6 ;)    global.set $__stack_pointer
(;@192ec ;)    local.get 4
             )

which also isn't too illuminating. I'm seeing GOT.* things though which looks like this is compiled with -fPIC which might be the cause of the issue. Dynamic linking, which I don't think is intended here, might need more careful setup to align things? (Rust isn't trying to do that by default and probably is missing a linker flag or something like that)

alexcrichton avatar Oct 01 '24 15:10 alexcrichton

Disabling -fPIC indeed fixes the issue. It seems that the cc crate by default passes it most targets, including wasm. Is there a reason mixing -fPIC and -fno-pic doesn't work? It works fine on native targets afaik.

bjorn3 avatar Oct 07 '24 15:10 bjorn3

I believe it's intended to work, but that's at least good for helping to narrow down the issue! I'm not familiar enough with -fPIC myself though to know how best to debug this further and how to reduce. Ideally there'd be just a standalone *.c file or two with clang flags to reproduce, and that'd probably be best for getting this fixed in LLVM

alexcrichton avatar Oct 07 '24 15:10 alexcrichton

yes, this sounds like some kind llvm/lld issue with linking -fPIC code into static binaries. It should indeed be supported (although with some binary side and runtime overheard and would require wasm-opt or similar to then remove).

Can you open an llvm bug for this? If possible with a simple reproducer attached.

sbc100 avatar Oct 07 '24 21:10 sbc100

Pushed a reproducer which barely uses any rust (aside from to build zlib-ng) to https://github.com/bjorn3/wasi-sdk-issue-492 Turns out that opt-level=1 is necessary for the bug to show up.

bjorn3 avatar Oct 08 '24 08:10 bjorn3

One reduction via creduce is

struct table {
  int (*f1)(char, int);
  int (*f2)(void);
} functable;

int g(char, int);

void h() {
  struct table t;
  t.f1 = g;
  __atomic_store(&functable.f1, &t.f1, 5);
  __atomic_store(&functable.f2, &t.f2, 5);
}
void i(char buf, int j) {
  h();
  functable.f1(buf, j);
}
$ clang -fPIC -O1 src.c -c -o foo.o -g
$ wasm-tools validate foo.o
error: /home/alex/code/wasi-sdk-issue-492/build/src.c:16:3 function `i` failed to validate

Caused by:
    0: func 2 failed to validate
    1: type mismatch: expected a type but nothing on stack (at offset 0xf1)

I'm not sure if that's the exact same issue as the one here, but it looks similar at least.

alexcrichton avatar Oct 08 '24 16:10 alexcrichton

Have you already opened an llvm issue or do you mind if I open one?

bjorn3 avatar Oct 09 '24 08:10 bjorn3

I haven't yet, please feel free!

alexcrichton avatar Oct 09 '24 14:10 alexcrichton

Opened https://github.com/llvm/llvm-project/issues/111855

bjorn3 avatar Oct 10 '24 15:10 bjorn3

The latest version of the cc crate has stopped passing -fPIC on wasm.

bjorn3 avatar Oct 14 '24 09:10 bjorn3