wasm-micro-runtime icon indicating copy to clipboard operation
wasm-micro-runtime copied to clipboard

MIPS32 support / alignment issues

Open InBetweenNames opened this issue 1 year ago • 11 comments

Hello WAMR team,

Thanks for maintaining such a cool project. I have a MIPS related problem I was hoping you could help me with. How well is MIPS32 supported in WAMR? I'm having a hard time getting a simple "Hello world" Rust wasip1 application running on mips32 (big endian). I'm noticing that on x86[-64], arm, and aarch64 everything seems to work fine, but on mips32, I'm getting an "undefined element" exception in both the Classic and Fast interpreter modes:

https://github.com/bytecodealliance/wasm-micro-runtime/blob/eab409a4df9010bce661c800ef0ddf60acae3b4c/core/iwasm/interpreter/wasm_interp_fast.c#L1705

and similarly for the classic interpreter here:

https://github.com/bytecodealliance/wasm-micro-runtime/blob/eab409a4df9010bce661c800ef0ddf60acae3b4c/core/iwasm/interpreter/wasm_interp_classic.c#L2340

What seems to be happening is the "val" variable in both cases is being dereferenced to the value 0x80000000 which is far outside of the table range in the Rust example. It seems this has to do with the handling of the call_indirect function, where it looks up from the wasm table where the call target will be.

I wonder, is this an endian related issue? I haven't debugged it further to see what val should be on a normal execution.

Now, Zig's "hello world" application seems to work just fine with target triple wasm32-wasi-musl. It seems to only feature one call_indirect.

In addition to this, mips64 seems to SIGBUS on a misaligned access when running the Rust hello world (although not directly relevant to my use case). It seems I can't attach the .wasm here directly, but it is simple enough to reproduce: Just cargo init and then build for Rust using cargo build --target wasm32-wasip1.

Thank you very much!

InBetweenNames avatar Aug 31 '24 19:08 InBetweenNames

Okay, I did a lot more digging into this today. It turns out that it's not just MIPS32 that's having problems. There are a number of pointer misalignment issues within WAMR that I believe are causing runtime issues. Two examples to note are that tables are not aligned correctly (easily fixable), and the CSP related stuff (involving frame->sp_boundary) on the WASM stack isn't aligned properly. That one seems a little less obvious to fix. There still may be some endianness issues at play here too but the pointer misalignment is definitely a higher priority to fix. I noticed that the CI for the repo largely uses 32-bit x86 and RISC-V. The problems will be much more apparent on platforms where sizeof(int) != sizeof(void*). On x86_64, when I built in Debug mode on Clang 18, it seems that CMake enabled some sanitizers which was enough to trigger the issue.

InBetweenNames avatar Sep 01 '24 21:09 InBetweenNames

Ref: #2349

InBetweenNames avatar Sep 01 '24 21:09 InBetweenNames

Possible ref: #2136

InBetweenNames avatar Sep 01 '24 21:09 InBetweenNames

Okay, I did a lot more digging into this today. It turns out that it's not just MIPS32 that's having problems. There are a number of pointer misalignment issues within WAMR that I believe are causing runtime issues. Two examples to note are that tables are not aligned correctly (easily fixable), and the CSP related stuff (involving frame->sp_boundary) on the WASM stack isn't aligned properly. That one seems a little less obvious to fix. There still may be some endianness issues at play here too but the pointer misalignment is definitely a higher priority to fix. I noticed that the CI for the repo largely uses 32-bit x86 and RISC-V. The problems will be much more apparent on platforms where sizeof(int) != sizeof(void*). On x86_64, when I built in Debug mode on Clang 18, it seems that CMake enabled some sanitizers which was enough to trigger the issue.

Hi, thanks for reporting the issue, I tried building iwasm under Ubuntu x86-64 with clang and enabled the sanitizer ubsan: cmake .. -DWAMR_BUILD_SANITIZER=ubsan, and then tested some standalone cases, but no sanitizer error was reported. Could you tell us how to reproduce the issue?

BTW, I have no mips environment to test the case, could you submit a PR to fix issues found? Thanks.

wenyongh avatar Sep 04 '24 03:09 wenyongh

For MIPS, a simple case is to just use the Zig toolchain to cross compile and then run under qemu-mips. This is actually sufficient to trigger the behaviour in question. You can use a build line like this:

CC="zig cc -target mips-linux-musl" cmake .. -DCMAKE_C_FLAGS="-g -Og -pie -fPIE" -DCMAKE_BUILD_TYPE=Debug -DWAMR_BUILD_SHARED=0 -DWAMR_BUILD_INTERP=1 -DWAMR_BUILD_FAST_INTERP=0 -DWAMR_BUILD_JIT=0 -DWAMR_BUILD_FAST_JIT=0 -DWAMR_BUILD_TARGET=MIPS

Simply run the resulting iwasm with qemu:

qemu-mips iwasm hello_wasi.wasm

But actually, I can reliably produce assertion failures on x86_64 with the following:

CC="zig cc -target x86_64-linux-musl" cmake .. -DCMAKE_C_FLAGS="-g -Og -pie -fPIE" -DCMAKE_BUILD_TYPE=Debug -DWAMR_BUILD_SHARED=0 -DWAMR_BUILD_INTERP=1 -DWAMR_BUILD_FAST_INTERP=0 -DWAMR_BUILD_JIT=0 -DWAMR_BUILD_FAST_JIT=0 -DWAMR_BUILD_TARGET=X86_64

This one is a little more salient as it seems to build with assertions for checking alignment of pointers. In particular, the CSP related code in the interpreter is suspect:

https://github.com/bytecodealliance/wasm-micro-runtime/blob/fed0fe953ca32894f51179c97d3dcc9175456a37/core/iwasm/interpreter/wasm_interp_classic.c#L6640-L6643

Here, a value on the WASM stack is being treated as a raw pointer to a WASMBranchBlock which on X86_64 has an alignment requirement of 8. However, no such guarantee is provided by the stack. True, x86_64 will handle this just fine, but there's no guarantee the compiler will.

Another example is in how tables are allocated in the runtime:

https://github.com/bytecodealliance/wasm-micro-runtime/blob/fed0fe953ca32894f51179c97d3dcc9175456a37/core/iwasm/interpreter/wasm_runtime.c#L2377-L2378

The definitions are as follows:

https://github.com/bytecodealliance/wasm-micro-runtime/blob/fed0fe953ca32894f51179c97d3dcc9175456a37/core/iwasm/interpreter/wasm_runtime.h#L377C12-L379C38

So, global_data is only a uint8_t* , and global_data_size does not appear to have any guarantees about whether it is a multiple of align(WASMTableInstance) (which must be aligned to at least 8 on x86_64 due to the presence of pointers in the type). Therefore first_table may not be suitably aligned. This one is simpler to fix by just aligning it and ensuring the allocation has enough room for it.

Hope this helps!

On Tue, Sep 3, 2024 at 11:34 PM Wenyong Huang @.***> wrote:

Okay, I did a lot more digging into this today. It turns out that it's not just MIPS32 that's having problems. There are a number of pointer misalignment issues within WAMR that I believe are causing runtime issues. Two examples to note are that tables are not aligned correctly (easily fixable), and the CSP related stuff (involving frame->sp_boundary) on the WASM stack isn't aligned properly. That one seems a little less obvious to fix. There still may be some endianness issues at play here too but the pointer misalignment is definitely a higher priority to fix. I noticed that the CI for the repo largely uses 32-bit x86 and RISC-V. The problems will be much more apparent on platforms where sizeof(int) != sizeof(void*). On x86_64, when I built in Debug mode on Clang 18, it seems that CMake enabled some sanitizers which was enough to trigger the issue.

Hi, thanks for reporting the issue, I tried building iwasm under Ubuntu x86-64 with clang and enabled the sanitizer ubsan: cmake .. -DWAMR_BUILD_SANITIZER=ubsan, and then tested some standalone cases, but no sanitizer error was reported. Could you tell us how to reproduce the issue?

BTW, I have no mips environment to test the case, could you submit a PR to fix issues found? Thanks.

— Reply to this email directly, view it on GitHub https://github.com/bytecodealliance/wasm-micro-runtime/issues/3761#issuecomment-2327845932, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB3VLDEZVGJEEJVUG2OKSDTZUZ5S3AVCNFSM6AAAAABNODZJ2CVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGMRXHA2DKOJTGI . You are receiving this because you authored the thread.Message ID: @.***>

InBetweenNames avatar Sep 04 '24 19:09 InBetweenNames

Hi, I tried to fix the issue with below patch, could you test whether it works for you: fix_unaligned_access.zip

wenyongh avatar Sep 05 '24 06:09 wenyongh

Hi, thanks for your time and response. I read the diff and it does go some way towards solving the problem, but I have concerns about the

UINT64_MAX == UINTPTR_MAX && WASM_CPU_SUPPORTS_UNALIGNED_ADDR_ACCESS`

logic. The problem is that C does not support misaligned pointers at all, even if the processor does. It would be simpler and more correct to have one codepath, where all pointers are suitably aligned for the given architecture (i.e. remove the WASM_CPU_SUPPORTS_UNALIGNED_ADDR_ACCESS define entirely). In the case where things must be unaligned, or where alignment cannot be guaranteed, memcpy is useful to sidestep these restrictions. If the concern is performance, the optimizer will fortunately optimize all of it away into the code that you want anyways (you can use __builtin_memcpy if you want to be absolutely sure). For example, suppose a memcpy is used to copy a value into a variable with higher alignment requirements, i.e. a uint64_t from the WASM stack. On an architecture like x86_64, this will optimize into one mov instruction no matter what. On architectures that don't support unaligned loads, this will break down into two 32-bit moves which is what would be required for correctness anyway.

On Thu, Sep 5, 2024 at 2:00 AM Wenyong Huang @.***> wrote:

Hi, I tried to fix the issue with below patch, could you test whether it works for you: fix_unaligned_access.zip https://github.com/user-attachments/files/16884624/fix_unaligned_access.zip

— Reply to this email directly, view it on GitHub https://github.com/bytecodealliance/wasm-micro-runtime/issues/3761#issuecomment-2330667022, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB3VLDHKUGBXTJ22NLR3GNDZU7XQVAVCNFSM6AAAAABNODZJ2CVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGMZQGY3DOMBSGI . You are receiving this because you authored the thread.Message ID: @.***>

InBetweenNames avatar Sep 05 '24 14:09 InBetweenNames

The WASM_CPU_SUPPORTS_UNALIGNED_ADDR_ACCESS is only enabled in x86-64, x86-32 and aarch64 by default, and it helps reduce the footprint in some situations, e.g., for generate less pre-compiled code in fast interpreter mode. Developer can also select whether to enable it or not, I don't think we had better remove it entirely. In the patch for mips32, you can remove it if you want.

wenyongh avatar Sep 06 '24 06:09 wenyongh

Unfortunately this is still not correct even on X86_64. For example, on -march=generic for x86_64, it would be legal for the compiler to emit movaps instructions if the type had an alignment requirement of 16 bytes or higher. This would produce a fault if this condition was not met. Can you provide some examples or ideally numbers on how much this reduces the memory footprint of WAMR? I believe that if memory is truly that constrained on x86-64, x86-32, or aarch64 that a memcpy would be sufficient to ensure correct operation, even if it would potentially be slower (recall that in the case where the pointer is correctly aligned, the memcpy would be elided). The problem is that the standard requires that all pointers be suitably aligned and this fact can be used by optimization passes, which will lead to subtle bugs. Fortunately, this is fixable and is not a fundamental part of the design of WAMR.

On Fri, Sep 6, 2024 at 2:44 AM Wenyong Huang @.***> wrote:

The WASM_CPU_SUPPORTS_UNALIGNED_ADDR_ACCESS is only enabled in x86-64, x86-32 and aarch64 by default, and it helps reduce the footprint in some situations, e.g., for generate less pre-compiled code in fast interpreter mode. Developer can also select whether to enable it or not, I don't think we had better remove it entirely. In the patch for mips32, you can remove it if you want.

— Reply to this email directly, view it on GitHub https://github.com/bytecodealliance/wasm-micro-runtime/issues/3761#issuecomment-2333358460, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB3VLDARWNSLPDAPVRLBY63ZVFFOLAVCNFSM6AAAAABNODZJ2CVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGMZTGM2TQNBWGA . You are receiving this because you authored the thread.Message ID: @.***>

InBetweenNames avatar Sep 06 '24 17:09 InBetweenNames

I did a test for the coremark standalone case, disabling the macro while keeping storing the label address in the pre-compiled code (that means the label address will be stored at 8-byte align address and there may be unused paddings before it) will increase the pre-compiled size from 115925 bytes to 134596 bytes, and interpreter has to align to 8-byte aligned address before get the label address, which may impact the performance. Since it increases the code size a lot, currently in 64-bit when the macro is disabled, we store the offset of two labels instead (dest label - base label) and a add operation is added to calculate the label address after reading the offset, which also impacts the performance but gets better footprint.

The wasm runtime itself doesn't uses movaps, we didn't find that the gcc/clang generates movaps instruction after compiling wamr runtime. And in the generated AOT/JIT code, the movapu is emitted instead if the address may be not 16-byte aligned.

The point is that the hardware supports the unaligned access operations and we can control it and it benefits to the developers, is it better to keep the capability? And runtime has provided the option to disable it.

wenyongh avatar Sep 09 '24 07:09 wenyongh

Hi Wenyong,

Thanks for your response, indeed a 16% change in footprint is higher than I expected.

The point is that the hardware supports the unaligned access operations and we can control it and it benefits the developers, why must we limit the capability? And runtime has provided the option to disable it.

I suppose my main objection is that you can't really control it because the compiler is free to interpret the code how it wants, even if you observe it working now there is no guarantee it will work in all cases and on all toolchains in the future, and the results can be surprising when the code doesn't generate how you expect. Personally, I think the risks outweigh the benefits (since there are now two code paths to support and UB is on by default for some platforms) but I'll leave it to you to decide what's best for your project. It is unfortunate that there's no way to signal to clang or gcc to assume that a struct is not aligned other than by using the packed attribute, which has its own problems here (can't take the address of a member).

On Mon, Sep 9, 2024 at 3:59 AM Wenyong Huang @.***> wrote:

I did a test for the coremark standalone case, disabling the macro while keeping storing the label address in the pre-compiled code (that means the label address will be stored at 8-byte align address and there may be unused paddings before it) will increase the pre-compiled size from 115925 bytes to 134596 bytes, and interpreter has to align to 8-byte aligned address before get the label address, which may impact the performance. Since it increases the code size a lot, currently in 64-bit when the macro is disabled, we store the offset of two labels instead (dest label - base label) and a add operation is added to calculate the label address after reading the offset, which also impacts the performance but gets better footprint.

The wasm runtime itself doesn't uses movaps, we didn't find that the gcc/clang generates movaps instruction after compiling wamr runtime. And in the generated AOT/JIT code, the movapu is emitted instead if the address may be not 16-byte aligned.

The point is that the hardware supports the unaligned access operations and we can control it and it benefits the developers, why must we limit the capability? And runtime has provided the option to disable it.

— Reply to this email directly, view it on GitHub https://github.com/bytecodealliance/wasm-micro-runtime/issues/3761#issuecomment-2337398621, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB3VLDAQLL3L3AF5UTSBO7LZVVIO5AVCNFSM6AAAAABNODZJ2CVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGMZXGM4TQNRSGE . You are receiving this because you authored the thread.Message ID: @.***>

InBetweenNames avatar Sep 09 '24 15:09 InBetweenNames

I've noticed that wasm that perfectly works in the browser as well as wamr @ 72872cbd44d7c2448ea4ed52ac7a168dbd7edf32 is now failing in the main branch with the "undefined element" on target linux x86_32

Not sure if helps with this issue but maybe others can try that commit and confirm the introduction of this bug.

greenaddress avatar Dec 05 '24 18:12 greenaddress