wasm-micro-runtime

The f32.max instruction causes a performance degradation in WAMR Fast JIT mode

Open gaaraw opened this issue 3 months ago • 2 comments

Subject of the issue

Hello, I noticed that the WAMR Fast JIT mode has poor performance when executing the instruction f32.max.

The specific time data is as follows:

runtime            time (s)
wasmer_llvm        2.465432
wasmedge_jit       0.023468
wamr_llvm_jit      0.014776
wasmer_cranelift   1.243364
wasmtime           1.21076
wamr_fast_jit      9.855768

Times are in seconds; each value is the average of ten runs.
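To put the measurements in relative terms, the following Python sketch divides the wamr_fast_jit time by each of the other results. The numbers are copied from the table above; the names `TIMES` and `slowdown_vs` are illustrative only and not part of any runtime's tooling.

```python
# Average task-clock times in seconds, copied from the table above.
TIMES = {
    "wasmer_llvm": 2.465432,
    "wasmedge_jit": 0.023468,
    "wamr_llvm_jit": 0.014776,
    "wasmer_cranelift": 1.243364,
    "wasmtime": 1.21076,
    "wamr_fast_jit": 9.855768,
}


def slowdown_vs(baseline: str, times: dict[str, float]) -> dict[str, float]:
    """Return how many times slower `baseline` is than each other entry."""
    base = times[baseline]
    return {name: base / t for name, t in times.items() if name != baseline}


if __name__ == "__main__":
    for name, ratio in sorted(slowdown_vs("wamr_fast_jit", TIMES).items()):
        print(f"wamr_fast_jit / {name}: {ratio:.1f}x")
```

On these numbers, Fast JIT is roughly 4x slower than wasmer_llvm and roughly 8x slower than wasmtime and wasmer_cranelift.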

Test case

test_case.wat
(module
  (type (;0;) (func (param i32)))
  (type (;1;) (func))
  (import "wasi_snapshot_preview1" "proc_exit" (func (;0;) (type 0)))
  (func (;1;) (type 1)
    (local i32)
    (local.set 0
      (i32.const 0))
    (loop

      (drop
        (f32.max
          (f32.const 0x1.ef5ecep+125)
          (f32.const 0x1.ef5ecep+125)))
          
      (local.set 0
        (i32.add
          (local.get 0)
          (i32.const 1)))
      (br_if 0
        (i32.ne
          (local.get 0)
          (i32.const 0))))
    (call 0
      (i32.const 0))
    (unreachable))
  (export "_start" (func 1))
  (memory (;0;) 1)
  (export "memory" (memory 0)))
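The loop counter starts at 0, is incremented before the br_if check, and the loop exits only once the counter equals 0 again, which for an i32 happens after it wraps around. The Python sketch below models that control flow with a configurable bit width (a hypothetical parameter, used only to keep the simulation cheap) to show that the loop body runs 2^bits times, so 2^32 times in the test case.

```python
def loop_iterations(bits: int) -> int:
    """Simulate the test case's loop with a `bits`-wide wrapping counter."""
    mask = (1 << bits) - 1
    counter = 0
    iterations = 0
    while True:
        iterations += 1                  # loop body: the dropped f32.max
        counter = (counter + 1) & mask   # i32.add with wraparound
        if counter == 0:                 # br_if 0 repeats only while counter != 0
            break
    return iterations


if __name__ == "__main__":
    for bits in (4, 8, 16):
        assert loop_iterations(bits) == 2 ** bits
    print(f"i32 counter: {2 ** 32} iterations")
```

With 2^32 iterations, the reported 9.855768 s for wamr_fast_jit works out to about 2.3 ns per iteration. Totals in the tens of milliseconds are too short for 2^32 real iterations, so those backends presumably removed or folded the dropped, constant-operand f32.max, although the issue does not confirm this.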

Your environment

All runtimes are release builds and run in JIT mode.

  • WAMR: iwasm 2.4.0
  • wasmer: 6.0.1
  • wasmtime: 36.0.0 (ada802c68 2025-08-20)
  • wasmedge: 0.15.0
  • wabt: 1.0.27
  • llvm: 18.1.8
  • Host OS: Ubuntu 22.04.5 LTS x64
  • CPU: 12th Gen Intel® Core™ i7-12700 × 20

Steps to reproduce

wat2wasm test_case.wat -o test_case.wasm

# Execute the wasm file and collect data
perf stat -r 10 -e 'task-clock' /path/to/wasmer run test_case.wasm --llvm
perf stat -r 10 -e 'task-clock' /path/to/wasmedge --enable-jit test_case.wasm
perf stat -r 10 -e 'task-clock' /path/to/build_llvm_jit/iwasm test_case.wasm
perf stat -r 10 -e 'task-clock' /path/to/wasmer run test_case.wasm
perf stat -r 10 -e 'task-clock' /path/to/wasmtime test_case.wasm
perf stat -r 10 -e 'task-clock' /path/to/build_fast_jit/iwasm test_case.wasm
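Where perf is not available, a rough cross-check can be scripted. The sketch below is an assumption-laden stand-in, not the method used in this issue: it measures wall-clock time rather than the task-clock counter, and `average_wall_time` is a hypothetical helper name.

```python
import subprocess
import sys
import time


def average_wall_time(cmd: list[str], runs: int = 10) -> float:
    """Run `cmd` `runs` times and return the mean wall-clock time in seconds."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        total += time.perf_counter() - start
    return total / runs


if __name__ == "__main__":
    # Placeholder command so the sketch runs anywhere; substitute e.g.
    # ["/path/to/build_fast_jit/iwasm", "test_case.wasm"] from the steps above.
    print(average_wall_time([sys.executable, "-c", "pass"], runs=3))
```

Wall-clock time includes process startup and scheduling noise, so the absolute values will not match the perf figures exactly; the relative ordering of the runtimes is the useful part.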

Actual behavior

In the test case, I placed the f32.max instruction inside a loop to amplify the performance difference. Fast JIT may implement or optimize f32.max differently from the other runtimes, which could explain the slowdown.
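One reason f32.max can cost more than a single machine instruction is that WebAssembly's semantics differ from the x86 MAXSS instruction in two corner cases: any NaN operand must produce NaN, and the maximum of +0 and -0 must be +0. The Python sketch below models both behaviours to show where they diverge. It uses Python's 64-bit floats in place of f32, ignores NaN payloads, and says nothing about how WAMR Fast JIT actually lowers the instruction; the function names are illustrative.

```python
import math


def wasm_fmax(a: float, b: float) -> float:
    """Model of WebAssembly fmax (NaN payload details omitted)."""
    if math.isnan(a) or math.isnan(b):
        return math.nan  # any NaN operand yields NaN
    if a == 0.0 and b == 0.0:
        # Zeros of opposite sign yield +0; the result is -0 only if both are -0.
        both_negative = math.copysign(1.0, a) < 0 and math.copysign(1.0, b) < 0
        return -0.0 if both_negative else 0.0
    return a if a > b else b


def x86_maxss(a: float, b: float) -> float:
    """Model of x86 MAXSS: returns the second operand for NaNs or two zeros."""
    return a if a > b else b


if __name__ == "__main__":
    for a, b in [(2.0, 3.0), (math.nan, 1.0), (0.0, -0.0)]:
        print(a, b, wasm_fmax(a, b), x86_maxss(a, b))
```

A JIT targeting x86 therefore needs extra compare-and-fixup code, or a call into a helper, around the hardware instruction. Either can add per-iteration cost when no later optimization pass removes it.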

Extra Info

If you need any other relevant information, please let me know and I will do my best to provide it. Looking forward to your reply! Thank you!

gaaraw avatar Sep 17 '25 08:09 gaaraw

This is expected. Fast JIT is called "fast" because its code generation is fast, not because the generated code is optimized to run fast. That is the purpose of the underlying AsmJit framework: AsmJit is specifically designed for low-latency machine code generation rather than for extensive high-level optimizations.

TianlongLiang avatar Sep 18 '25 00:09 TianlongLiang

I get it, thank you for your patient explanation.

gaaraw avatar Sep 18 '25 01:09 gaaraw