managed pre-compiled wasm-bindgen is slower than locally-compiled
Problem
This is a spin-off from #4011 ("dx serve stalls").
I've researched this some more, and have some surprising findings.
First, I let dx serve run to completion, to make sure the project was fully built and the cache was warmed up.
Then I pressed r to rebuild the project. This is the result:
14:37:20 [dev] Full rebuild: triggered manually
14:37:34 [dev] Build completed in 2289ms
╭────────────────────────────────────────────────────────────────────────────────────────── /:more ╮
│ App: ━━━━━━━━━━━━━━━━━━━━━━━━━━ 🎉 1.3s Platform: Web │
│ Bundle: ━━━━━━━━━━━━━━━━━━━━━━━━━━ 🎉 12.3s App features: ["web"] │
│ Status: Serving bifrost-frontend 🚀 13.6s Serving at: http://127.0.0.1:8123 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
So it takes 12.3s to bundle, but only 1.3s to recompile! This seems excessive.
After some work with strace, I found the culprit - it's wasm-bindgen:
/home/chrivers/.local/share/dioxus/wasm-bindgen/wasm-bindgen-0.2.100
To compare, I cloned wasm-bindgen, and did a local compile in release mode.
After placing the resulting binary on top of the one dx calls, this is the result:
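For anyone repeating the swap, a minimal sketch of a reversible way to do it follows. The helper name `replace_managed_binary` and the `.orig` backup suffix are made up for illustration, and the build command in the trailing comment is an assumption about the wasm-bindgen workspace layout rather than something taken from this report.

```shell
#!/bin/sh
# replace_managed_binary BUILT MANAGED
# Hypothetical helper: keeps a one-time backup of the managed binary as
# MANAGED.orig, then copies the locally built binary over it.
replace_managed_binary() {
    built=$1
    managed=$2
    [ -f "$built" ] || { echo "missing build output: $built" >&2; return 1; }
    # Back up only once, so repeated runs never overwrite the original download.
    if [ -f "$managed" ] && [ ! -e "$managed.orig" ]; then
        cp -p "$managed" "$managed.orig" || return 1
    fi
    cp "$built" "$managed" && chmod 755 "$managed"
}

# Example, run from inside a wasm-bindgen checkout (build command is an assumption):
#   cargo build --release -p wasm-bindgen-cli
#   replace_managed_binary target/release/wasm-bindgen \
#       "$HOME/.local/share/dioxus/wasm-bindgen/wasm-bindgen-0.2.100"
```

Copying `MANAGED.orig` back over the managed path restores the downloaded binary for a before/after comparison.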
14:38:33 [dev] Full rebuild: triggered manually
14:38:38 [dev] Build completed in 1934ms
╭────────────────────────────────────────────────────────────────────────────────────────── /:more ╮
│ App: ━━━━━━━━━━━━━━━━━━━━━━━━━━ 🎉 1.0s Platform: Web │
│ Bundle: ━━━━━━━━━━━━━━━━━━━━━━━━━━ 🎉 2.4s App features: ["web"] │
│ Status: Serving bifrost-frontend 🚀 3.3s Serving at: http://127.0.0.1:8123 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
Obviously, this is much, much better.
Let's take a look at individual timing.
Stock:
⎋ Timing: 557% cpu --[ 3,185s total: 3,516s user + 14,248s krnl ]--
⩧ Memory: 935 MB max --[ 0 majpf + 414113 minpf ]--
⦺ Events: yield 91441 + forced 1712 // input 0 + output 80224
Locally compiled:
⎋ Timing: 280% cpu --[ 1,716s total: 1,840s user + 2,975s krnl ]--
⩧ Memory: 934 MB max --[ 0 majpf + 197682 minpf ]--
⦺ Events: yield 98929 + forced 374 // input 0 + output 80224
The stock version spends most of its time in kernel code (14.2s kernel versus 3.5s user), and nearly five times as much kernel time as the local build. That seems weird.
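To put a number on that, here is a small sketch that turns the user and kernel seconds from the two timing reports into a kernel share. The helper name `kernel_share` is hypothetical, and the values are retyped with a decimal point because the timing tool prints a decimal comma.

```shell
#!/bin/sh
# kernel_share USER_SECONDS KERNEL_SECONDS
# Hypothetical helper: prints the share of CPU time spent in the kernel.
kernel_share() {
    LC_ALL=C awk -v user="$1" -v krnl="$2" \
        'BEGIN { printf "%.1f%%\n", 100 * krnl / (user + krnl) }'
}

kernel_share 3.516 14.248   # stock binary            -> 80.2%
kernel_share 1.840 2.975    # locally compiled binary -> 61.8%
```

Both builds are kernel-heavy by this measure, but the absolute kernel time is what differs most: 14.248s for the stock binary against 2.975s for the local one.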
Related to the strace work, I stumbled on something strange. The version of wasm-bindgen that comes with Dioxus performs an insane number of allocations with mmap():
strace -f -tt -e trace=file,execve,desc -s 1024 \
/home/chrivers/.local/share/dioxus/wasm-bindgen/wasm-bindgen-0.2.100 \
--target web --keep-debug \
--out-name bifrost-frontend \
--no-typescript \
--out-dir ${PROJECT_TARGET}/dx/bifrost-frontend/debug/web/public/wasm-bindgen \
${PROJECT_TARGET}/wasm32-unknown-unknown/wasm-dev/bifrost-frontend.wasm \
|& grep mmap | wc -l
91054
That's 91054 system calls to mmap()! When looking through the output from strace, almost all of these are 4K or 8K allocations. It seems something is wrong with the memory allocator.
Here's the locally compiled version:
strace -f -tt -e trace=file,execve,desc -s 1024 \
wasm-bindgen/target/release/wasm-bindgen \
--target web --keep-debug \
--out-name bifrost-frontend \
--no-typescript \
--out-dir ${PROJECT_TARGET}/dx/bifrost-frontend/debug/web/public/wasm-bindgen \
${PROJECT_TARGET}/wasm32-unknown-unknown/wasm-dev/bifrost-frontend.wasm \
|& grep mmap | wc -l
183
Just 183 calls to mmap(), which is fine.
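The size breakdown mentioned above (almost all mappings being 4K or 8K) could be reproduced with something like the following sketch. The function name `mmap_size_histogram` is hypothetical; it assumes strace's usual `mmap(addr, length, ...)` line format and reads the trace on stdin.

```shell
#!/bin/sh
# mmap_size_histogram
# Hypothetical helper: reads strace output on stdin and prints one
# "LENGTH COUNT" line per distinct mmap() length, most frequent first.
mmap_size_histogram() {
    LC_ALL=C awk '
        # Match "mmap(<addr>, <length>" and keep only the length argument.
        match($0, /mmap\([^,]*, [0-9]+/) {
            call = substr($0, RSTART, RLENGTH)
            sub(/.*, /, "", call)
            count[call]++
        }
        END {
            for (size in count) printf "%s %d\n", size, count[size]
        }
    ' | sort -k2,2nr -k1,1n
}

# Example: save a trace to a file first, then summarise it.
#   strace -f -e trace=mmap -o trace.log <the wasm-bindgen command above>
#   mmap_size_histogram < trace.log
```

Lines such as `<... mmap resumed>` are skipped because they carry no length argument, so each call is counted once.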
Expected behavior
I expect the wasm-bindgen version that comes with dioxus to be approximately as fast as the one that I can compile myself 😄
Environment:
- Dioxus version: dioxus = { version = "0.6.0", features = [] }
- Rust version: rustc 1.85.0 (4d91de4e4 2025-02-17)
- OS info: Debian Stable (12.10)
- App platform: web
By the way, another weird thing:
The time reported in xx:xx:xx [dev] Build completed in <number>ms seems to be completely inconsistent with the other reported times, as well as actual real-world time. I'm not sure where that number's coming from, but it's not really helping much, imho.
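As a rough check of that mismatch, the sketch below computes the wall-clock gap between the "Full rebuild" and "Build completed" timestamps of the two logs above. The helper name `elapsed_seconds` is hypothetical, and it assumes both timestamps fall on the same day.

```shell
#!/bin/sh
# elapsed_seconds START END
# Hypothetical helper: seconds between two HH:MM:SS timestamps (same day).
elapsed_seconds() {
    LC_ALL=C awk -v t0="$1" -v t1="$2" 'BEGIN {
        split(t0, a, ":"); split(t1, b, ":")
        print (b[1] * 3600 + b[2] * 60 + b[3]) - (a[1] * 3600 + a[2] * 60 + a[3])
    }'
}

elapsed_seconds 14:37:20 14:37:34   # stock run: 14 (log line says 2289ms)
elapsed_seconds 14:38:33 14:38:38   # local build: 5 (log line says 1934ms)
```

The timestamp gaps (14s and 5s) line up with the 13.6s and 3.3s shown in the status panel, not with the millisecond figures in the log line.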
We download wasm-bindgen from their GitHub release page. We could try creating more optimized variants and distributing them through our own page.
It's weird how big of a difference it is. Maybe your CPU is pretty modern and has lots of room to take advantage of better codegen?
https://github.com/DioxusLabs/dioxus/blob/76ffcabd0be84ace2c131f625da420adb8c98441/packages/cli/src/wasm_bindgen.rs#L430-L449