binaryen icon indicating copy to clipboard operation
binaryen copied to clipboard

wasm2js aborts when allocating pages.

Open RmStorm opened this issue 1 year ago • 12 comments

I'm running into some pretty serious issues using wasm2js. It looks to me like the entire wasm memory is getting corrupted. Sometimes they result in clean-ish aborts like this:

➜ node index.js 
allocating pages: 4
allocating pages: 4
allocating pages: 7
first done
allocating pages: 13
Error calling wasm function: Error: abort
    at wasm2js_trap (file:///home/rmstorm/Documents/rust/wasm2js-memory-problem/pkg/wasm2js_memory_problem_bg.wasm.js:25:33)
    at __rust_start_panic (file:///home/rmstorm/Documents/rust/wasm2js-memory-problem/pkg/wasm2js_memory_problem_bg.wasm.js:4319:3)
    at rust_panic (file:///home/rmstorm/Documents/rust/wasm2js-memory-problem/pkg/wasm2js_memory_problem_bg.wasm.js:4218:3)
    at std__panicking__rust_panic_with_hook__he5c089ac7305193e

But sometimes they result in just complete corruption and I get output like: I%@�<#�f�[email protected]�n~I%@��r�f�M@q��$~I%@�><�f�M@�a�}

I've made a repo with a minimal reproducible example. The wasm file that I'm using is generated using wasm-bindgen. I have tested with versions 0.2.90, 0.2.91 and 0.2.92 I have also tested with wasm2js version 105 (the one I originally had installed) and the latest version (119). It occurs in all situations!

RmStorm avatar Nov 28 '24 12:11 RmStorm

It looks similar to: https://github.com/WebAssembly/binaryen/issues/6628 but since he reports downgrading resolved his issue I suspect mine is distinct.

RmStorm avatar Nov 28 '24 12:11 RmStorm

The minimal testcase here looks like it should have worked in the past - I mean that it is so minimal, that given things used to work in general, it must have.

Given that, what I would do is bisect. You should be able to bisect separately over wasm-bindgen and wasm-opt, and only one should be needed (but you might need to try both if the first finds nothing). I see you mentioned you tried wasm-opt 105, which is from 2 years ago, so I would go even further back, until you find where it used to work. Both going back and bisecting should take logarithmic time, so it should be practical.

kripken avatar Nov 28 '24 18:11 kripken

@mtb0x1 ~~Thanks! That indeed seems to fix it! Do you know why this might be the case?~~

Edit: no it didn't actual fix the issue I think. It just avoids the trap sometimes, sometimes it seem to run.. sometimes it crashes.. I changed index.js to:

async function run() {
  try {
    console.log(greet("no dice"))
    for (let i = 0; i < 10; i++) {
      const inputStr = "A".repeat(i+1).repeat(20000)
      console.log("iteration: ", i, "input length: ", inputStr.length)
      console.log("output length", greet(inputStr).length)
    }
  } catch (err) {
    console.error("Error calling wasm function:", err);
  }
}

And then the output is: ( I removed the page allocation log lines)

❯ node index.js 
Hello, no dice!
iteration:  0 input length:  10000
output length 10008
iteration:  1 input length:  20000
output length 10008
iteration:  2 input length:  30000
output length 10008
iteration:  3 input length:  40000
output length 10008
iteration:  4 input length:  50000
output length 10008
iteration:  5 input length:  60000
output length 10008
iteration:  6 input length:  70000
output length 10008
iteration:  7 input length:  80000
output length 10008
iteration:  8 input length:  90000
output length 10008
iteration:  9 input length:  100000
output length 10008

It's very clear that the output doesn't actually change so something is very wrong. If I make the input string twice as long like so const inputStr = "A".repeat(i+1).repeat(20000) I get:

❯ node index.js 
Hello, no dice!
iteration:  0 input length:  20000
Error calling wasm function: TypeError: The encoded data was not valid for encoding utf-8
    at TextDecoder.decode (node:internal/encoding:443:16)
    at getStringFromWasm0 (file:///home/rmstorm/Documents/rust/wasm2js-memory-problem/pkg/wasm2js_memory_problem_bg.js:90:30)
    at greet (file:///home/rmstorm/Documents/rust/wasm2js-memory-problem/pkg/wasm2js_memory_problem_bg.js:108:16)
    at run (file:///home/rmstorm/Documents/rust/wasm2js-memory-problem/index.js:12:36)
    at file:///home/rmstorm/Documents/rust/wasm2js-memory-problem/index.js:19:1
    at ModuleJob.run (node:internal/modules/esm/module_job:268:25)
    at async onImport.tracePromise.__proto__ (node:internal/modules/esm/loader:543:26)
    at async asyncRunEntryPointWithESMLoader (node:internal/modules/run_main:116:5) {
  code: 'ERR_ENCODING_INVALID_ENCODED_DATA'
}

Which is just nonsense indicating memory corruption I think..

RmStorm avatar Nov 29 '24 08:11 RmStorm

If bisection doesn't work, another option is to debug this in depth using Binaryen's instrumentation, comparing wasm and wasm2js builds. Specifically, we can instrument the builds so that every single read and write to memory, locals, etc. are logged out. Assuming everything is deterministic, and that the wasm build works while wasm2js errors, then comparing the logs will find the first divergence, and pinpoint the bug.

To do this, the process is something like

  • Instrument the wasm using wasm-opt input.wasm -o output.wasm --instrument-locals --log-execution --instrument-memory (you may not need all three).
  • Add the new imports to the JS loading code, which you can get from here
  • Run the wasm, save the log.
  • Run wasm2js, then run that build and save the log.
  • Diff the logs.

kripken avatar Dec 04 '24 17:12 kripken

Since my issue was mentioned before and we again tried to bump wasm_bindgen, I decided to jump in this one and say that we also use large strings that we pass between Rust and JS frontend which seems to be the cause of the issue. When we replace some of these large values in rust with empty data we do get to some point in the applicaiton but still get errors like abort and Failed to execute 'decode' on 'TextDecoder'

Sadly, as of right now, due to capacity and time constraints we are not able to allocate resources to debug this. We also don't have that much experience with wasm or wasm2js to actually preform the analysis and isolate the problem.

wee_alloc doens't work for us and results in another issue.

Image

elpiel avatar Feb 24 '25 15:02 elpiel

I decided to follow @kripken instructions above to try and explore the issue in the generated file but now with wasm-bindgen 0.2.100 I get a different error in the provided reproducible example:

$ make all
wasm-bindgen target/wasm32-unknown-unknown/release/wasm2js_memory_problem.wasm --target bundler --out-dir pkg
wasm2js pkg/wasm2js_memory_problem_bg.wasm -o pkg/wasm2js_memory_problem_bg.wasm.js
Fatal: error: modules with multiple tables are not supported yet.
make: *** [Makefile:28: wasm2js] Error 1

Also seems that this error happens after version 0.2.93 as I tested all the other version up to 100 with the appropriate wasm-bindgen-cli corresponding to the wasm-bindgen version

elpiel avatar Jun 02 '25 11:06 elpiel

@elpiel

Fatal: error: modules with multiple tables are not supported yet.

Does wasm-bindgen emit multiple tables? If so hopefully there is a way to disable that.

kripken avatar Jun 02 '25 15:06 kripken

Apparently this is due to update of LLVM in 1.82 and afaik you can use a rust flags to disable features on compilation and remove the reference types. E.g:

RUSTFLAGS="-C target-cpu=mvp" cargo +nightly build -Z build-std=std,panic_abort --target wasm32-unknown-unknown --release

I also saw a solution with stripping symbols instead of using nightly but I haven't tested it: RUSTFLAGS="-C strip=symbols"

https://github.com/rustwasm/wasm-bindgen/issues/4211 https://github.com/rust-lang/rust/issues/128475

NB: I'm more concerned about this memory issue, as even more libraries and issues are popping up due to using old libraries (wasm-bindgen) and older rust version.

elpiel avatar Jun 02 '25 15:06 elpiel

Hello again everyone! After spending way too much time on this.... I managed to get some results with trial and error.

The problem

What ends up happening is that this is a fundamental incompatibility between wasm-bindgen's JS glue, which is optimized for native WebAssembly.Memory behavior (where the old buffer is "detached" in a detectable way), and the polyfill approach of wasm2js, which simply swaps out internal variable references. __wasm_memory_grow is internally correct from what I found for the code provided by @RmStorm but it does not provide a mechanism for external JS modules to detect the change, causing the wasm-bindgen glue to operate on a stale memory view.

There are certain moments in time (lengths of strings) which cause the results in the code to:

  1. have a some greet().length but no actual String inside (or rather just an opening bracket ( in this case due to corruption I suppose)
  2. have length == 0 I believe this is due to the stale memory view.

when I see 1 allocation of page (check the patch in https://github.com/RmStorm/wasm2js-memory-problem/blob/b9a83598abe020a540599b74d35166b5adb3fa48/Makefile#L34) code works fine.

The moment a second page is allocated:

allocating pages: 1
allocating pages: 1
output string   string // console.log(`output string `, result.substring(0, 15), typeof(result))
output length 21830 // console.log("output length", result.length)

I get a bad string although it has some length to it

Seeing a third allocation of pages, e.g.:

allocating pages: 2
allocating pages: 2
allocating pages: 3
output length 0

results in output length of 0.

What I should mention as well is that I've ran this code not in a loop but rather concrete values of the String as the loop introduces more unknown issues (due to the stale memory view)

import { greet } from "./pkg/wasm2js_memory_problem.js";

async function run() {
    try {
   // Works just fine...
   // const inputStr = "A".repeat(21820)
   // Has length but no actual string
    const inputStr = "A".repeat(21821)
    console.log(`input length: `, inputStr.length)
    const result = greet(inputStr)
    console.log(`output string `, result.substring(0, 15), typeof(result))
    console.log("output length", result.length)
  } catch (err) {
    console.error("Error calling wasm function:", err);
  }
}

run().catch((err) => console.error("Error initializing wasm module:", err));

A proposed solution

After querying one AI for ideas how to fix this, it suggested to have a hook reinit or similar function that needs to be exported and called inside the wasm-bindgen glue code. This, however, requires patching the wasm-bingen generated glue code and updating every memory function (not only for strings).

Sooooo... given the complexity of this issue I'd love if some of you might take a look and consider a proper fix. I'm experienced in Rust and dealing with the C code will be almost impossible especially due to the complexity and required API change for this to work, without the knowledge of the codebase.

Why is this important?

Well... Our team works on older Tvs which support only asmjs, hence, we cannot use wasm and this project is the only known way to support them using our Rust core without rewriting the world.

PS

Notable changes I did on the code provided above was:

  1. Update the Makefile and turn off the reference-type RUSTFLAGS="-C target-cpu=mvp" cargo +nightly build -Z build-std=std,panic_abort --target $(TARGET) --release (using nightly from 19.8)
  2. updated wasm-bindgen = "=0.2.100"

elpiel avatar Aug 20 '25 15:08 elpiel

IIUC, the relevant wasm2js output code is

 var buffer = new ArrayBuffer(65536);

 function __wasm_memory_grow(pagesToAdd) {
   [..]
   var newBuffer = new ArrayBuffer(Math_imul(newPages, 65536));
   var newHEAP8 = new Int8Array(newBuffer);
   newHEAP8.set(HEAP8);
   HEAP8 = new Int8Array(newBuffer);
   [..]
   buffer = newBuffer;
   [..]
 }

 return {
  "memory": Object.create(Object.prototype, {
   "grow": {
    "value": __wasm_memory_grow
   }, 
   "buffer": {
    "get": function () {
     return buffer;
    }

So when memory.grow is called, we call __wasm_memory_grow and that updates our internals. It also updates buffer, and the outside sees that when it reads .buffer of the fake polyfill wasm Memory we emit.

It sounds like wasm-bindgen expects that buffer to be detached after a growth - is it that, or the typed arrays on that ArrayBuffer? Either way, it sounds like what we need to do is make our fake polyfill wasm Memory behave more like the real thing, that is, to update

  "memory": Object.create(Object.prototype, {
   "grow": {
    "value": __wasm_memory_grow
   }, 
   "buffer": {
    "get": function () {
     return buffer;
    }

somehow.

kripken avatar Aug 20 '25 18:08 kripken

Ok, cool, let's do it!

I remember there were some bug fixes on wasm-bingen side regarding strings and/or memory in one of the next version that started having this problem, however, updating Rust is also a no go as far as I remember.

We spent way too much time and a few tries hoping the issue was on our end 😬 when updating wasm-bingen and Rust version.

How can I assist with the approach and subsequent fix?

elpiel avatar Aug 20 '25 20:08 elpiel

I won't have time to work on this myself, I don't expect, but in general

  • Find out exactly how wasm-bindgen code is using the exported Memory, and how the current behavior differs.
  • Fix that observable behavior on the polyfill Memory

kripken avatar Aug 20 '25 20:08 kripken