emscripten [DRAFT] add wasm-bindgen support

This is an early draft PR, opened now to gather feedback as soon as possible. There are also pending changes on the wasm-bindgen side.

How this works:

1. Cargo builds Rust code targeting wasm32-unknown-emscripten into a .a file.
2. Emscripten is invoked with any C++ sources and the just-built Rust .a file.
3. Emscripten builds the C++ sources and then calls out to wasm-ld to link the C++ and Rust into a .wasm file.
4. wasm-bindgen is run on that .wasm file, producing a new .wasm file, a library.js file, and a pre.js file.
5. Emscripten constructs its own .js, integrating the wasm-bindgen .js files.

A demo is available at https://github.com/walkingeyerobot/cxx-rust-demo. The library_wbg.js and pre.js files there are approximately what wasm-bindgen will produce for consumption by Emscripten.

Some TODOs:

- Figure out how to pass the exported symbols from the Rust compiler to Emscripten. These symbols need to be passed to wasm-ld so that they are not removed from the linked .wasm, but they may no longer be present after wasm-bindgen has processed that .wasm. wasm-bindgen at compile time puts the information it needs to generate JS inside the .wasm file itself in the form of _describe functions. These functions are then removed after JS generation.
- Merge the .js files produced by wasm-bindgen. This shouldn't be that hard; I just haven't gotten around to it yet. It would simplify the code for both Emscripten and wasm-bindgen.
- Get the wasm-bindgen tests to pass. Early efforts here have revealed some very odd compiler differences between the -unknown and -emscripten targets that I'll have to fix.
- Have this work end-to-end via wasm-pack. I'll have a draft PR for this soon (tm).

I'm mostly looking for feedback on the first point about exported symbols and about the general addition of -sWASM_BINDGEN to Emscripten. Again, this is very early, but it's a pretty big feature, so I thought it best to start discussions now.

cc @daxpedda, who I've been working with on the wasm-bindgen side.

Jan 24 '25 20:01 walkingeyerobot

> wasm-bindgen at compile time puts the information it needs to generate JS inside the .wasm file itself in the form of _describe functions.

Does rustc then read the wasm to find those function names, and pass those names to wasm-ld? (if not, how does it find those names?)

In general if we need to read metadata-type info from the wasm, then we have a minimal parser in tools/webassembly.py. If we need something more complex, a binaryen pass is an option.
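
To give a sense of how little parsing is needed for this kind of metadata, here is a standalone sketch that lists the function exports of a .wasm binary. It does not use the API of tools/webassembly.py; the function names are made up for illustration, and it relies only on the standard WebAssembly binary layout (8-byte header, then sections, with the export section having id 7).

```python
EXPORT_SECTION_ID = 7  # export section id in the wasm binary format
FUNC_KIND = 0          # export kind byte for functions


def read_uleb128(data, pos):
    """Decode an unsigned LEB128 integer; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def exported_function_names(wasm_bytes):
    """Return the names of all function exports in a wasm module."""
    if wasm_bytes[:4] != b"\0asm":
        raise ValueError("not a WebAssembly binary")
    pos = 8  # skip the 4-byte magic and 4-byte version
    names = []
    while pos < len(wasm_bytes):
        section_id = wasm_bytes[pos]
        pos += 1
        size, pos = read_uleb128(wasm_bytes, pos)
        section_end = pos + size
        if section_id == EXPORT_SECTION_ID:
            count, pos = read_uleb128(wasm_bytes, pos)
            for _ in range(count):
                name_len, pos = read_uleb128(wasm_bytes, pos)
                name = wasm_bytes[pos:pos + name_len].decode("utf-8")
                pos += name_len
                kind = wasm_bytes[pos]
                pos += 1
                _index, pos = read_uleb128(wasm_bytes, pos)
                if kind == FUNC_KIND:
                    names.append(name)
        # Jump to the next section whether or not this one was parsed.
        pos = section_end
    return names
```

Filtering the result down to the _describe functions is left out here, since the exact naming scheme is not spelled out in this thread.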

Jan 24 '25 22:01 kripken

wasm-bindgen itself is two pieces: a library that lets you annotate your Rust code to mark things for export, and a tool that consumes a .wasm file and reads those annotations to produce a companion JS file. rustc knows about those function names because wasm-bindgen, as a library, provided the annotations. If rustc invokes the linker itself, it is able to pass that information along. However, because we also need to build C++, we're only using rustc to compile, not to drive the whole process, so we need to have it output that information elsewhere.

One (very naive) possibility is to have rustc invoke a fake linker that just writes the -sEXPORTED_FUNCTIONS list to a file for Emscripten to read later.
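
A rough sketch of what such a fake linker could do, assuming rustc passes the export list as an EXPORTED_FUNCTIONS=<JSON array> setting, either attached to -s or as the argument following a bare -s. That argv shape, the function names, and the output file name are all assumptions for illustration, not something settled in this PR.

```python
import json

SETTING_PREFIX = "EXPORTED_FUNCTIONS="


def extract_exported_functions(argv):
    """Return the EXPORTED_FUNCTIONS list found in a linker argv, or []."""
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == "-s" and i + 1 < len(argv):
            # Split form: "-s" followed by "EXPORTED_FUNCTIONS=[...]".
            i += 1
            setting = argv[i]
        elif arg.startswith("-s"):
            # Joined form: "-sEXPORTED_FUNCTIONS=[...]".
            setting = arg[2:]
        else:
            setting = ""
        if setting.startswith(SETTING_PREFIX):
            # Assumes the value is a JSON array of symbol names.
            return json.loads(setting[len(SETTING_PREFIX):])
        i += 1
    return []


def write_exports(argv, out_path="exported_functions.json"):
    """Do no linking at all: just record the export list for later use."""
    names = extract_exported_functions(argv)
    with open(out_path, "w") as f:
        json.dump(names, f)
    return names
```

A wrapper script would call `write_exports(sys.argv[1:])` and exit successfully, and the Emscripten side would later read the JSON file back and merge it into its own export list.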

Jan 24 '25 22:01 walkingeyerobot