Wasm/ESM integration and additional start/main functions
When starting up a Wasm module or application, it's often not enough to call the Wasm start function. Often, there's some kind of external code driving it, maybe generated by the toolchain or otherwise present in the host environment.
One idea from https://github.com/WebAssembly/design/issues/1160 to reduce the need for this sort of "driver" code was to make Wasm/ESM integration call a secondary "_start" function, after the exports are initialized, to permit the driver module to have the Wasm exports before functions that it exports to Wasm are run. However, there are some cases which don't quite meet this model:
- Before calling the WASI `_start` function, the exported memory could be passed into WASI https://github.com/nodejs/node/pull/27850/files#diff-ab9c666b48467c7030bd93aac6d06eb2R184
- For command-style programs, we may want to have a difference between running a command and using it https://github.com/WebAssembly/esm-integration/issues/30
I suspect, if we examine more systems, we'll find more kinds of mismatches or subtle cases where more behavior is needed.
The goal for this issue is to compile patterns from various Wasm toolchains and environments, and see what would make sense to standardize. This standard initialization behavior could be used both in JS and outside of JS environments, in common between the two.
I'd like to see if we can converge on the design of a new custom section to declaratively specify this initialization behavior. I think it will make sense to do this on top of the basic Wasm module semantics, with the start function being invoked atomically with module initialization before exports are made available, as explained by @rossberg.
I think the MVP Wasm/ESM integration semantics make sense as is, and any further behavior will be a v2 that layers on top. If you disagree, I'd love to understand better why in this thread, so we can make it clear to potential implementers of Wasm/ESM integration whether this is stable.
cc @lukewagner @guybedford @xtuc
I find the ESM integration somewhere in the territory of unusable because it isn't possible for a function called from wasm to get access to that wasm's memory (and it isn't possible to use stack trace inspection or anything like that to synchronously import the calling module and grab its memory export).
IIRC the webidl proposal suggests allowing slices of memory to be passed around as first class values, and that would solve this problem, but at the moment there's not really anything a user of the ESM integration can do.
Edit: fwiw, this is my only issue with the integration, everything else about it is :+1: from my perspective.
Forgive my naivety. But why can the memory not pass as anyref? Then no injection of shared memory from the host is needed.
@Horcrux7 from the perspective of a wasm module, if you want to import a function that has some functionality beyond what can be done with scalar values, you would need that function, the implementation, to have access to your memory to pass the data around. For example, I import a read_file function from the host, how does it write the contents of the file into my memory?
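To make the `read_file` case concrete, here is a minimal sketch using the plain JS API rather than the ESM integration. The module, the `host.read_file` signature (pointer, max length, returns bytes written) and the `run` export are all made up for illustration. The point is that the host function only works because the driver hands it the exported memory after instantiation, which is exactly the step an ESM-integrated import has no way to perform.

```javascript
// Hand-assembled binary for a hypothetical module (names are illustrative):
// (module
//   (import "host" "read_file" (func $read_file (param i32 i32) (result i32)))
//   (memory (export "memory") 1)
//   (func (export "run") (result i32)
//     (call $read_file (i32.const 0) (i32.const 16))))
const readFileBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  // type section: (i32, i32) -> i32 and () -> i32
  0x01, 0x0b, 0x02, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, 0x60, 0x00, 0x01, 0x7f,
  // import section: "host" "read_file" as func of type 0
  0x02, 0x12, 0x01, 0x04, 0x68, 0x6f, 0x73, 0x74,
  0x09, 0x72, 0x65, 0x61, 0x64, 0x5f, 0x66, 0x69, 0x6c, 0x65, 0x00, 0x00,
  // function section: one function of type 1
  0x03, 0x02, 0x01, 0x01,
  // memory section: one memory, minimum 1 page
  0x05, 0x03, 0x01, 0x00, 0x01,
  // export section: "memory" (memory 0) and "run" (func 1)
  0x07, 0x10, 0x02, 0x06, 0x6d, 0x65, 0x6d, 0x6f, 0x72, 0x79, 0x02, 0x00,
  0x03, 0x72, 0x75, 0x6e, 0x00, 0x01,
  // code section: i32.const 0, i32.const 16, call 0, end
  0x0a, 0x0a, 0x01, 0x08, 0x00, 0x41, 0x00, 0x41, 0x10, 0x10, 0x00, 0x0b,
]);

// The host function needs the module's memory, but the memory only exists
// once the module has been instantiated, so it is bound late via a closure.
let memory;

const imports = {
  host: {
    // Hypothetical host import: writes up to maxLen bytes at ptr.
    read_file(ptr, maxLen) {
      const data = new TextEncoder().encode("hello");
      const n = Math.min(data.length, maxLen);
      new Uint8Array(memory.buffer, ptr, n).set(data.subarray(0, n));
      return n;
    },
  },
};

const readFileInstance = new WebAssembly.Instance(
  new WebAssembly.Module(readFileBytes),
  imports
);
memory = readFileInstance.exports.memory; // the "driver" step
const written = readFileInstance.exports.run();
const text = new TextDecoder().decode(new Uint8Array(memory.buffer, 0, written));
console.log(written, text);
```

If `run` were the Wasm start function instead of an export called afterwards, `memory` would still be unset when `read_file` ran, which is why a later entrypoint matters here.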
Also, on the subject of the custom logic, it would be great if it was possible to call custom entrypoints during the evaluation phase. So far WASI has `_start` and `__wasi_unstable_reactor_start`, and N-API has `_napi_register`.
Currently this is not possible. But in C and many other languages you would pass a pointer to the memory where the data should go. Only in Wasm do you have to pass an int value with a position. And the memory is one large static shared block. Wrong code can write to any location in this memory.
@devsnek The purpose of this issue is to find general, cross-environment solutions to the problem you are articulating. One possibility would be a "host hook" to let, say, WASI do something when the module starts up (as your integration does). I'd like to figure out which pattern we should broadly encourage.
What do you think about adding a new custom section that stores the index of the additional start function and how to run it:
- A. as a regular function, after the start function and calling into the event-loop. Fixes the bundler, esm-integration and WASI use cases.
- B. same as A but re-instantiates the module first. Fixes the CLI program issue https://github.com/WebAssembly/esm-integration/issues/30.
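A rough sketch of how A and B would differ in what the caller observes, assuming a hypothetical module whose exported `main` bumps a mutable global and returns it. Nothing here reads a custom section. The mode is just a parameter of an illustrative `makeRunner` helper, standing in for whatever the section would declare, and the event-loop part of A is not modelled.

```javascript
// Hand-assembled binary for a hypothetical module:
// (module
//   (global $count (mut i32) (i32.const 0))
//   (func (export "main") (result i32)
//     (global.set $count (i32.add (global.get $count) (i32.const 1)))
//     (global.get $count)))
const counterBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  0x01, 0x05, 0x01, 0x60, 0x00, 0x01, 0x7f,       // type: () -> i32
  0x03, 0x02, 0x01, 0x00,                         // function: type 0
  0x06, 0x06, 0x01, 0x7f, 0x01, 0x41, 0x00, 0x0b, // global: mut i32 = 0
  0x07, 0x08, 0x01, 0x04, 0x6d, 0x61, 0x69, 0x6e, 0x00, 0x00, // export "main"
  // code: global.get 0, i32.const 1, i32.add, global.set 0, global.get 0, end
  0x0a, 0x0d, 0x01, 0x0b, 0x00,
  0x23, 0x00, 0x41, 0x01, 0x6a, 0x24, 0x00, 0x23, 0x00, 0x0b,
]);

const counterModule = new WebAssembly.Module(counterBytes);

// Illustrative helper, not part of any proposal:
// mode "A" keeps one instance, mode "B" re-instantiates before every run.
function makeRunner(wasmModule, mode) {
  let instance = new WebAssembly.Instance(wasmModule, {});
  return function run() {
    if (mode === "B") {
      instance = new WebAssembly.Instance(wasmModule, {});
    }
    return instance.exports.main();
  };
}

const runA = makeRunner(counterModule, "A");
const resultsA = [runA(), runA()]; // state carries over between runs

const runB = makeRunner(counterModule, "B");
const resultsB = [runB(), runB()]; // every run starts from fresh state

console.log(resultsA, resultsB);
```

With A the second run sees the state left behind by the first, which suits library-style use. With B each run behaves like a fresh process, which is closer to what a CLI command expects.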
My understanding was that this would be solved by interface types providing the ability to pass nonscalar values as arguments.
that's true if you pass all your imports as arguments in the main function, but I doubt it will be very practical in all cases.
@xtuc the main function can already access the module's imports, i'm not sure what you mean.
by the time the main function runs the JavaScript imports aren't executed because the instantiation is in a single tick and Wasm will snapshot undefined values (apart from functions). Exporting an additional start function defers the main function's code basically
so the concern is about circulars which expose tdz and get snapshotted as undefined?
I believe it's any imports, apart from functions that are initialized before
@xtuc if you do `export let a = 1` and then `(import "whatever" "a")`, it should work fine. The `let a = 1` will evaluate before the execution of the wasm module (https://webassembly.github.io/esm-integration/js-api/index.html#module-execution). The case where it would not work fine is if there is a circularity between those where the wasm module's evaluation happens first, exacerbated because of the value snapshots.
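A small sketch of what "value snapshot" means here, again with the plain JS API standing in for the ESM integration. The module and its `get_a` export are made up for illustration. An immutable global import is read once at instantiation, so a later assignment on the JS side is never observed by the Wasm module.

```javascript
// Hand-assembled binary for a hypothetical module:
// (module
//   (import "whatever" "a" (global $a i32))
//   (func (export "get_a") (result i32) (global.get $a)))
const snapshotBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  0x01, 0x05, 0x01, 0x60, 0x00, 0x01, 0x7f,       // type: () -> i32
  // import section: "whatever" "a" as an immutable i32 global
  0x02, 0x0f, 0x01,
  0x08, 0x77, 0x68, 0x61, 0x74, 0x65, 0x76, 0x65, 0x72,
  0x01, 0x61, 0x03, 0x7f, 0x00,
  0x03, 0x02, 0x01, 0x00,                         // function: type 0
  0x07, 0x09, 0x01, 0x05, 0x67, 0x65, 0x74, 0x5f, 0x61, 0x00, 0x00, // export "get_a"
  0x0a, 0x06, 0x01, 0x04, 0x00, 0x23, 0x00, 0x0b, // code: global.get 0, end
]);

let a = 1;

// The current value of `a` is copied into the instance at this point.
const snapshotInstance = new WebAssembly.Instance(
  new WebAssembly.Module(snapshotBytes),
  { whatever: { a } }
);

a = 2; // later change on the JS side

const seen = snapshotInstance.exports.get_a();
console.log(seen, a);
```

So the order of evaluation decides which value is captured. If the JS binding is initialized before the Wasm module is instantiated, the module gets the intended value. In a cycle where the Wasm side goes first, it does not.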
recreated the proposal in engine262, seems to work:

For scalar values you are right, we ran into issues when importing objects like WebAssembly.Memory/Table/Global
@xtuc so if I export a `WebAssembly.Memory` from js, it gets imported in the cyclic module as that value, and then passed directly to the instantiate call via the import object. it seems like that should be sufficient?
Perhaps the confusion here is the async execution of Wasm resulting in these objects being initialized after a tick as @xtuc says. Note this is the reason that top-level await was a prerequisite for the Wasm integration with ES modules, and using those paths in v8 for the integration would line up with the esm integration intention I believe.