Limitations of start function with exported memory

Open sbc100 opened this issue 8 years ago • 57 comments

We've been experimenting with using the start section in the C/C++/clang/lld world: https://reviews.llvm.org/D40559

I've run into an issue with exported memory and static initializers (or any code in the start function). Basically, no JS function that requires access to memory can be run during start, because the exported memory is not available until after the start function completes.
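A minimal sketch of the situation described above, assuming a hand-assembled module (the byte array, the `env.log` import name, and the variable names are illustrative and not taken from the LLVM review). The start function calls an imported JS function, and that function has no handle on the instance, and therefore none on the exported memory, until the constructor returns:

```javascript
// Hand-assembled module, roughly equivalent to:
//   (module
//     (import "env" "log" (func $log))
//     (memory (export "memory") 1)
//     (func $start (call $log))
//     (start $start))
const exportedMemoryModule = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  0x01, 0x04, 0x01, 0x60, 0x00, 0x00, // type section: () -> ()
  0x02, 0x0b, 0x01, 0x03, 0x65, 0x6e, 0x76, 0x03, 0x6c, 0x6f, 0x67, 0x00, 0x00, // import env.log
  0x03, 0x02, 0x01, 0x00, // function section: one function of type 0
  0x05, 0x03, 0x01, 0x00, 0x01, // memory section: 1 page minimum
  0x07, 0x0a, 0x01, 0x06, 0x6d, 0x65, 0x6d, 0x6f, 0x72, 0x79, 0x02, 0x00, // export "memory"
  0x08, 0x01, 0x01, // start section: function index 1
  0x0a, 0x06, 0x01, 0x04, 0x00, 0x10, 0x00, 0x0b, // code: call 0; end
]);

let instance; // stays undefined until the constructor below returns
let memoryDuringStart = 'log was never called';

const imports = {
  env: {
    // Called from the start function, i.e. while instantiation is still running.
    log() {
      memoryDuringStart = instance === undefined ? 'unavailable' : 'available';
    },
  },
};

instance = new WebAssembly.Instance(new WebAssembly.Module(exportedMemoryModule), imports);
console.log(memoryDuringStart); // "unavailable"
```

After the constructor returns, `instance.exports.memory` is reachable as usual; the gap exists only while start is executing.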

We can switch to importing memory but that doesn't seem like a reasonable requirement.
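For comparison, a sketch of the import-based workaround, again with a hand-assembled module and illustrative names (`env.memory`, `env.log`). Because the embedder creates the `WebAssembly.Memory` itself, the imported function can close over it and read what the start function wrote:

```javascript
// Hand-assembled module, roughly equivalent to:
//   (module
//     (import "env" "log" (func $log))
//     (import "env" "memory" (memory 1))
//     (func $start
//       (i32.store8 (i32.const 0) (i32.const 42))
//       (call $log))
//     (start $start))
const importedMemoryModule = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  0x01, 0x04, 0x01, 0x60, 0x00, 0x00, // type section: () -> ()
  0x02, 0x19, 0x02, // import section: two imports
  0x03, 0x65, 0x6e, 0x76, 0x03, 0x6c, 0x6f, 0x67, 0x00, 0x00, // env.log (func, type 0)
  0x03, 0x65, 0x6e, 0x76, 0x06, 0x6d, 0x65, 0x6d, 0x6f, 0x72, 0x79, 0x02, 0x00, 0x01, // env.memory (1 page min)
  0x03, 0x02, 0x01, 0x00, // function section: one function of type 0
  0x08, 0x01, 0x01, // start section: function index 1
  0x0a, 0x0d, 0x01, 0x0b, 0x00, // code section: one body, no locals
  0x41, 0x00, 0x41, 0x2a, 0x3a, 0x00, 0x00, // i32.const 0; i32.const 42; i32.store8
  0x10, 0x00, 0x0b, // call 0; end
]);

// The embedder owns the memory, so it has a handle before start runs.
const sharedMemory = new WebAssembly.Memory({ initial: 1 });
let byteSeenDuringStart = null;

const memoryImports = {
  env: {
    memory: sharedMemory,
    log() {
      byteSeenDuringStart = new Uint8Array(sharedMemory.buffer)[0];
    },
  },
};

new WebAssembly.Instance(new WebAssembly.Module(importedMemoryModule), memoryImports);
console.log(byteSeenDuringStart); // 42
```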

Am I missing something?

sbc100 avatar Nov 30 '17 01:11 sbc100

Reading the linked review, it isn't clear to me what you're trying to execute in start. Is it everything before main, but excluding main, or up to and including main? FWIW I don't think start should call main, or that was the original idea for start anyways.

Regardless, it seems like segments can do all of the C static global initializers just fine, but the non-trivial C++ global initializers can't generally be done because they might execute arbitrary user code including syscalls. That's indeed troublesome!

Do you think the problem is only embedder calls trying to use an exported memory, or do you think these calls might also try to re-enter WebAssembly? I think it's a simple enough thing to do if the fix is to export memory before start, then export everything else after. If we instead export everything before start then I'm not sure...

At the same time, start is meant to model what ES6 module's "evaluate" does. In a world where WebAssembly can participate in ES6 modules, what should "evaluate" / start do w.r.t. exports?

jfbastien avatar Nov 30 '17 06:11 jfbastien

What do you mean by "is not available" during start? Presumably the Memory that is created for import could (in principle) be accessed via a closure while start is executing. Or, is there some annoying check that means you can't read/write to the Memory until it's attached to a fully-instantiated Instance?

I agree that it makes syscall implementation awkward - I thought of this problem too while I was trying to implement some syscalls for upstream Musl. I wasn't too bothered by the Memory issue, but the problem of imports seems thornier - until the Instance is created (at the end of start) there really isn't any way for imported functions to call back into Wasm. In practice, it's OK, but it could cause problems for someone.

The ideal thing would have been that "this" refers to the partially-instantiated Instance if imports are invoked during start, so that an imported function can call exports during start via "this.exports.blah". But that opens up a whole can of safety issues, while the current approach is safe albeit a bit limiting.

For syscalls, it shouldn't matter much. By definition, they are in "kernel-mode" and thus don't need to call anything in "user-mode" (Wasm). Most of the syscalls ought to be able to do their thing without touching the Module/Instance.

@jfbastien - in the LLVM review, there's no particular opinion on what runs during start. Currently LLD's default entrypoint is _start (i.e. run main as well). I agree that's not ideal; I'm hoping that we could change the default to something else like _start_wasm to run everything before main. It's up to whoever runs the compiler which entrypoint gets set, and regarding Sam's issue here I don't think it matters whether main is run or not.

NWilson avatar Nov 30 '17 09:11 NWilson

@sbc100 Yes, that does seem problematic in the near term. ES Modules do allow cyclic imports and so, if we had ES Module integration, the JS called by wasm could be an ES module that imported the calling wasm module to get the Memory.

lukewagner avatar Nov 30 '17 15:11 lukewagner

From my POV the problem would be solved by giving the embedder access to the newly created memory before start is called. @jfbastien it's not clear to me how you would propose doing this. Can you explain?

In terms of what we will use the start section for, the jury is still out on that. What I've been looking at is moving the C/C++ static init function there, and leaving main to be run explicitly by the embedder. But this issue exists whatever we choose to run there.

sbc100 avatar Nov 30 '17 22:11 sbc100

> From my POV the problem would be solved by giving the embedder access to the newly created memory before start is called. @jfbastien it's not clear to me how you would propose doing this. Can you explain?

I think that what we end up doing should match what ES6 modules do: first they link, then evaluate. This can be circular! In our implementation we use the ES6 module machinery: we first link, resolving exports and setting the corresponding symbol table. After all the ES6 modules are linked (there's only one for WebAssembly at the moment) there's a call to evaluate, which sets up the element and data segments, and at the very end we call the start function. That's the same as running an ES6 module's global code just before returning to the import call (and this is done in an order that's determined by link IIUC).

Given this, what I think needs to happen is that we make export availability match what ES6 modules do for their own exports during evaluate. I'd consult with @domenic or @Constellation about this.

I think this also happens to fix what you're trying to do.

jfbastien avatar Nov 30 '17 23:11 jfbastien

But from the JS API POV, how could we allow the developer to get a handle to the new memory before start is called? I was under the impression that the instance would need to be returned in order to get the memory handle, but that the start function is guaranteed to be called before then.

As a workaround I could switch all the test cases to import their memory, which is probably not a bad idea, but obviously others could still run into this issue.

sbc100 avatar Nov 30 '17 23:11 sbc100

I'm a bit confused. If you need two-phase initialisation where some init function has to be invoked after the instance has been fully constructed and handed out, then what keeps you from using an ordinary function for that? Or asking the other way round, how would the semantics of a start function whose execution you'd have to initiate explicitly be different from just invoking a normal function?

rossberg avatar Dec 05 '17 16:12 rossberg

@rossberg using a normal function invocation is the alternative option and it's what we currently do. It's just that that means our tooling conventions (and clang/lld generated code) won't be able to take advantage of the wasm start section. I don't really feel strongly about using the wasm start section, but I just wanted to raise the issue here in case people see it as problematic that C/C++ won't be using it.
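A sketch of that alternative, assuming a module that exports an ordinary function in place of a start section. The export name `init` and the byte array are hypothetical, not the actual clang/lld convention. Since the embedder calls `init` only after instantiation has returned, the imported function can reach the exported memory:

```javascript
// Hand-assembled module, roughly equivalent to:
//   (module
//     (import "env" "log" (func $log))
//     (memory (export "memory") 1)
//     (func (export "init")
//       (i32.store8 (i32.const 0) (i32.const 42))
//       (call $log)))
const explicitInitModule = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  0x01, 0x04, 0x01, 0x60, 0x00, 0x00, // type section: () -> ()
  0x02, 0x0b, 0x01, 0x03, 0x65, 0x6e, 0x76, 0x03, 0x6c, 0x6f, 0x67, 0x00, 0x00, // import env.log
  0x03, 0x02, 0x01, 0x00, // function section: one function of type 0
  0x05, 0x03, 0x01, 0x00, 0x01, // memory section: 1 page minimum
  0x07, 0x11, 0x02, // export section: two exports
  0x06, 0x6d, 0x65, 0x6d, 0x6f, 0x72, 0x79, 0x02, 0x00, // "memory" -> memory 0
  0x04, 0x69, 0x6e, 0x69, 0x74, 0x00, 0x01, // "init" -> function 1
  0x0a, 0x0d, 0x01, 0x0b, 0x00, // code section: one body, no locals
  0x41, 0x00, 0x41, 0x2a, 0x3a, 0x00, 0x00, // i32.const 0; i32.const 42; i32.store8
  0x10, 0x00, 0x0b, // call 0; end
]);

let initInstance;
let byteSeenDuringInit = null;

const initImports = {
  env: {
    // Runs inside init(), by which time initInstance has been assigned.
    log() {
      byteSeenDuringInit = new Uint8Array(initInstance.exports.memory.buffer)[0];
    },
  },
};

initInstance = new WebAssembly.Instance(new WebAssembly.Module(explicitInitModule), initImports);
initInstance.exports.init(); // explicit second phase, driven by the embedder
console.log(byteSeenDuringInit); // 42
```

The cost is the one raised in this comment: the embedder has to know to call `init`, whereas a start function runs without any cooperation from the embedder.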

sbc100 avatar Dec 05 '17 17:12 sbc100

This came up again in the LLVM bug tracker recently: https://bugs.llvm.org/show_bug.cgi?id=37198

sbc100 avatar May 01 '18 16:05 sbc100

This is coming up again with WASI: https://github.com/WebAssembly/WASI/issues/19

sbc100 avatar May 17 '19 23:05 sbc100

this isn't just a start function problem. if you declare a function in js or wasm, and wasm imports it (esm), that function has no way to get access to the memory of that wasm module.

devsnek avatar May 18 '19 00:05 devsnek

This issue relates to the case when the wasm memory is exported from the module. In the case when the wasm memory is imported into the module, it seems reasonable to assume that the embedder has access to the memory, since it would have needed it in order to instantiate the module in the first place.

There is a third case where the memory is neither imported nor exported, but of course there is no way to share the memory in that case.

sbc100 avatar May 20 '19 09:05 sbc100

However, I agree this issue should probably be renamed, since it relates not just to memory but to anything that a wasm module might want to export.

sbc100 avatar May 20 '19 09:05 sbc100

In retrospect we should have named the start function "init function", to indicate that its purpose is module initialisation, i.e., that the module isn't intended to be accessible before that initialisation is complete. Allowing reentrancy during initialisation would bring about the well-known issues with exposing uninitialised state, and thereby breaking encapsulation guarantees (which is an important feature of Wasm's module system).

That said, I believe that all discussed cases can be handled by having the module export an explicit init function. The start function isn't for these cases, but is that a problem?

rossberg avatar May 20 '19 10:05 rossberg

The only problem I see is that it often seems to confuse people.

I don't feel strongly about it, but perhaps, since the name isn't part of the binary format, we can still rename it to something else like "init" in the text format and in the spec text?

sbc100 avatar May 20 '19 10:05 sbc100

I'd be fine with renaming in the spec. Likewise for the text format, but that might be a bit more controversial. We could allow both keywords.

rossberg avatar May 20 '19 11:05 rossberg

the start function is specified to run at normal evaluation time in the esm proposal. additionally, exporting something explicit doesn't work for esm at all because nothing is there to import and run it. I'm really confused at this point. is the esm integration intended to exist at all?

> This issue relates to the case when the wasm memory is exported from the module.

even if you do that, random functions a wasm module imports can't magically access that export.

devsnek avatar May 20 '19 13:05 devsnek

The ESM integration gives the illusion of cyclic linking by wrapping JS stubs around Wasm exports that don't yet exist. But they throw if invoked too early. Only once the Wasm module is instantiated underneath do these stubs become usable. So even with ESM, Wasm module initialisation is not reentrant or cyclic. That's a feature. :)

Why would importing a custom initialisation function and explicitly invoking it after linking not work?

rossberg avatar May 20 '19 15:05 rossberg

@rossberg

> Why would importing a custom initialisation function and explicitly invoking it after linking not work?

are you suggesting something like this?

let id = 0;
const memories = {};
export function register_memory(mem) {
  const i = id++;
  memories[i] = mem;
  return i;
}

export function fd_write(id, fd, ptr, len) {
  const mem = memories[id];
  // ...
}

you'd have to make memory a first class value, and at that point you can just do this:

export function fd_write(mem, fd, ptr, len) {
  // ...
}

devsnek avatar May 20 '19 15:05 devsnek

> The ESM integration gives the illusion of cyclic linking by wrapping JS stubs around Wasm exports that don't yet exist.

We thought about doing this, but when it came down to details, that ended up being complex and unworkable. So these stubs do not exist in the current proposal.

littledan avatar May 20 '19 17:05 littledan

@devsnek Thinking about WASI in particular (which based on the code samples, I think you are?): I think we need to separate the "current unstable" and "future (with reftypes and multi-memory)" cases:

  • For the current unstable case, all offsets are relative to the 0th memory, so you just need the instance to export its default memory (just with a plain memory export statement). IIRC, a wasm module's live exports are filled in before the start function is executed, so if you have a mini-cycle between JS glue code imported and called by wasm that (cyclically) imports that wasm's memory, then by the time the wasm start function runs and calls the JS glue, the wasm's exported memory's live binding has been filled in. But maybe I'm out of date; @littledan correct me if I'm wrong.
  • For the future case, I think reads/writes from/to linear memory should be expressed in terms of Web IDL Bindings (or something analogous), where ranges of bytes would be passed via some abstract "buffer"/"slice" data type (used in the interface) that get created with the view/copy binding operators (which select their memory via a memory index immediate in the binding expression). Then the receiver of the buffer/slice doesn't have to worry about finding the right memory to use.

lukewagner avatar May 20 '19 17:05 lukewagner

@lukewagner it's not possible to get access to any of the memory using esm, whether or not it's the 0th one. i'm using wasi as a motivating example, but the issue extends much further.

devsnek avatar May 20 '19 19:05 devsnek

@littledan:

> that ended up being complex and unworkable. So these stubs do not exist in the current proposal.

Ah, okay, is it just uninitialised "live" bindings then? Either way, the effect should be almost the same.

@lukewagner:

> IIRC, a wasm module's live exports are filled in before the start function is executed

The exports of a module are filled in after the start function has completed. Before you only have the bindings, but they are yet undefined. But I believe that's good enough for those use cases, because there's no need to access the memory yet.

rossberg avatar May 21 '19 07:05 rossberg

@devsnek It is, if the module exports it.

@rossberg

> The exports of a module are filled in after the start function has completed. Before you only have the bindings, but they are yet undefined

By the time the start function executes, the instance has to be fully live/constructed, so I don't see any technical reason that the exports couldn't be written eagerly into their live bindings. I actually vaguely recall discussing this in the past which is why I thought that's what was already specified.

> But I believe that's good enough for those use cases, because there's no need to access the memory yet.

It really depends what people want to do during initialization, but it's not too hard to imagine the need to call out to JS for some glue utility and for that JS glue to need the exported memory to implement its functionality.

lukewagner avatar May 21 '19 18:05 lukewagner

@lukewagner if I export a function intended to be consumed by wasm (from wasm or js), how does that function access the memory of the calling wasm?

devsnek avatar May 21 '19 19:05 devsnek

@devsnek The same module exporting the function should also export its memory and the same module importing the function should also import the memory. It's not ideal, of course, but as explained above this isn't the goal state.

lukewagner avatar May 21 '19 20:05 lukewagner

@lukewagner the file with the function doesn't know which wasm module is consuming it, it's not a thing esm provides. (it also doesn't know the name of the exported memory, but that's less relevant)

devsnek avatar May 21 '19 20:05 devsnek

@devsnek I was imagining wrapping each wasm in a 1:1 glue JS module. This glue JS module could be considered a polyfill of the future ability of a wasm module to explicitly pass a view or copy of its memory as an argument to the call (e.g., a polyfill of Web IDL Bindings).
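One way to read the 1:1 glue idea is sketched below. Everything here is hypothetical (`makeGlue`, `bindMemory`, `fdWriteBytes`, and the simplified `fd_write(fd, ptr, len)` signature borrowed from the snippets earlier in the thread): a generic host function only ever sees a byte view, and a per-module glue object turns the pointer and length into that view using the one memory it was bound to.

```javascript
const writes = []; // stands in for whatever the host does with the bytes

// Generic host function: receives a byte view and knows nothing about wasm memories.
function fdWriteBytes(fd, bytes) {
  writes.push({ fd, text: new TextDecoder().decode(bytes) });
  return bytes.length;
}

// One glue object per wasm instance; it is the only place that knows the memory.
function makeGlue() {
  let memory = null; // bound by the embedder once the memory is available
  return {
    bindMemory(mem) {
      memory = mem;
    },
    imports: {
      fd_write(fd, ptr, len) {
        if (memory === null) throw new Error('memory not bound yet');
        // Create the view at call time, so a grown memory's new buffer is picked up.
        return fdWriteBytes(fd, new Uint8Array(memory.buffer, ptr, len));
      },
    },
  };
}

// Usage, with a standalone Memory standing in for a module's exported memory.
const glue = makeGlue();
const glueMemory = new WebAssembly.Memory({ initial: 1 });
new Uint8Array(glueMemory.buffer).set(new TextEncoder().encode('hi'), 16);
glue.bindMemory(glueMemory);
const written = glue.imports.fd_write(1, 16, 2);
console.log(written, writes[0].text); // 2 "hi"
```

With an exported memory, `bindMemory` can only be called after instantiation, so a call made from the start function would still hit the "memory not bound yet" branch; the glue addresses the "which memory" question, not the timing one.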

lukewagner avatar May 21 '19 21:05 lukewagner

@lukewagner I'm talking about the esm integration. it seems like a rather large hole in the design, so I'm eager to fix it. allowing some form of first class reference to memory or slices of memory (or both) would fix both this start function problem and the larger esm problem.

devsnek avatar May 21 '19 21:05 devsnek

@devsnek I see how this relates to reference-types (specifically ref.mem) or Web IDL Bindings (specifically the view binding operator), and polyfills thereof, but not directly to ESM integration. In particular, I think it would be rather encapsulation-breaking if, when not explicitly passed by the caller (via one of these options), a callee could ask for the caller's memory.

lukewagner avatar May 21 '19 21:05 lukewagner