design icon indicating copy to clipboard operation
design copied to clipboard

Synchronous JS API callback for memory growth?

Open kripken opened this issue 6 years ago • 37 comments

Background: https://github.com/WebAssembly/WASI/issues/82

In brief, JavaScript needs to know when wasm memory grows, because it needs to update JS typed array views. In emscripten we have things set up in sbrk etc. to call into JS to do that when memory grows. However, in wasi the current approach is to just call the memory grow wasm instruction inside the wasm module, so JS is not updated.

JS can check if memory grew every time wasm code runs (arrive in JS, or return from wasm, etc.). Emscripten does something similar for pthreads+memory growth, but it's unpleasant, adds overhead, and is error-prone.

Another option is to add an API in wasi to update the system on memory growth.

Yet another option is to extend the JS wasm API to get a callback when memory growth occurs. That is what this issue is about.

It might look something like this:

memory.addGrowCallback(() => {
  console.log("wasm memory grew!");
  updateAllJSViewsRightNow();
});

The callback would be synchronous, that is, when wasm memory grows, all such callbacks are immediately executed, while the wasm is still on the stack etc. This is necessary because if it's an event that happens in the next JS event loop then that is too late - we may execute JS before that, and it would use the old obsolete views.

Thoughts?

kripken avatar Aug 28 '19 23:08 kripken

Here's two existing threads on the subject: https://github.com/WebAssembly/design/issues/1271 and https://github.com/WebAssembly/design/issues/1210. They seem to address an equivalent proposal and a few others.

Macil avatar Aug 28 '19 23:08 Macil

Thanks @Macil!

The first is the pthreads+growth issue, which is very related, but distinct. (In particular a synchonous event is not enough there. as you mention there, but it would be sufficient here.)

The second does indeed look like it arrives at this proposal, so it's too bad I didn't find it when I searched earlier... but the discussion there seems to have stalled, and the proposal isn't clearly focused on the necessary synchronous aspect. So I'd suggest keeping this issue open for discussion?

kripken avatar Aug 28 '19 23:08 kripken

Adding read/write methods on wasm memories (as a substitute for ArrayBuffer views) might serve this use case too, but would still mean that arraybuffer views could never be reliably used on wasm memories, which would be an unfortnate wart.

dschuff avatar Aug 29 '19 16:08 dschuff

What actually happens right now (or is supposed to happen) if a Javascript thread takes a reference to a WASM memory's ArrayBuffer and is reading/writing to it while a separate WASM thread grows the memory? I assume one of these two things happen:

  1. The Javascript completes normally. For the current event loop run, the Javascript always sees the WASM memory's buffer property equal to the same value, and the length is unchanged. Only after the current event loop run does the WASM memory's buffer property point to a new value, and the existing ArrayBuffer gets invalidated.
  2. The ArrayBuffer becomes invalidated at an arbitrary moment in the Javascript execution, and the reads/writes go to an inactive copy of the memory and the writes get lost, or an exception happens.

If it's 1, then it seems like a synchronous JS API callback for memory growth would be fine. The callback wouldn't actually happen synchronously with the grow call (which happened in another thread), but it would happen synchronously with the point in time where the existing ArrayBuffer gets invalidated in the current JS context.

Macil avatar Aug 29 '19 23:08 Macil

IIRC, what should happen is sort of like (1). A thread obtains the memory.buffer value. This has a length field that never changes. The thread will be able to write into the buffer up to that length, even if another thread grows the memory concurrently. The extracted buffer value remains valid indefinitely, but if the thread reads memory.buffer again (even during the same turn) it may observe that the length has changed and using this buffer it can write up to the new length.

lars-t-hansen avatar Sep 27 '19 13:09 lars-t-hansen

Any new developments here?

binji avatar Jan 13 '20 21:01 binji

@kripken This all seems most cleanly solved by making sure the capabilities of ArrayBuffer, SharedArrayBuffer, and WebAssembly.Memory stay in sync. To wit, I'd be happy to resurrect the realloc portion of https://github.com/domenic/proposal-arraybuffer-transfer/ (of which I am already the champion apparently) to allow resizing of ABs and SABs along similar restrictions as put forth by wasm, while keeping their JS identity. WDYT?

syg avatar Jun 15 '20 23:06 syg

@syg

Depends what you mean by "this" - do you mean the entire issue? Then IIUC .realloc() would help but not be enough. If wasm calls memory.grow internally, we need the JS buffer and views to be updated. Even if we have a .realloc() method we can use from JS, we still need for JS to be called so that we can do that (or, we'd need the buffer and views to be updated automatically).

kripken avatar Jun 16 '20 00:06 kripken

do you mean the entire issue?

Yeah, I mean the entire issue.

If wasm calls memory.grow internally, we need the JS buffer and views to be updated. Even if we have a .realloc() method we can use from JS, we still need for JS to be called so that we can do that (or, we'd need the buffer and views to be updated automatically).

That's what I'm proposing for realloc, that it be a destructive operation that keeps the identity of the existing ArrayBuffer.

syg avatar Jun 16 '20 00:06 syg

Maybe I'm misunderstanding you, but I think realloc helps us do something cleaner in JS when a growth occurs, but we do still need to get notified, so we even get the chance to do anything in JS at all?

If I'm missing something, what would invoke the JS that calls realloc?

kripken avatar Jun 16 '20 00:06 kripken

Maybe I'm misunderstanding you, but I think realloc helps us do something cleaner in JS when a growth occurs, but we do still need to get notified, so we even get the chance to do anything in JS at all?

If I'm missing something, what would invoke the JS that calls realloc?

wasm's memory.grow would be explained in terms of realloc. If ArrayBuffers can be resized in place, there is no need to do the manual work of updating the views IIUC. It's technically a backwards breaking change, in that result of growing memory would no longer result in distinct ArrayBuffers in the JS side, but I feel like that won't actually break anything.

syg avatar Jun 16 '20 01:06 syg

Thanks, now I see! Yes, if realloc is done automatically, so ArrayBuffers and views just automatically grow with the wasm memory, then that would solve the issue here.

Could this also happen with SharedArrayBuffers, so JS is updated across threads? That would solve a very big problem too (right now we have to manually poll if the memory grew in JS).

kripken avatar Jun 16 '20 13:06 kripken

How are we planning to handle growing a SAB across threads? Are we expecting the SAB and WebAssembly.memory to be atomically updated together? It might be very hard to make sure that every JS thread sees a consistent size for memory between WASM and JS. e.g. A WASM a = memory.grow(0) * 64 * KB (get memory size in bytes) followed immediately by a SAB b = buffer.byteLength could produce a > b, which might cause bizarre bugs...

Is the expectation that memory growth in WASM will suspend all threads and atomically update their buffers? If so, that could be expensive depending on how often people want to grow memory. It's possible this doesn't matter in practice though.

kmiller68 avatar Jun 16 '20 18:06 kmiller68

We determined during previous discussions that the propagation of bounds updates between Wasm threads as a result of memory.grow would need to follow a relaxed semantics. Currently, there's no guarantee in the memory model that a grow in one thread will become visible to other threads without explicit synchronization (either through an atomic operation, or an explicit invocation of memory.size, or by performing a growth themselves).

It's still guaranteed that length updates from growth will be an "atomic counter", so other threads will never see a truly inconsistent view of memory, they just won't have freshly-grown memory reliably accessible unless they synchronize with the latest growing thread.

I'd hope JavaScript could use the same semantics, so long as realloc would truly only ever allow the memory to grow, and not shrink.

EDIT: perhaps, analogous to memory.size, reading buffer.byteLength in JavaScript could be counted as an explicit synchronization (with the last growth)? This would potentially require a barrier in some implementations, though.

conrad-watt avatar Jun 16 '20 19:06 conrad-watt

From my brief chats with @binji about the matter, shared memory would have different realloc restrictions than non-shared memory, and I don't really see another way than following what wasm already does.

syg avatar Jun 16 '20 19:06 syg

To wit, it seems very, very problematic to shrink SABs in-place. Doesn't seem so bad to shrink ABs in place?

syg avatar Jun 16 '20 19:06 syg

perhaps, analogous to memory.size, reading buffer.byteLength in JavaScript could be counted as an explicit synchronization (with the last growth)? This would potentially require a barrier in some implementations, though.

It seems like that might be desirable. Even if it causes some performance headaches...

kmiller68 avatar Jun 16 '20 22:06 kmiller68

It seems like that might be desirable. Even if it causes some performance headaches...

Seems pretty reasonable to me to say that multiple threads witnessing the length be synchronizing.

Other restrictions included that for SABs to be resizable at all, the SharedArrayBuffer constructor be extended to include the notion of a maximum size, just like WebAssembly.Memory.

syg avatar Jun 16 '20 22:06 syg

Seems pretty reasonable to me to say that multiple threads witnessing the length be synchronizing.

I don't have much of an opinion on multiple threads yet but within one thread it should be probably be synchronizing. Otherwise, you have the problem where your length magically shrinks if you ask a different API because those numbers are stored at different addresses in the VM.

kmiller68 avatar Jun 16 '20 22:06 kmiller68

Within one thread your semantics are constrained by agent-order anyways (or program order as it's normally known), which is stronger than synchronizing seqcst events.

syg avatar Jun 16 '20 22:06 syg

That's only true if all the various internal slots and whatever WASM uses are the same. Assuming slots are different then they'd be effectively relaxed, no?

kmiller68 avatar Jun 16 '20 23:06 kmiller68

I think I'd have to see the particular example you're thinking of. I don't know what synchronizing accesses in a single thread means. In my mind there shouldn't be any staleness issues intra-thread for the same reason regular AB.byteLength can't ever be stale intra-thread even though it's an unordered access.

syg avatar Jun 16 '20 23:06 syg

Currently, if a SAB is backed by a Wasm memory which is grown, the SAB stays attached, but with the same original length (so the newly grown memory isn't accessible, due to the JS bounds check). So I think @kmiller68 is thinking of things through this lens, where the "length" of a SAB is a distinct field on the object which isn't necessarily kept in sync with the length of the underlying memory (which could have been concurrently grown in Wasm).

If I'm reading https://github.com/WebAssembly/design/issues/1296#issuecomment-644778006 correctly, @kripken is rooting for a change to this, so that growth of a SAB in one thread (either through a Wasm memory.grow or a JS realloc) would mean that the length of all SABs backed by the same buffer would update.

EDIT: my comment https://github.com/WebAssembly/design/issues/1296#issuecomment-644962043 was written presupposing that this change would be accomplished by making SABs act exactly like first-class Wasm memories, so bounds/length checks are no longer against a distinct thread/object-local "length" field, but are implemented the same way as Wasm.

conrad-watt avatar Jun 16 '20 23:06 conrad-watt

@conrad-watt Yeah, that's likely the most sensible way to specify it. i.e. something like, if a JS SAB or AB object is backed by wasm, then reach into however wasm memory holds the length and return that value (times WASM page size in bytes, I guess). That way, you don't have two independent places holding the length.

Alternatively, we could go the other direction too. Where WASM gets its length from JS but that's likely more more difficult as the core wasm spec has no real concept of a host hook right now, IIRC.

kmiller68 avatar Jun 17 '20 01:06 kmiller68

Ah, I see. You mean "synchronizing" in the sense of the AB/SAB length vs the backing store length going out of sync.

I agree with @conrad-watt's assumption: that we're working off of the backing store. That's the only reasonable thing to do IMO, otherwise we'd need to constantly be synchronizing per-SAB wrapper's length with the single backing store, which is pretty weird.

syg avatar Jun 17 '20 01:06 syg

Rediscovering problems that came up the first time something like this was talked about at Mozilla with asm.js: growable ArrayBuffers unfortunately isn't enough. The various TypedArray views' lengths would also need to be auto-updated. That is both really unfortunate for implementations and for language complexity. For implementations, buffers having to track all their views is bad. For the language, not all TypedArray views should have their lengths auto-updated. For instance, if a view was created with an explicit offset and length, that shouldn't be.

Hmm...

syg avatar Jun 17 '20 21:06 syg

FWIW, this idea has been discussed before in this issue as well: https://github.com/WebAssembly/design/issues/1210

With a synchronous event, any code that potentially allocates (and therefore potentially grows memory) can potentially run JS, which can then do basically anything, including re-enter Wasm again. Most code that calls malloc is not prepared for this. I think this is adding a new foot gun, and find it somewhat concerning.

With wasm-bindgen's generated JS glue, we check whether a view has been invalidated by memory growth, and recreate the view if necessary. Basically all view-using code goes through a single code path that handles this. I suspect these checks are overall less overhead than (or at least on par with) the VM maintaining a precise set of all active views that would need to be updated when memory grows.

Is this approach untenable for others?

fitzgen avatar Jun 17 '20 21:06 fitzgen

With a synchronous event, any code that potentially allocates (and therefore potentially grows memory) can potentially run JS, which can then do basically anything, including re-enter Wasm again. Most code that calls malloc is not prepared for this. I think this is adding a new foot gun, and find it somewhat concerning.

Yes, this is my exact concern for the synchronous callback.

syg avatar Jun 17 '20 21:06 syg

Ah, I had naively thought that typed arrays could also be changed to define their length in terms of the underlying buffer, but yeah, at the very least that doesn't cover the case where the length is explicitly fixed.

Looking back through the previously linked discussions, there was this suggestion here to just expose Wasm-style accesses in JS via the Wasm memory object: https://gist.github.com/kripken/949eab99b7bc34f67c12140814d2b595

Would this solve the problem?

conrad-watt avatar Jun 17 '20 22:06 conrad-watt

Would this solve the problem?

I think it would for the wasm use case but sets an undesirable precedent. It would encourage further divergence between wasm and JS on accessing buffers. I'm still rooting for a more integrated wasm-JS interface story.

syg avatar Jun 17 '20 22:06 syg