Add non-legacy exceptions support (try_table) to the asyncify pass

Open zb3 opened this issue 4 months ago • 11 comments

Hopefully this PR finally adds try_table exceptions support to the asyncify pass :)

Unlike https://github.com/WebAssembly/binaryen/pull/5475 it doesn't add support for legacy exceptions, but there's no restriction on unwinding from catch blocks since in the new proposal they're ordinary blocks.

I started this thinking it'd be a simple patch, but, well, it wasn't. I was able to finish it, but it didn't speed up QEMU as much as I had hoped, so I most likely won't be able to polish it further.

This proposed PR has 3 parts:

Flatten pass

As mentioned in https://github.com/WebAssembly/binaryen/pull/6814#issuecomment-2276205776 the flatten pass doesn't support try_table either, because its guarantee is that all block return types are removed. Since that is impossible to achieve with try_table, this PR introduces a new opt-in relaxed flat IR mode which permits blocks with return values and breaks with values where they're necessary. For the mode to be useful for asyncify, it also needs to save those return values to locals (so we can "if" them out).

Basic support for exceptions with tags

The next step is to add support for this relaxed flat IR to asyncify. We handle the new "local set with a block" expression, where we need to ensure that we can also reach the catch block without actually throwing anything. This is achieved by adding an unconditional local.get instruction to be used when rewinding (the value will be discarded anyway).

Supporting catch blocks with exnref

As mentioned in https://github.com/WebAssembly/binaryen/issues/3739, reference types can't be stored in memory, so they need to be stored in tables. However, the restriction from https://github.com/doedrop/binaryen/commit/449dd409d856311ecb9b68763de379e253c43d45 that we could only support one pause at a time was not acceptable for QEMU, which uses fibers extensively. Therefore this PR introduces a hacky solution: we store refs in tables, but store their indices in memory. Additionally, we use a dummy ref table as a "bitmap" so we can reuse table indices. (Normally I'd do this in a separate memory and not via a dummy table with null/non-null references, but of course Safari doesn't support multiple memories, so..)
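
The index-recycling scheme described above can be modeled roughly as follows. This is a toy Python sketch, not code from the PR: the class `RefSlotTable`, its method names, and the fixed table size are all hypothetical, Python lists stand in for the two wasm tables, and a linear scan stands in for whatever free-slot search the pass actually emits.

```python
from typing import Any, List, Optional


class RefSlotTable:
    """Toy model: one table holds the references, a second dummy table
    records which slots are in use (non-null entry = occupied)."""

    def __init__(self, size: int) -> None:
        # Stands in for the exnref table (hypothetical fixed size).
        self.refs: List[Optional[Any]] = [None] * size
        # Stands in for the dummy "bitmap" table of null/non-null refs.
        self.occupied: List[Optional[object]] = [None] * size
        self._marker = object()  # any non-null value works as the "in use" flag

    def store(self, ref: Any) -> int:
        """Place ref in the first free slot and return its index.
        The index is a plain integer, so it can be saved in linear memory."""
        for index, flag in enumerate(self.occupied):
            if flag is None:
                self.occupied[index] = self._marker
                self.refs[index] = ref
                return index
        raise MemoryError("no free slot (a real table could grow instead)")

    def load(self, index: int) -> Any:
        """Read the reference back and release the slot for reuse."""
        if self.occupied[index] is None:
            raise KeyError(f"slot {index} is not in use")
        ref = self.refs[index]
        self.refs[index] = None
        self.occupied[index] = None
        return ref


# Two paused fibers each save their references; only indices go to "memory".
table = RefSlotTable(4)
fiber_a_saved = [table.store("exn-A1"), table.store(None)]
fiber_b_saved = [table.store("exn-B1")]

# Fiber A rewinds first: its references come back and its slots are freed.
fiber_a_restored = [table.load(i) for i in fiber_a_saved]

# A third pause reuses a slot that fiber A released.
fiber_c_saved = [table.store("exn-C1")]
```

Because occupancy is tracked separately from the stored value, the model can also hold a null reference in a slot without that slot looking free, and several fibers can be paused at once as long as free slots remain.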

Unfortunately this doesn't solve https://github.com/WebAssembly/binaryen/issues/3739 because it only works with exnref. At first I thought that "any"ref really meant "any" reference, but then I realized there are disjoint type hierarchies. So to solve that issue we'd need a separate table for each such hierarchy. In this PR there's only a table for exnrefs.

zb3 avatar Aug 20 '25 01:08 zb3

Interesting work here! We have been considering some changes to Flatten, including relaxing it, so this may help inform that.

Btw, do you still need Asyncify, given JSPI is in the process of shipping?

kripken avatar Aug 20 '25 20:08 kripken

@kripken, this is for QEMU coroutines which use fibers which in turn need Asyncify.. I'm not sure how JSPI would help here (hmm, could some module reentry hacks help?), I guess we'd need full stack switching support..

zb3 avatar Aug 20 '25 21:08 zb3

To use JSPI you would need to call out to JS, then back in, but JS can then pause/resume you just like Asyncify. Is this QEMU port for an environment without JS perhaps?

kripken avatar Aug 20 '25 21:08 kripken

I'm experimenting with QEMU running in the browser, trying to optimize this https://github.com/ktock/qemu-wasm (there's room for improvement in the JIT generation, but that's beyond my capabilities for now)

I'm not sure about the implications of using JS for coroutines, but does it mean JSPI gives us support for multiple stacks for a given module instance? If so, I could also look into that (rewriting fiber to use JSPI).

zb3 avatar Aug 20 '25 21:08 zb3

Yes, you can have multiple stacks using JSPI. This is a nice overview:

https://v8.dev/blog/jspi

See also the Emscripten docs which talk about using JSPI as an Asyncify alternative,

https://emscripten.org/docs/porting/asyncify.html

An easy way to see JSPI code in action is to compile a small suspending program with emcc with Asyncify vs JSPI.

kripken avatar Aug 21 '25 15:08 kripken

That's some good news :)

In the page you wrote:

If that handler calls into compiled code, then it can be confusing, since it starts to look like coroutines or multithreading, with multiple executions interleaved. It is not safe to start an async operation while another is already running. The first must complete before the second begins.

but I assume that was about C-level safety, right? QEMU uses fibers for coroutines hence I need to preserve that functionality.

Could you please give me some tips for implementing fibers using JSPI? I'm asking because for each second you'd spend answering I'd need to spend hours figuring it out.. I'm planning on looking into that in the near future.

zb3 avatar Aug 21 '25 16:08 zb3

but I assume that was about C-level safety, right?

Yes, I think that's right.

But looping in @brendandahl who would know best the exact functionality of JSPI, also for the Fibers question. (Emscripten has a Fibers API with Asyncify, but I believe it doesn't run with JSPI atm, and I'm not sure if that is just because it wasn't updated, or there is something more fundamental.)

kripken avatar Aug 21 '25 17:08 kripken

I don't think there's anything preventing fibers from being implemented using JSPI. IIRC, someone is already doing this in a different language. I don't think fibers will be that efficient using JSPI, since each re-entry into wasm is going to allocate another stack. I believe there was some work in V8 to minimize the cost of this, though.

brendandahl avatar Aug 22 '25 22:08 brendandahl

Unfortunately it appears I've just hit one major limitation with JSPI coroutines: it's not possible to continue a coroutine in a different thread (worker), unless this can somehow be worked around. If these reference tables are per module instance, it also means my PR here has the same limitation when reference types are present on the stack, although QEMU worked since these weren't used for setjmp/longjmp.

Stack-switching proposal won't have that limitation, right?

zb3 avatar Aug 24 '25 22:08 zb3

I would be quite surprised if wasm stack switching plans to allow suspending in one Web Worker and resuming in another. But @brendandahl @tlively can correct me if I am wrong.

If wasm had some form of lightweight thread, as has been discussed, that all runs in the same process, I can imagine it would be possible there - in theory.

kripken avatar Aug 26 '25 15:08 kripken

I managed to slightly tweak QEMU so as to avoid coroutines being resumed across threads, and I'm happy to report that the JSPI version is indeed faster than the asyncify one :)

I've submitted a separate PR to the emscripten repository: https://github.com/emscripten-core/emscripten/pull/25111

However, this PR is still relevant if we want to support coroutines that can be resumed across threads or.. we want the code to work in Safari :)

zb3 avatar Aug 29 '25 23:08 zb3