stack-switching Suspension across JavaScript frames

I've been reading the new Bag of Stacks explainer. Is it intended that intermediate JavaScript frames will also be captured by a suspended coroutine? For example if I execute switch.call to suspend my current coroutine as $c0, and in my new coroutine pass this reference through a JS call back into Wasm (possibly multiple times), and then call switch on $c0, will the newly suspended coroutine $c1 capture these intermediate JS call frames, so that if I switch back to $c1 it's possible to return from my current execution back into one of the JS frames?

If it is intended that JS frames should be captured, has there been any discussion as to how the single-threaded case could be extended to concurrent work stealing? Supporting concurrent work stealing while preventing a JS frame from being inappropriately captured and transferred to another thread has been one of our main concerns when considering the intersection of our new threads proposal and stack switching.

Apr 30 '24 08:04 conrad-watt

It is definitely NOT intended that stack switching in any guise will be permitted to capture JavaScript frames.

Apr 30 '24 16:04 fgmccabe

In the example I sketched above, is it expected that the switch to $c0 call will trap because of the intervening JS frames being captured? If so, has there been discussion of how an implementation will check for the presence of these frames?

Apr 30 '24 16:04 conrad-watt

V8 checks for this by using a counter on the boundary between wasm and JS.

Apr 30 '24 16:04 fgmccabe

Note that the scenario above was possible in the current/old JSPI design: by passing Suspender objects to JS & back. There are also simpler so-called sandwich scenarios which are also prohibited (not involving sharing of stacks/suspenders).

Apr 30 '24 16:04 fgmccabe

Small correction on the V8 implementation: we don't use a counter anymore. We check whether we are currently running on a secondary stack, which is the case if and only if suspending is currently allowed. E.g. in the "sandwich scenario", the call stack would look like JS -> promising export -> wasm -> JS -> wasm -> suspending import. We switch from the secondary stack to the central stack in the wasm->JS call, so the suspending import will trap.
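To make the check above concrete, here is a toy model of it. This is only a sketch, not V8 code: the `Engine` class, its method names, and the `run_direct` / `run_sandwich` helpers are all hypothetical, and the model tracks nothing except which stack execution is currently on.

```python
class Trap(Exception):
    """Stands in for a Wasm trap."""


class Engine:
    """Toy model that only tracks which stack execution is currently on."""

    def __init__(self):
        self.on_secondary = False  # JS starts out on the central stack

    def _run_on(self, secondary, fn):
        # Run fn on the given kind of stack, then restore the previous one.
        previous, self.on_secondary = self.on_secondary, secondary
        try:
            return fn()
        finally:
            self.on_secondary = previous

    def promising_export(self, wasm_fn):
        # Entering through a promising export moves to a secondary stack.
        return self._run_on(True, wasm_fn)

    def call_js_import(self, js_fn):
        # Any wasm->JS call switches back to the central stack.
        return self._run_on(False, js_fn)

    def suspending_import(self):
        # Suspending is allowed iff we are on a secondary stack.
        if not self.on_secondary:
            raise Trap("suspending here would capture JS frames")
        return "suspended"


def run_direct():
    # JS -> promising export -> wasm -> suspending import
    engine = Engine()
    return engine.promising_export(engine.suspending_import)


def run_sandwich():
    # JS -> promising export -> wasm -> JS -> wasm -> suspending import
    engine = Engine()
    wasm_inner = engine.suspending_import  # reached via a plain export
    wasm_outer = lambda: engine.call_js_import(wasm_inner)
    return engine.promising_export(wasm_outer)
```

In this model `run_direct()` suspends, while `run_sandwich()` raises `Trap` because the intervening wasm->JS call has already moved execution back to the central stack.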

Apr 30 '24 17:04 thibaudmichaud

Thanks all for the answers! Apologies for my repeated questions, I'm trying to get up to speed on everything. What's the intuition for how the above wasm->JS checking discipline for JSPI generalises to Bag-of-Stacks? Is the general idea that promising export is roughly analogous to switch.call in that it implies switching to a secondary stack? Does the suspending import bring execution back to the primary stack?

Thinking about work stealing, even in the "well-behaved" JS -> wasm -> [switch.call] -> wasm -> [switch], it seems that the initial top-level JS frame would need to be kept alive (I think of this as being "captured") so long as the suspended coroutine produced by switch.call is kept alive --- later on someone could switch to this suspended coroutine and then return from the first Wasm frame to get back into JS (if I'm misunderstanding something about this scenario, please correct me).

I don't have a clear picture of how to deal with this scenario in order to enable concurrent work stealing where keeping alive the JS frame from another thread would be problematic. The best I can envisage is a "shared" version of switch.call which switches you to a new coroutine where shared suspension is allowed, at the cost of not capturing the prior coroutine if its context was non-shared. This seems to have ergonomic problems though since you permanently lose the ability to return to the JS top-level.

Apr 30 '24 17:04 conrad-watt

The 'may not capture JS frames' rule is a hard requirement from TC39, independent of the actual form of stack switching. Currently, in V8, we 'switch back' for all calls to imported JS functions, not just for suspending imports.

We have been thinking some about the top-level part of this, in particular how to handle traps. One way would be to replace:

JS -> wasm -> [switch.call] ->

with

JS->Promising->wasm->[switch.call]->...

This would also address the work stealing scenario (I believe).

Apr 30 '24 22:04 fgmccabe

@conrad-watt

The best I can envisage is a "shared" version of switch.call which switches you to a new coroutine where shared suspension is allowed, at the cost of not capturing the prior coroutine if its context was non-shared. This seems to have ergonomic problems though since you permanently lose the ability to return to the JS top-level.

One way to switch from a non-shared context to a new, shared-suspendable stack would be to create the new stack with a shared-non-suspendable function that takes a non-shared stack reference parameter. When switching to the new stack, that root function would do whatever it needs to with the non-shared stack reference (e.g. stick it in a global or a scheduler queue), then return_call into a shared-suspendable function. The return_call gets the non-suspendable frame off the stack, leaving only suspendable frames.

For switching from non-shared contexts to shared stacks that are not brand new, that trick doesn't quite work out of the box. The problem is that the shared stack would need to be in a suspendable context before executing a switch and would need to be in a shared-non-suspendable context afterwards so it can receive the non-shared stack reference. One solution would be a version of switch that simultaneously introduces a shared-barrier of some sort. Another solution would be a return_switch that behaves similarly to a return_call and can return-call into a shared-non-suspendable function that again can receive the non-shared stack reference as a parameter.

May 01 '24 23:05 tlively

To avoid capturing JS frames, I think bag-o-stacks will require the counter implementation rather than the "switching allowed iff current stack is non-main" implementation. Assuming we keep the implementation strategy of implicitly switching to the main stack to execute JS imports, we will also need to implicitly switch back to the non-main active stack when those imports call Wasm exports, contrary to how I understand JSPI to work.

Here's what goes wrong when you don't implicitly switch back to the non-main active stack when calling exports:

JS
|
Wasm
|
. => . ;; explicit switch (A)
     |
     Wasm
     |
. <~ . ;; implicit switch to main stack to call JS import
|
JS (B)
|
Wasm
|
X ;; switch back to stack ref from (A) does not return into JS (B)

To fix that, we need to implicitly switch back to the active stack when calling a Wasm export. This makes the implicit switching to the main stack an unobservable implementation detail. But if you allow switching any time you are on a secondary stack, you'll still get problems:

JS
|
Wasm
|
. => . ;; explicit switch (A)
     |
     Wasm
     |
. <~ . ;; implicit switch to main stack to call JS import
|
JS (B)
|
. ~> . ;; implicit switch to active stack to call Wasm export
     |
     Wasm
     |
     X ;; switch back to stack ref from (A) does not return into JS (B)

To prevent both of these problems, we need to disallow switching on a non-main stack whenever it logically contains non-suspendable frames, whether or not the implementation actually executes those frames on the non-main stack. This requires either a stack walk (which is probably how we would spec it) or a counter of non-suspendable frames.
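A rough sketch of the counter variant, to show what "logically contains" means here. Everything in it is an assumption made for illustration: the `Stack` and `Scheduler` names, the per-stack `non_suspendable` counter, and the choice to leave switches off the main stack unrestricted. The point is only that the counter belongs to the logical stack, so it does not matter where the JS frame physically runs.

```python
class Trap(Exception):
    """Stands in for a Wasm trap."""


class Stack:
    def __init__(self, name, is_main=False):
        self.name = name
        self.is_main = is_main
        self.non_suspendable = 0  # JS frames logically on this stack


class Scheduler:
    def __init__(self):
        self.active = Stack("main", is_main=True)

    def call_js_import(self, js_fn):
        # The JS frame is charged to the logically active stack, even if an
        # implementation chooses to execute it on the main stack.
        stack = self.active
        stack.non_suspendable += 1
        try:
            return js_fn()
        finally:
            stack.non_suspendable -= 1

    def switch_to(self, target):
        current = self.active
        if not current.is_main and current.non_suspendable > 0:
            raise Trap("non-suspendable frame on " + current.name)
        self.active = target
        return current  # reference to the stack that was just suspended


def second_diagram():
    sched = Scheduler()
    ref_a = sched.switch_to(Stack("worker"))  # explicit switch (A)
    # JS (B) calls a Wasm export, which tries to switch back to (A).
    return sched.call_js_import(lambda: sched.switch_to(ref_a))


def plain_switch_back():
    sched = Scheduler()
    ref_a = sched.switch_to(Stack("worker"))
    return sched.switch_to(ref_a).name  # no JS frame in between: allowed
```

Here `second_diagram()` traps at the point marked X in the diagram, and `plain_switch_back()` succeeds because the worker stack never acquired a JS frame.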

edit: the reason to spec this check as a stack walk (i.e. some "bubbling up" mechanism) is that the check would bubble up into the import itself, at which point it will simply continue bubbling up and do the right thing if the import is a Wasm function, or otherwise it will bubble up into the embedder. If the embedder is ok with its stack frames being suspended, it can continue bubbling the check up past the import. Otherwise the embedder can bubble up a trap.

May 02 '24 00:05 tlively

We have been thinking some about the top-level part of this, in particular how to handle traps. One way would be to replace:

JS -> wasm -> [switch.call] ->

with

JS->Promising->wasm->[switch.call]->...

This would also address the work stealing scenario (I believe).

Does this mean that switch would fail/trap unless the Wasm call stack was initially entered via a promise (to avoid capturing the top-level JS frame)? If so, does this mean that if any internal component of a Wasm module graph wants to use coroutines for its control flow, then synchronous call/return into any transitively-associated Wasm code would no longer be possible? Is there any alternative mechanism for guarding the top-level JS frame that would preserve the capability for synchronous entry into Wasm?

To prevent both of these problems, we need to disallow switching on a non-main stack whenever it logically contains non-suspendable frames, whether or not the implementation actually executes those frames on the non-main stack. This requires either a stack walk (which is probably how we would spec it) or a counter of non-suspendable frames.

Is there a reason that the counter approach was previously moved away from in V8? (ref @thibaudmichaud's comment).

EDIT for @tlively's other comment:

One way to switch from a non-shared context to a new, shared-suspendable stack would be to create the new stack with a shared-non-suspendable function that takes a non-shared stack reference parameter. When switching to the new stack, that root function would do whatever it needs to with the non-shared stack reference (i.e. stick it in a global, a scheduler queue, etc.), then return_call into a shared-suspendable function. The return_call gets the non-suspendable frame off the stack, leaving only suspendable frames.

This seems to imply the need for a check that there are no further parent non-suspendable frames, right?

i.e. the following should be allowed: non-suspendable top-level -> tail call to suspendable frame -> switch

but the following should be disallowed: non-suspendable top-level -> non-suspendable frame -> tail call to suspendable frame -> switch
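The two cases can be written down as a small model. This is just a sketch under my own assumptions: `SharedStack` and its methods are hypothetical, and the linear scan in `switch` stands in for whatever mechanism (counter or stack walk) would really be used.

```python
NON_SUSPENDABLE = "shared-non-suspendable"
SUSPENDABLE = "shared-suspendable"


class Trap(Exception):
    """Stands in for a Wasm trap."""


class SharedStack:
    def __init__(self):
        self.frames = []

    def call(self, kind):
        # An ordinary call pushes a new frame.
        self.frames.append(kind)

    def return_call(self, kind):
        # A tail call replaces the caller's frame with the callee's.
        self.frames.pop()
        self.frames.append(kind)

    def switch(self):
        # Stand-in for the check: no non-suspendable frame may remain.
        if NON_SUSPENDABLE in self.frames:
            raise Trap("non-suspendable frame still on the stack")
        return "switched"


def allowed_case():
    stack = SharedStack()
    stack.call(NON_SUSPENDABLE)       # non-suspendable top-level
    stack.return_call(SUSPENDABLE)    # tail call to suspendable frame
    return stack.switch()


def disallowed_case():
    stack = SharedStack()
    stack.call(NON_SUSPENDABLE)       # non-suspendable top-level
    stack.call(NON_SUSPENDABLE)       # further non-suspendable frame
    stack.return_call(SUSPENDABLE)    # tail call only removes the inner one
    return stack.switch()
```

`allowed_case()` switches, while `disallowed_case()` traps because the tail call removes only the innermost non-suspendable frame.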

Also to check my understanding, in the context of going from top-level JS -> Wasm, would the initial step of "create the new stack with a shared-non-suspendable function" involve something like the promising strategy sketched above by @fgmccabe, or is the idea that the "tail-call at top level" manoeuvre would be done in JS?

May 02 '24 03:05 conrad-watt

Is there a reason that the counter approach was previously moved away from in V8?

This simplified the code and removed the small overhead of incrementing/decrementing the counter in the wasm->JS code. Nothing that would prevent us from moving back to it if this is required for core stack-switching, I think.

May 02 '24 13:05 thibaudmichaud

One further thought to lob on top of the questions from my comment above. There's reason to believe that work stealing potentially needs more thought beyond the immediate JS->Promising->wasm->[switch.call]->... solution.

All (non-async) JS is assumed to be "non-suspendable", and all Wasm is assumed to be "suspendable", so JSPI modifications are only needed at the JS->Wasm boundary. As I mentioned above I have devex/encapsulation concerns with tying the internal Wasm use of coroutines to mandatory promises at the top-level JS boundary, but I can see on a technical level that it's workable as a conceptual extension of JSPI, because if the suspended coroutine eventually terminates, it will do so in the same thread that the promise is held in, causing it to unproblematically resolve.

In the shared case, things are more complicated. First, if a computation is suspended as a shared coroutine in one thread, and then the suspended computation is completed in another thread, should the JS promise at the boundary in the first thread resolve, and if so, how should the promise be kept alive if its only root comes from the resumed coroutine executing in another thread? Second, and more fundamentally, Wasm functions are not shared-suspendable by default - at the very least we will have the unshared->shared-suspendable (pure Wasm) boundary to worry about - not to mention shared-nonsuspendable->shared-suspendable in some scenarios. How is this boundary-cross supported in pure Wasm? Do we find some way to do it synchronously, or does it need to be purely asynchronous - e.g. a pure Wasm equivalent of the .then promise method that would be used to wire things up if this was a JS->Wasm-suspendable boundary? (edit) The former approach seems intrinsically difficult for the same reason that synchronous JS->Wasm-suspendable is difficult, the latter seems challenging compared to the JS case since pure Wasm doesn't have the JS event loop to piggy-back off of.

May 07 '24 11:05 conrad-watt

@conrad-watt "All (non-async) JS is assumed to be "non-suspendable", ..." This is probably false. Because, as you identified before, allowing wasm to suspend can lead to JS being suspended.

May 07 '24 15:05 fgmccabe

@conrad-watt "All (non-async) JS is assumed to be "non-suspendable", ..." This is probably false. Because, as you identified before, allowing wasm to suspend can lead to JS being suspended.

My understanding has been that all the possible mechanisms such as the discussed JS->Promising->wasm->[switch.call]->... boundary are intended to structurally prevent suspending Wasm from also causing JS to suspend. In particular there was an earlier comment

The 'may not capture JS frames' is a hard requirement from TC39.

which I interpreted as meaning that a suspended Wasm coroutine must not suspend any JS, so as to avoid being forced to capture a suspended JS frame.

Is there an idea of some mechanism for suspending Wasm that does cause JS to suspend, but doesn't cause the suspended JS frame to be captured? I don't have a clear image of how this would work - currently my impression is that (edit: non-trivial) suspension implies capturing.

May 07 '24 15:05 conrad-watt

I misquoted. It was actually the second half, "and all Wasm is assumed to be "suspendable"", that is false. As you noted, we will likely have to go through JSPI to access suspending in wasm.

May 07 '24 15:05 fgmccabe

Sorry, I may have written that sentence in an unhelpful way. My point is that all Wasm functions conceptually have a static nonshared-suspendable property (with analogy to the shared-suspendable property we've been discussing in threads) - another way of saying this is that pure Wasm functions have no inherent restrictions on being suspended as non-shared coroutines - we're not introducing a new kind of "suspendable" function and preventing suspensions of existing functions. This means we don't need to worry about the boundary between such hypothetical "non-suspendable" and "suspendable" functions in pure Wasm. We only need to worry about the JS ("non-suspendable") to Wasm ("suspendable") boundary, and JSPI is a technically sufficient mechanism to protect this boundary (although I feel that its ergonomics are imperfect).

The issue when extending to shared is that existing pure Wasm functions are not inherently shared-suspendable. So now we do need to start worrying about the Wasm ("non-shared-suspendable") to Wasm ("shared-suspendable") boundary. JSPI works in the JS->Wasm non-shared case because the resolution of the JS promise can be put on the event loop after the completion of the underlying Wasm computation. As I sketch in the latter part of my comment here, the situation seems more challenging in the Wasm->Wasm shared case (edit: and to a lesser extent the JS->Wasm shared case).

May 07 '24 15:05 conrad-watt

@conrad-watt Regarding "should the JS promise at the boundary in the first thread resolve, and if so, how should the promise be kept alive if its only root comes from the resumed coroutine executing in another thread? " I don't think this is a major issue. I believe it amounts to a write in one thread and a read in another (the original thread reads the Promise result from the writer).

May 07 '24 17:05 fgmccabe

I am still trying to grok why the "non-shared-suspendable" -> "shared-suspendable" boundary is different to "non-shared"->"shared".

May 07 '24 17:05 fgmccabe

EDIT for @tlively's other comment:

One way to switch from a non-shared context to a new, shared-suspendable stack would be to create the new stack with a shared-non-suspendable function that takes a non-shared stack reference parameter. When switching to the new stack, that root function would do whatever it needs to with the non-shared stack reference (i.e. stick it in a global, a scheduler queue, etc.), then return_call into a shared-suspendable function. The return_call gets the non-suspendable frame off the stack, leaving only suspendable frames.

This seems to imply the need for a check that there are no further parent non-suspendable frames right?

i.e. the following should be allowed non-suspendable top-level -> tail call to suspendable frame -> switch

but the following should be disallowed non-suspendable top-level -> non-suspendable frame -> tail call to suspendable frame -> switch

Right. We never settled on precisely how shared-non-suspendable functions, shared-suspendable functions, and shared-barriers should interact, so I'm sure there are some designs where this solution wouldn't work. I was assuming some sort of counter to detect when there is a non-suspendable frame on the stack.

Also to check my understanding, in the context of going from top-level JS -> Wasm, would the initial step of "create the new stack with a shared-non-suspendable function" involve something like the promising strategy sketched above by @fgmccabe, or is the idea that the "tail-call at top level" manoeuvre would be done in JS?

I don't believe pure-Wasm stack switching, whether multithreaded or otherwise, should require Promises on the JS->Wasm boundary, but I haven't talked to @fgmccabe about this in detail and I'm not sure we're on the same page. I was imagining that the creation of that new stack, the switching, and the tail call would all be pure Wasm. The main stack is never shared-suspendable because it always has a JS frame.

Since we're talking about promises, here's what I've been thinking for how JSPI could be respecified in terms of bag-o-stacks stack switching. JSPI-wrapped exports would implicitly allocate and switch to a fresh stack. The import wrapper, upon receiving a Promise, would switch back to the main stack, where the resumed export wrapper would chain the promise with a switch back to the non-main stack. This scheme also works for implementing JSPI in user space, which is nice.
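As a loose analogy for that scheme, here is a user-space sketch in Python rather than Wasm. The mapping is an assumption for illustration only: a generator object plays the fresh non-main stack, `yield` plays the import wrapper switching back to the main stack with a Promise, and an `asyncio` future plays the Promise. `promising`, `demo`, `js_import`, and `wasm_export` are hypothetical names.

```python
import asyncio


def promising(wasm_export):
    """Export wrapper: run the export on a fresh 'stack' and return a promise."""

    def wrapper(*args):
        promise = asyncio.get_running_loop().create_future()
        stack = wasm_export(*args)  # generator object = freshly allocated stack

        def resume(value):
            # Switch from the main stack to the non-main stack.
            try:
                pending = stack.send(value)
            except StopIteration as finished:
                promise.set_result(finished.value)  # export returned
                return
            # The import switched back with a promise: chain it with a
            # switch back to the non-main stack.
            pending.add_done_callback(lambda fut: resume(fut.result()))

        resume(None)
        return promise

    return wrapper


async def demo():
    loop = asyncio.get_running_loop()

    def js_import():
        # A JS import that returns a promise resolved on a later tick.
        fut = loop.create_future()
        loop.call_soon(fut.set_result, 41)
        return fut

    def wasm_export():
        value = yield js_import()  # suspending import: back to the main stack
        return value + 1

    return await promising(wasm_export)()
```

Running `asyncio.run(demo())` drives the export to completion through one suspension. Error propagation is left out to keep the sketch short.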

May 07 '24 19:05 tlively

@tlively

Right. We never settled on precisely how shared-non-suspendable functions, shared-suspendable functions, and shared-barriers should interact...

The main stack is never shared-suspendable because it always has a JS frame...

I don't believe pure-Wasm stack switching, whether multithreaded or otherwise, should require Promises on the JS->Wasm boundary...

I believe the "delimited" stack switching solution (typed continuations) and the "undelimited" solution (bag-of-stacks) need very different mindsets. In a delimited setting, a synchronous call from JS->Wasm can be naturally interpreted as a handler, so a signal/stack walk propagating back up can be intercepted and turned into an error, preventing the JS frame from being captured.

In an undelimited setting where the whole stack is captured without a signal/stack walk, more care needs to be taken around synchronous boundary crosses. In particular it's not enough to declare that transitioning synchronously from JS->Wasm switches from a "non-suspendable" to a "suspendable stack". The issue comes when returning from the callee Wasm "stack". Consider, in the undelimited world, what happens when you suspend within the Wasm frame, and later resume and return from it: execution would naively jump back to the caller JS frame, which would imply a need to capture it during the initial suspension. In an undelimited setting all return operations in the coroutine must transfer control back to their original caller. This is in contrast to the delimited world where there is a "delimited" top-level frame in the continuation (identified by the stack walk) such that return from it will jump execution to the context in which the continuation was resumed.

The obvious (but brutal) solution for this concern in the undelimited setting is to structurally prevent the called Wasm from returning to the caller JS frame - mandatory JSPI accomplishes this because the JS->Wasm transition is no longer a synchronous call - now when the Wasm returns, instead of synchronously resuming execution in the caller JS frame, the returned result is fed to a promise resolution enqueued on the JS event loop.

The downside of this is that regular synchronous calls from JS -> Wasm are the "normal way" for the two to interact, so there's a tradeoff of user ergonomics (and potential misc overheads from promises/the JS event loop) in exchange for avoiding the stack walk cost associated with delimited continuations. Perhaps there's an alternative solution in the undelimited setting with different characteristics, but it would need to be evaluated carefully.

EDIT: tail calls could be a potential mechanism, but the tail call would need to be carried out at the JS->Wasm boundary, and there would need to be no further parent JS frames, which seems challenging.

Then, we get to the extended version of this boundary concern when going from nonshared->shared-suspendable in pure Wasm. We previously discussed a "delimited" solution with a special kind of handler which would catch attempts to propagate a "shared-suspend" signal through a nonshared frame. Again, we have to adopt a different mindset when dealing with undelimited continuations, where there is no signal to catch.

@fgmccabe

I am still trying to grok why the "non-shared-suspendable" -> "shared-suspendable" boundary is different to "non-shared"->"shared".

In a world without shared coroutines, direct calls from nonshared->shared are fine, because the shared part of the call stack is guaranteed to continue executing in the same thread as the nonshared part of the call stack. When the shared part returns, the nonshared part similarly continues executing in the same thread. That is, shared functions can be shared between threads, and executed in any thread, but once execution of the function starts the associated call stack is "pinned" to that thread until completion. In a world where the computation can be suspended as a shared coroutine, if the computation is resumed in a new thread, returning from shared to nonshared becomes more problematic, because without intervention the nonshared part would start executing in the new thread, violating our GC invariants. Again the obvious solution in the undelimited setting is to structurally prevent the shared part from directly returning to the nonshared caller, but in pure Wasm we don't have the JSPI mechanism so some more thought is needed.

WRT the resolving cross-thread promise...

I don't think this is a major issue. I believe it amounts to a write in one thread and a read in another (the original thread reads the Promise result from the writer).

The main issue is the GC requirement to keep the yet-unresolved promise alive cross-thread. This isn't impossible, but would commit us to a particular implementation strategy that V8 is experimenting with, but isn't universally supported (for @tlively, it's ephemerons again). I agree this isn't as fundamental as the pure Wasm boundary concern.

May 08 '24 02:05 conrad-watt

Sorry, my previous explanation wasn't correct or clear enough. Here's an attempt at a well-organized explanation of how bag-o-stacks could interact with JS without any mandatory JSPI.

The property we need to maintain is that JS cannot observe stack switching. In other words, JS frames should follow a stack discipline, i.e. they should be returned from in the opposite order they are entered, just as if there were only a single global stack (per thread). This property is violated iff it is possible to "skip" a JS frame when returning. As long as we maintain this property, it doesn't matter what stacks JS frames are logically or actually executed on.
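Stated as code, the property is a plain well-nestedness check over the JS frames that are entered and returned from. A minimal sketch, assuming a hypothetical event-trace representation that records only JS frames:

```python
def follows_stack_discipline(events):
    """events: ("enter", frame) / ("return", frame) pairs for JS frames only."""
    live = []  # JS frames entered but not yet returned from
    for kind, frame in events:
        if kind == "enter":
            live.append(frame)
        elif not live or live[-1] != frame:
            return False  # this return skipped a more recently entered frame
        else:
            live.pop()
    return True


# JS (B) returns before the top-level JS frame does: fine.
nested = [("enter", "top"), ("enter", "B"), ("return", "B"), ("return", "top")]

# Switching back to (A) and returning into the top-level frame skips JS (B).
skipped = [("enter", "top"), ("enter", "B"), ("return", "top")]
```

`follows_stack_discipline(nested)` holds and `follows_stack_discipline(skipped)` does not; the second trace is what the earlier diagrams produce when the switch at X is permitted.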

The key insight is that the main stack and non-main stacks are fundamentally different and therefore can have different rules about when switching is allowed. The main stack always has JS frames above the latest Wasm frames. Non-main stacks always start with Wasm frames, so they may or may not have JS frames above the latest Wasm frames.

If we can maintain the invariant that JS frames follow a stack discipline when non-main stacks are active, then it will always be safe to switch off of the main stack, despite the fact that it might have arbitrarily interleaved JS and Wasm frames. Stack discipline is necessarily maintained when the main stack is active because it is a single stack, so if stack discipline is also followed whenever non-main stacks are active, then stack discipline will always be followed.

To maintain stack discipline whenever non-main stacks are active, we can disallow switching off of a non-main stack whenever it contains a JS frame. This is a reasonable restriction for non-main stacks because, unlike the main stack, it is actually possible for them to not contain JS frames. Stack discipline is maintained because the system is restricted to use a single stack as long as there are JS frames on the non-main active stack.

To summarize, the rules are:

It is always valid to switch off of the main stack.
It is valid to switch off of a non-main stack only if it does not contain JS frames.

These rules also prevent traps or exceptions from making stack switching observable to JS.

This presentation was clearly specific to the JS embedding, but it extends to other embeddings as well. We could conservatively enforce these rules for all embeddings, preventing any embedder from observing stack switching. Alternatively, we could allow embedders to opt-in to be suspendable when they call back into Wasm on non-main stacks.

Extending these rules to handle shared stack references is straightforward. It is never valid to switch off of any stack, including the main stack, if it contains non-shared-suspendable frames and the switch would produce a shared stack reference.
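Putting the three rules into one predicate, written as a stack walk for clarity. This is a sketch under assumptions: the `Frame` and `Stack` types and their fields are hypothetical, and JS frames are modelled as never being shared-suspendable.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    is_js: bool = False
    shared_suspendable: bool = False  # assumed always False for JS frames


@dataclass
class Stack:
    is_main: bool
    frames: list = field(default_factory=list)


def may_switch(stack, produces_shared_ref=False):
    if produces_shared_ref:
        # Applies to every stack, including the main stack.
        return all(f.shared_suspendable for f in stack.frames)
    if stack.is_main:
        return True  # always valid to switch off of the main stack
    return not any(f.is_js for f in stack.frames)


main = Stack(is_main=True, frames=[Frame(is_js=True), Frame()])
worker = Stack(is_main=False, frames=[Frame()])
worker_with_js = Stack(is_main=False, frames=[Frame(), Frame(is_js=True)])
shared_worker = Stack(is_main=False, frames=[Frame(shared_suspendable=True)])
```

With these examples, `main` and `worker` may switch but cannot produce a shared reference, `worker_with_js` may not switch at all, and `shared_worker` may do either.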

May 08 '24 18:05 tlively

The property we need to maintain is that JS cannot observe stack switching. In other words, JS frames should follow a stack discipline, i.e. they should be returned from in the opposite order they are entered, just as if there were only a single global stack (per thread).

Can you explain in more detail why this is sufficient? In particular @fgmccabe mentioned another property, "may not capture JS frames". The key point I was trying to get across in my previous comment was that even if you consider Wasm to be running in a "non-main stack", if it's possible for that Wasm code to return to a conceptually separate parent JS main stack frame, then if that Wasm is suspended, the parent JS stack frame must be captured too unless special measures are taken to change the behaviour of that return (e.g. inserting a delimiter, or some other structural approach like JSPI). Otherwise, after resuming the Wasm code it will be impossible to correctly execute the return instruction.

When I try to synthesise my understanding with your comment above, I wonder if we have different views of what it means to capture a JS frame. Do you consider JS frames on the main stack to never be "captured" by definition, even if they are only present in order to support the future execution of a suspended coroutine, and will only be removed from the stack if the coroutine is resumed? I could potentially agree with this perspective if we were only talking about the initial top-level JS stack frames, but you also seem to be suggesting in a previous comment that calling JS exports from Wasm would cause further JS frames to be added to the main stack. This seems to me to create a much more clearcut example of "capturing" if such JS frames can themselves synchronously call suspendable Wasm, even if that Wasm is moved to a "non-main" stack.

May 09 '24 00:05 conrad-watt

The property we need to maintain is that JS cannot observe stack switching. In other words, JS frames should follow a stack discipline, i.e. they should be returned from in the opposite order they are entered, just as if there were only a single global stack (per thread).

Can you explain in more detail why this is sufficient? In particular @fgmccabe mentioned another property, "may not capture JS frames".

The real requirement from TC39 is that they don't want to worry about how to update the JS semantics or implementations to account for any new kind of non-local control flow. This is achieved if JS can never observe any control flow that was not previously possible, and that in turn is achieved if we require that JS frames follow a stack discipline, because for every program execution that uses stack switching, we could construct a separate program execution that does not use stack switching that calls and returns into JS in the exact same way.

With typed continuations, "may not capture JS frames" implies "JS frames follow a stack discipline," which we know satisfies "may not expose new non-local control," so we've been using the former as a shorthand for the latter. With bag-o-stacks, however, the intuitive meaning of "capturing" a frame is completely different, so we have to be more precise.

May 09 '24 00:05 tlively

I think I've just reached a better understanding of your proposed solution, so I'll try to explain my perspective and you can tell me if I've got something wrong.

With bag-o-stacks, however, the intuitive meaning of "capturing" a frame is completely different, so we have to be more precise.

It seems to me that the definition of capturing ends up basically the same in both the typed-continuation and bag-o-stacks case - I'm interpreting the second diagram in your comment here + your explanation above as meaning that it's considered ok (as an exception to "never capture JS frames") for at most one coroutine (counted globally) to capture JS frames, because in this case there are still no JS frames in other coroutines that would make switches observable to JS. Since all Web execution starts with a JS frame, this allowance is essentially always "used up" by the initial suspended coroutine created by the very first switch.call. I'm a little suspicious as to whether capturing JS frames in this way plays well with engines, but even if it turns out that this special case isn't ok and no coroutines are allowed to capture JS frames, the language features we're discussing don't fundamentally change - it would just change where your proposed delimiters/handlers are inserted (see below).

In the same comment, I interpret your proposal as essentially adding a delimiter/handler at the JS->Wasm boundary if the JS was itself entered from a non-initial Wasm coroutine (i.e. it is non-top-level JS). This prevents subsequent suspensions from capturing any JS frame. To avoid the cost of a stack walk (which is normally required to find handlers) the presence of the handler is encoded in a counter which is eagerly propagated through the call stack. So at an inner call where switch/switch.call is executed, the counter can be checked to determine if a stack walk would find a handler at the JS->Wasm boundary.

I agree the counter strategy is technically feasible and allows bag-o-stacks to continue avoiding a stack walk, although it's worth observing that this strategy relies on there only being one kind of handler (or more generally a very small number of special cases) in scope at once. Otherwise the information that needs to be eagerly propagated can no longer be encoded in a single counter (in the worst case, you might need to propagate more general references to the current in-scope handlers).

In particular trying to extend this strategy to cover the unshared->shared-suspendable case requires a second kind of handler that only traps on suspend-shared signals. So to continue avoiding a stack walk you'd need to at the bare minimum eagerly propagate two counters through the Wasm call stack. It might be possible to bitmask the two counters into a single scalar value at least, but the state would have to be managed at every relevant boundary.
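The bitmasking idea can be sketched as follows. This is a minimal illustration, not a proposed design: the 16-bit field width, all function names, and the rule that a shared suspend is blocked by either kind of barrier are assumptions made for the example.

```python
FIELD_BITS = 16                      # assumed width of each packed counter
FIELD_MASK = (1 << FIELD_BITS) - 1


def js_barriers(state):
    """Low field: barriers that block any switch (JS frame in the way)."""
    return state & FIELD_MASK


def shared_barriers(state):
    """High field: barriers that only block shared suspensions."""
    return (state >> FIELD_BITS) & FIELD_MASK


def push_js_barrier(state):
    # A real implementation would need an overflow policy; here we assert.
    assert js_barriers(state) < FIELD_MASK
    return state + 1


def pop_js_barrier(state):
    assert js_barriers(state) > 0
    return state - 1


def push_shared_barrier(state):
    assert shared_barriers(state) < FIELD_MASK
    return state + (1 << FIELD_BITS)


def pop_shared_barrier(state):
    assert shared_barriers(state) > 0
    return state - (1 << FIELD_BITS)


def may_switch_unshared(state):
    # Only the JS-barrier field matters for an ordinary switch.
    return js_barriers(state) == 0


def may_suspend_shared(state):
    # Assumption: a shared suspend must cross neither kind of barrier,
    # so a single comparison against zero covers both fields.
    return state == 0
```

For example, after `state = push_shared_barrier(0)`, `may_switch_unshared(state)` still holds while `may_suspend_shared(state)` does not. Each boundary crossing has to push and pop the right field, which is the per-boundary bookkeeping mentioned above.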

It's also worth explicitly calling out that this strategy isn't a perfect story for expressivity/modularity, because there might be times you legitimately want to suspend your Wasm computation, but are prevented from doing so because you happened to enter it transitively through a non-main stack. In particular I may have a stand-alone module that happens to internally use coroutines for control flow, which I enter and exit as a black box. Whether or not this module can execute now depends on whether any of my transitive parent stack frames happen to contain a switch to a non-main stack.

EDIT: actually, the compositionality of this solution becomes particularly bad when generalised - consider the unshared->shared-suspendable generalisation - in pure Wasm this would mean that no shared-suspend could succeed if there are transitively any synchronous unshared->shared-suspendable calls anywhere in the call stack. At least in the immediate solution, there is the safety valve of JSPI to "reset" the counter.

Going in the other direction, I wonder what it would look like to restrict typed continuations to only having a single kind of handler (or very simplified selection of handlers) in scope at once, so that it could also take advantage of the "eager propagation" strategy. Are there any existing discussions on this?

In fact, if bag-o-stacks would commit to propagating a counter through all Wasm calls, could typed-continuations, even in its current form, avoid a stack walk by analogously eagerly propagating a pointer to the closest in-scope handler? [edit: this gets complicated because of daisy-chaining of subsequent handlers]

May 09 '24 01:05 conrad-watt

It seems to me that the definition of capturing ends up basically the same in both the typed-continuation and bag-o-stacks case... Since all Web execution starts with a JS frame, this allowance is essentially always "used up" by the initial suspended coroutine created by the very first switch.call.

Yes, exactly. That's another perfectly good way to look at it, and is closer to how we would probably spec this. That one globally counted initial stack would be the "main stack" by definition.

... I interpret your proposal as essentially adding a delimiter/handler at the JS->Wasm boundary if the JS was itself entered from a non-initial Wasm coroutine (i.e. it is non-top-level JS).

Switching off the main stack when there is a Wasm->JS->Wasm sandwich is still ok, so I think everything sounds right here except possibly for the "i.e. it is non-top-level JS", which I interpret to mean any JS that has Wasm somewhere above it on the call stack.

... To avoid the cost of a stack walk (which is normally required to find handlers) the presence of the handler is encoded in a counter which is eagerly propagated through the call stack... In particular trying to extend this strategy to cover the unshared->shared-suspendable case requires a second kind of handler that only traps on suspend-shared signals. So to continue avoiding a stack walk you'd need to at the bare minimum eagerly propagate two counters through the Wasm call stack...

Yes, exactly. I don't anticipate a need for further counters, but I'd be interested in hearing if anyone else has ideas for other extensions that would need more.

It's also worth explicitly calling out that this strategy isn't a perfect story for expressivity/modularity, because there might be times you legitimately want to suspend your Wasm computation, but are prevented from doing so because you happened to enter it transitively through a non-main stack...

In general yes, but only if a JS stack frame gets mixed up in things as well. This problem is also avoidable by doing exactly what the library would be doing with typed continuations anyway: allocating and switching to its own stacks internally so it knows there will be no interfering JS frames.

... At least in the immediate solution, there is the safety valve of JSPI to "reset" the counter.

Yes, or just switching to a new stack under local control. I guess we still need to figure out whether/how JSPI is going to interact with shared stacks at all.

Going in the other direction, I wonder what it would look like to restrict typed continuations to only having a single kind of handler (or very simplified selection of handlers) in scope at once, so that it could also take advantage of the "eager propagation" strategy. Are there any existing discussions on this?

Not that I know of.

In fact, if bag-o-stacks would commit to propagating a counter through all Wasm calls, could typed-continuations, even in its current form, avoid a stack walk by analogously eagerly propagating a pointer to the closest in-scope handler? [edit: this gets complicated because of daisy-chaining of subsequent handlers]

Yeah, we've thought of strategies where we propagate a vector of handler pointers indexed by event tag (although when @fgmccabe and I were thinking most about this it was in the context of dynamic scoping). I'm not convinced we came up with anything that correctly handled all the daisy-chaining, though.

May 09 '24 05:05 tlively

Thanks, I really appreciate your time, and I think we're now mostly on the same page about the state of the world.

This problem is also avoidable by doing exactly what the library would be doing with typed continuations anyway: allocating and switching to its own stacks internally so it knows there will be no interfering JS frames.

Yes, or just switching to a new stack under local control.

I've not yet seen a part of the bag-o-stacks proposal that supports allocating a new stack internally without either capturing the current stack (and thus hitting the trap that we're trying to avoid by creating new "internal" stacks), or piggy-backing off the JS event loop by completing asynchronously. To be unambiguous, the most challenging case is a Wasm library/module that wants to present a synchronous interface while doing internal control flow using coroutines/continuations - so having an asynchronous JS wrapper that sticks things behind JSPI wouldn't be sufficient. If bag-o-stacks intends to support this, would you be able to sketch the setup?

May 09 '24 06:05 conrad-watt

Ah yes, you’re absolutely right. Just switching to a new stack doesn’t work if the caller has its own interleaving JS frame and is on a non-main stack.

If you consider typed continuations to be a “stack of stack segments” design, it can maintain the JS stack discipline through multiple chained stack segments. Since bag-o-stacks doesn’t have that overarching stack organization, it has to be more conservative about how it interacts with JS to ensure the stack discipline.

May 09 '24 16:05 tlively

In fact, even with typed continuations (wasmfx), you will still need to prohibit interleaving JS frames with wasm frames in an active computation. In particular, when you suspend to a handler, you must verify that there are no JS frames en route to the handler. That implies a frame-by-frame inspection. You might be able to accelerate that with a stack counter, in which case you are aggregating counters across multiple stacks/continuations.

May 09 '24 18:05 fgmccabe

Right, there can't be JS frames in the suffix of frames that get suspended, but importantly there can be arbitrarily many JS frames on any of the stack segments above the target handler. If you were to implement wasmfx in terms of bag-o-stacks, you would be limited to having JS frames only on the root stack segment, so wasmfx is strictly more expressive on this front.
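To make the "suffix" condition concrete, here is a small Python sketch of the check over a chain of stack segments, using one JS-frame counter per segment rather than a frame-by-frame walk. The `Segment` type, its fields, and `can_suspend` are hypothetical names invented for this illustration; the model assumes each segment's handler sits at the point where it resumed its child.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    js_frames: int                   # JS frames currently live on this segment
    handles: frozenset = frozenset() # tags handled where this segment resumed its child


def can_suspend(segments, tag):
    """segments is ordered root-first; the last one is currently running.

    Returns True if suspending with `tag` would capture no JS frames.
    Raises LookupError if no segment handles the tag.
    """
    captured_js = 0
    # Walk from the running segment towards the root, one segment at a time.
    for i in range(len(segments) - 1, 0, -1):
        captured_js += segments[i].js_frames   # this segment would be captured
        parent = segments[i - 1]
        if tag in parent.handles:
            # JS frames on `parent` and anything closer to the root stay put.
            return captured_js == 0
    raise LookupError(f"no handler for tag {tag!r}")


root = Segment("root", js_frames=3, handles=frozenset({"yield"}))
child = Segment("child", js_frames=0)
leaf = Segment("leaf", js_frames=0)
ok = can_suspend([root, child, leaf], "yield")
```

Here `ok` is true even though the root segment holds JS frames, because only the segments between the running code and the handler are captured. Giving `child` a JS frame would make the same suspension fail in this model.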

May 09 '24 19:05 tlively

I don't think that wasmfx and bos are any different here.

May 09 '24 19:05 fgmccabe
