html icon indicating copy to clipboard operation
html copied to clipboard

Ordering for Worker termination and Atomics.waitAsync promises

Open syg opened this issue 3 years ago • 4 comments

As discovered by @iainireland in https://bugs.chromium.org/p/v8/issues/detail?id=13258

Background

Atomics.waitAsync is a stage 3 TC39 proposal which adds an async, Promise-returning version of Atomics.wait. Atomics.waitAsync, like Atomics.wait, is basically the Linux futex API, and works off of indices into a SAB. The interesting thing about Atomics.waitAsync is that multiple Workers may be waiting on the same index, so an Atomics.notify can resolve Promises from other agents.

Surprising behavior

Suppose there are two workers, w1 and w2. Each worker has an unresolved Atomics.waitAsync Promise. Atomics.notify resolves Promises in FIFO order, and will resolve w1's Promise then w2's Promise. After a worker terminates, all its unresolved Promises are removed from the various queues tracked by Atomics.waitAsync. The main thread then executes the following code:

L1: w1.terminate();
L2: Atomics.notify(sab, idx, 1); // Notify 1 waiter

There is a race here. Worker termination steps happen "in parallel", so two interleavings are possible:

  1. w1 terminates; its unresolved Promise is removed from the queue tracked by Atomics.waitAsync
  2. L2 notifies w2's Promise, since that's the only Promise left in the queue

OR

  1. L2 notifies w1's Promise
  2. w1 terminates before w1's Promise handler gets to run, making L2 look silently swallowed

In weak memory model lingo, L1 is program-order (i.e. thread-local order) before L2, and thus L1 happens-before L2. It is surprising that even though there is a clear ordering relationship here from the perspective of the main thread, L2 may still get swallowed due to an underlying race.

There's no ordering constraints around any of this stuff in either HTML or ecma262, so this is allowed right now. Should we do something here to try to enforce an ordering in spec?

(There are probably reasonable implementation fixes. One can imagine a fix where the agent calling Worker#terminate synchronously cleans up the Atomics.waitAsync Promise queues instead of doing it async, or something.)

syg avatar Sep 08 '22 21:09 syg

I'm not quite sure I understand the observable consequences of the race. L2 is supposed to notify both workers and fulfill their promises, right? In the first race configuration:

  • the promise handler in w1 never runs, because the promise was never notified
  • the promise handler in w2 runs, because w2's promise was still in the queue

In the second race configuration:

  • the promise handler in w1 never runs, because w1 terminated before w1's promise handler got to run
  • the promise handler in w2 still runs, because w2's promise was still in the queue

So as far as I can tell they are the same. I must be missing something?

domenic avatar Sep 09 '22 06:09 domenic

Atomics.notify has a count parameter indicating the number of waiters to notify. If you only notify one waiter, then w2's Promise isn't resolved unless w1 has already finished terminating.

iainireland avatar Sep 09 '22 16:09 iainireland

@iainireland is right, this is a problem with "notify one". It probably also bears to note that these were designed to implement condition variables and mutexes and so forth, and for condition variables, "notify one" and "notify all" are the common cases.

Edit: Clarified original post to pass 1 for count to Atomics.notify.

syg avatar Sep 09 '22 16:09 syg

OK, I see. That does seem like a bug in the Atomics.notify spec then, and I agree it should be fixed. If there needs to be some additional work in HTML to fix it, I'd have no problem with that.

domenic avatar Sep 10 '22 22:09 domenic