html
html copied to clipboard
Ordering for Worker termination and Atomics.waitAsync promises
As discovered by @iainireland in https://bugs.chromium.org/p/v8/issues/detail?id=13258
Background
Atomics.waitAsync is a stage 3 TC39 proposal which adds an async, Promise-returning version of Atomics.wait. Atomics.waitAsync, like Atomics.wait, is basically the Linux futex API, and works off of indices into a SAB. The interesting thing about Atomics.waitAsync is that multiple Workers may be waiting on the same index, so an Atomics.notify can resolve Promises from other agents.
Surprising behavior
Suppose there are two workers, w1 and w2. Each worker has an unresolved Atomics.waitAsync Promise. Atomics.notify resolves Promises in FIFO order, and will resolve w1's Promise then w2's Promise. After a worker terminates, all its unresolved Promises are removed from the various queues tracked by Atomics.waitAsync. The main thread then executes the following code:
L1: w1.terminate();
L2: Atomics.notify(sab, idx, 1); // Notify 1 waiter
There is a race here. Worker termination steps happen "in parallel", so two interleavings are possible:
w1terminates; its unresolved Promise is removed from the queue tracked byAtomics.waitAsync- L2 notifies
w2's Promise, since that's the only Promise left in the queue
OR
- L2 notifies
w1's Promise w1terminates beforew1's Promise handler gets to run, making L2 look silently swallowed
In weak memory model lingo, L1 is program-order (i.e. thread-local order) before L2, and thus L1 happens-before L2. It is surprising that even though there is a clear ordering relationship here from the perspective of the main thread, L2 may still get swallowed due to an underlying race.
There's no ordering constraints around any of this stuff in either HTML or ecma262, so this is allowed right now. Should we do something here to try to enforce an ordering in spec?
(There are probably reasonable implementation fixes. One can imagine a fix where the agent calling Worker#terminate synchronously cleans up the Atomics.waitAsync Promise queues instead of doing it async, or something.)
I'm not quite sure I understand the observable consequences of the race. L2 is supposed to notify both workers and fulfill their promises, right? In the first race configuration:
- the promise handler in w1 never runs, because the promise was never notified
- the promise handler in w2 runs, because w2's promise was still in the queue
In the second race configuration:
- the promise handler in w1 never runs, because w1 terminated before w1's promise handler got to run
- the promise handler in w2 still runs, because w2's promise was still in the queue
So as far as I can tell they are the same. I must be missing something?
Atomics.notify has a count parameter indicating the number of waiters to notify. If you only notify one waiter, then w2's Promise isn't resolved unless w1 has already finished terminating.
@iainireland is right, this is a problem with "notify one". It probably also bears to note that these were designed to implement condition variables and mutexes and so forth, and for condition variables, "notify one" and "notify all" are the common cases.
Edit: Clarified original post to pass 1 for count to Atomics.notify.
OK, I see. That does seem like a bug in the Atomics.notify spec then, and I agree it should be fixed. If there needs to be some additional work in HTML to fix it, I'd have no problem with that.