Threadsafe uv_queue_work
Here is my proposal.
mmom@mmom-workstation:~/src/libuv$ build/uv_run_benchmarks_a queue_work
ok 1 - queue_work
# 226,318 ping-pong async jobs in 5.0 seconds (45,263/s)
# 2,455,663 chained async jobs in 5.0 seconds (491,132/s)
What it changes:
- active_reqs is incremented atomically - this is everything that is missing
- uv__work_done still runs on the event loop - but now it processes bursts of work units
What needs discussing:
- The atomic incrementing must be implemented in a more portable way
- Race conditions:
This PR introduces a very significant change - now active_reqs can change while an event loop routine is running. Normally, incrementing and decrementing should be fine, but checking whether there are no more active requests could be problematic. The only place this happens is when deciding whether to exit the event loop. Normally, from what I understand, there should be no valid case where a running task increments the count after the decision to quit has been taken?
Very surprisingly, it does appear that this is sufficient (access to the workq is already guarded by a mutex). Though defining the meaning of uv__has_active_reqs / uv_loop_active now comes with additional challenges.
Most of the code is already protected by mutexes. I think the real reason this was not done before is that it might have appeared to be a useless feature - everything done on behalf of user code is supposed to come from the main thread.
The only use of this feature is incremental scheduling of work units, so that the work flow remains interruptible and different tasks can be round-robined - given that the number of libuv threads is limited.
Otherwise, the main process could just as well schedule all of its work at once in a single uv_queue_work call.
As uv__has_active_reqs / uv_loop_active are not accessible by the user, the impact should be limited to libuv itself.
Normally, from what I saw, their only function is to determine when the event loop should exit.
This issue has been automatically marked as stale because it has not had recent activity. Thank you for your contributions.
@bnoordhuis If the number of requests in the loop reaches zero while a worker task is still running (which might add a new one on completion), this means that somewhere a higher-level routine launched an async task and is not waiting for its completion? In that case the race condition is not preventable anyway.
@mmomtchev I don't know if you saw my edit but I think we're talking about the same thing. Punting it to the user is an option (albeit a so-so solution) if it's well-documented.
Well-documented: not just "this thing can happen" but also how to avoid that thing from happening.
@bnoordhuis, I moved INLINE/UNUSED to atomic-ops.h which is private to avoid creating a new platform-dependent header file
I am still unsure what I should add to the documentation, since the number of running requests cannot decrease in a new way - it can only increase in a new way
This new way of calling uv_queue_work will be used in situations where, before, one had to wait for the after_work callback to reschedule the next unit - now these units will be rescheduled earlier.
This means there can't be a new way to block the event loop's exit, or to cause an early exit? None that I see?
Just mentioning that I have a use for this in my project and I'm blocked on it before I can fully integrate and make use of libuv. I've tested the PR in the way I plan to use it and it works the way I'd expect it to. It really allows for much greater flexibility over how and when to dispatch work, and it removes the need for a ton of plumbing between the actual work callback and the work completion callback (where it's currently only possible to call uv_queue_work).
@cyanreg, this PR is mostly about making the decision - the code is trivial
@bnoordhuis, can this be unblocked? It seems like it will reduce the performance impact of having smaller buffers for blocking I/O.
There's a merge conflict (which can be fixed by switching to C11 atomics) and I don't think the documentation is sufficient yet.
@mmomtchev Mind if I pick this up and do something similar using C11 atomics?
@bnoordhuis
I see potential for false positives where libuv thinks the event loop is done (no more work left) just as another thread posts another work item.
I don't think that would be a problem because the req count is only decreased in uv__queue_done, which always runs on the same thread as the event loop. Basically, it will only increment off the thread, so false positives should be impossible. At least I can't see how that would happen.
The scenario is this:
- Event loop thread checks loop->active_reqs.count > 0 but it's zero
- Another thread posts more work and now it's no longer zero
I.e., the event loop thread is never sure zero really means zero. That can maybe be addressed by introducing a sentinel value that tells other threads they can't post new work items anymore:
// pseudo code, C11 atomics
unsigned expected = 0;
unsigned sentinel = (unsigned) -1;
if (atomic_load(&loop->active_reqs.count) == 0)
  if (atomic_compare_exchange_strong(&loop->active_reqs.count, &expected, sentinel))
    return 0;  /* done - no new work can be posted */
return 1;  /* more work */
But now other threads can't use atomic_fetch_add() anymore, they have to load + compare-and-swap in a loop, so everything becomes more expensive.
@bnoordhuis Could we get around that issue if we document that uv_queue_work() calls can only be made off the event loop thread if they're called from inside the uv_work_cb? Then we guarantee that loop->active_reqs.count > 0. Since uv_work_t inherits from uv_req_t, it can't be removed from the count until the req is complete, right? It's a strong limitation, but it should fix the issue.
Personally, I'm not sure it's reasonable to allow uv_queue_work() to be called at any time on different threads. That would turn uv_work_t into a general purpose dispatch queue, but always force the callback to be run from the event loop thread. There are other ways to do this that wouldn't require a change to libuv.
That is probably the right limitation, in my opinion. I would say the main concern is whether we are okay with adding an atomic seq-cst fence to every operation that changes the active count (e.g. every write call). That will need to be benchmarked.
The counterargument is that it introduces a somewhat arbitrary and subtle restriction, something you can't deduce without consulting the documentation first. It arguably fails the Principle of Least Surprise.
@bnoordhuis That's a reasonable argument. That gave me an idea for my PR. Will work on that tomorrow.
I'm going to close this because of the merge conflicts but the work continues spiritually in #3938.
To my understanding, #3938 was meant to help https://github.com/nodejs/node/issues/41611 go forward.
@trevnorris, I'm sorry to ask here, but do you know if your C11 atomics follow-up work allowed for making it possible to adjust highWaterMark on child_process.spawn().stdout?
I haven't tested. Will add this to the list of things to check when I create the new PR.