Threadsafe uv_queue_work
Here is my proposal.
mmom@mmom-workstation:~/src/libuv$ build/uv_run_benchmarks_a queue_work
ok 1 - queue_work
# 226,318 ping-pong async jobs in 5.0 seconds (45,263/s)
# 2,455,663 chained async jobs in 5.0 seconds (491,132/s)
What it changes:
- active_reqs is incremented atomically - this is everything that is missing
- uv__work_done still runs on the event loop - but now it processes bursts of work units
What needs discussing:
- The atomic incrementing must be implemented in a more portable way
- Race conditions:
This PR introduces a very significant change - now active_reqs can change while an event loop routine is running. Normally, incrementing and decrementing should be fine, but checking whether there are no more active requests could be problematic. The only place this happens is when deciding whether to exit the event loop. Normally, from what I understand, there should be no valid case where a running task increments the count after the decision to quit has been taken?
Very surprisingly, it does appear that this is sufficient (access to the workq is already guarded by a mutex). Though defining the meaning of uv__has_active_reqs / uv_loop_active now comes with additional challenges.
Most of the code is already protected by mutexes. I think the real reason this was not done before is that it might have appeared to be a useless feature - everything done on behalf of user code is supposed to come from the main thread.
The only use of this feature is incremental scheduling of work units, so that the work flow remains interruptible and different tasks can be round-robined - given that the number of libuv threads is limited.
Otherwise, the main process could just as well schedule all of its work at once in a single uv_queue_work call.
As uv__has_active_reqs / uv_loop_active are not accessible by the user, the impact should be limited to libuv itself.
Normally, from what I saw, their only function is to determine when the event loop should exit.
This issue has been automatically marked as stale because it has not had recent activity. Thank you for your contributions.
@bnoordhuis If the number of requests in the loop reaches zero while a worker task is still running (which might add a new one on completion), this means that somewhere a higher-level routine launched an async task and is not waiting for its completion? In that case the race condition is not preventable anyway.
@mmomtchev I don't know if you saw my edit but I think we're talking about the same thing. Punting it to the user is an option (albeit a so-so solution) if it's well-documented.
Well-documented: not just "this thing can happen" but also how to avoid that thing from happening.
@bnoordhuis, I moved INLINE/UNUSED to atomic-ops.h which is private to avoid creating a new platform-dependent header file
I am still unsure what I should add to the documentation, since the number of running requests cannot decrease in a new way - it can only increase in a new way
This new way of calling uv_queue_work will be used in situations where, before, one had to wait for the after_work callback to reschedule the next unit - now these units will be rescheduled earlier.
This means there can't be a new way to block the event loop's exit, or to cause an early exit? None that I see?
Just mentioning that I have a use for this in my project and I'm blocked on it before I can fully integrate and make use of libuv. I've tested the PR in the way I plan to use it and it works the way I'd expect it to. It really allows for much greater flexibility over how and when to dispatch work, and it removes the need for a ton of plumbing between the actual work callback and the work completion callback (where it's currently only possible to call uv_queue_work).
@cyanreg, this PR is mostly about making the decision - the code is trivial
@bnoordhuis, can this be unblocked? It seems like it will reduce the performance impact of having smaller buffers for blocking I/O.
There's a merge conflict (which can be fixed by switching to C11 atomics) and I don't think the documentation is sufficient yet.
@mmomtchev Mind if I pick this up and do something similar using C11 atomics?
@bnoordhuis
I see potential for false positives where libuv thinks the event loop is done (no more work left) just as another thread posts another work item.
I don't think that would be a problem because the req count is only decreased in uv__queue_done, which always runs on the same thread as the event loop. Basically, it will only increment off the thread, so false positives should be impossible. At least I can't see how that would happen.
The scenario is this:
- Event loop thread checks loop->active_reqs.count > 0 but it's zero
- Another thread posts more work and now it's no longer zero
I.e., the event loop thread is never sure zero really means zero. That can maybe be addressed by introducing a sentinel value that tells other threads they can't post new work items anymore:
// pseudo code, C11 atomics
unsigned expected = 0;
unsigned sentinel = (unsigned) -1;
if (atomic_load(&loop->active_reqs.count) == 0)
  if (atomic_compare_exchange_strong(&loop->active_reqs.count, &expected, sentinel))
    return 0;  /* done - no new work can be posted */
return 1;  /* more work */
But now other threads can't use atomic_fetch_add() anymore, they have to load + compare-and-swap in a loop, so everything becomes more expensive.
@bnoordhuis Could we get around that issue if we document that uv_queue_work() calls can only be made off the event loop thread if they're called from inside the uv_work_cb? Then we guarantee that loop->active_reqs.count > 0. Since uv_work_t inherits from uv_req_t, it can't be removed from the count until the req is complete, right? It's a strong limitation, but it should fix the issue.
Personally, I'm not sure it's reasonable to allow uv_queue_work() to be called at any time on different threads. That would turn uv_work_t into a general purpose dispatch queue, but always force the callback to be run from the event loop thread. There are other ways to do this that wouldn't require a change to libuv.
That is probably the right limitation, in my opinion. I would say the main concern is whether we are okay with adding an atomic seq-cst fence to every operation that changes the active count (e.g. every write call). That will need to be benchmarked.
The counterargument is that it introduces a somewhat arbitrary and subtle restriction, something you can't deduce without consulting the documentation first. It arguably fails the Principle of Least Surprise.
@bnoordhuis That's a reasonable argument. That gave me an idea for my PR. Will work on that tomorrow.
I'm going to close this because of the merge conflicts but the work continues spiritually in #3938.
To my understanding, #3938 was meant to help https://github.com/nodejs/node/issues/41611 go forward.
@trevnorris, I'm sorry to ask here, but do you know if your C11 atomics follow-up work allowed for making it possible to adjust highWaterMark on child_process.spawn().stdout?
I haven't tested. Will add this to the list of things to check when I create the new PR.