Support queuing work from a worker thread (feature request)
This is a very significant evolution that can be accomplished with relatively minor modifications to the code.
uv_queue_work() should be thread-safe and support queuing work from any thread.
Queuing work from a worker thread would allow some very significant optimizations when chaining multiple independent operations that must run sequentially. Currently, scheduling the next job requires synchronization with the main thread / event loop, which in some cases is completely unnecessary and leads to very significant performance degradation.
Node.js readFile is a perfect example of this problem: https://github.com/nodejs/node/issues/41435
Pull request welcome, although I expect you'll find it can't be retrofitted without introducing performance regressions elsewhere.
uv_queue_work() adds the uv_work_t request to the internal "pending requests" list, which is a list of all in-flight requests, not just uv_work_t instances. You would have to guard every access with a mutex, which is going to regress the common case.
You can sidestep that by not adding uv_work_t requests to the list but, since said list is used to determine whether the event loop is "alive" (should keep running), you'll need to devise an additional mechanism to keep track of uv_work_t liveness - probably atomic counters, which also add overhead.
edit: and keep in mind you need to keep ABI and API compatibility, otherwise it can't land in the v1.x branch.
I did some testing and found that:
- the performance of libuv in the new benchmark
- the performance of libuv in the new `uv_queue_work` benchmark matches very closely (less than 10% slower) the performance of `perf bench sched pipe`, which is essentially the same test as written by the Linux kernel developers - a process waking up another process through a pipe - without the additional overhead of libuv, meaning that it is pointless to try to optimize this further
- `eventfd` is only very marginally faster than a pipe on a modern Linux kernel - I measure 17µs for a pipe vs 15µs for `eventfd` on my computer. This is the time that elapses between initiating the `write` in the sender and returning from the `epoll` in the receiver; `uv_queue_work` is limited to 1/x iterations per second, where x is this time, once one also factors in the time needed for the mutex + condition variable, which is quite a bit faster
- EDIT: locking an uncontested `mutex` is handled by glibc without a syscall, but the cost is only marginally lower than the syscall version (about 75ns); the time that elapses between one thread releasing a mutex and another thread waiting on that same mutex waking up to replace it on the same core is about 2µs
This issue has been automatically marked as stale because it has not had recent activity. Thank you for your contributions.
@bnoordhuis @vtjnash As you can see from #3429, this can be retrofitted with no performance loss at all - the only change required is that the tracking of the number of pending I/O requests must be atomic
However, if this change is to be implemented, then it also has to be supported - meaning that queuing from a worker thread should not come with any hidden pitfalls - which, AFAIK, is indeed the case
Also, my PR reshuffles some of the header files in a significant way, do you have any opinion on this?
@mmomtchev, I'm sorry to ask here, but do you know if the recent addition of C11 atomics (https://github.com/libuv/libuv/pull/3950) allowed for this, and subsequently https://github.com/nodejs/node/issues/41611 to go forward?
@realies Not really - the groundwork is there, as there are now portable atomics in libuv, but the counting of the open file handles does not use them yet. The feature is only halfway there.
FYI I started this work in https://github.com/libuv/libuv/pull/3938, but decided to split it up into smaller PRs. Plan on coming back to this soon.