Support queuing work from a worker thread (feature request)
This is a very significant evolution that can be accomplished with relatively minor modifications to the code.
uv_queue_work() should be thread-safe and support queuing work from any thread.
Queuing work from a worker thread would allow some very significant optimizations when chaining multiple independent operations that must run sequentially. Currently, scheduling the next job requires synchronization with the main thread / event loop, which in some cases is completely unnecessary and leads to very significant performance degradation.
Node.js readFile is a perfect example of this problem: https://github.com/nodejs/node/issues/41435
Pull request welcome, although I expect you'll find it can't be retrofitted without introducing performance regressions elsewhere.
uv_queue_work() adds the uv_work_t request to the internal "pending requests" list, which is a list of all in-flight requests, not just uv_work_t instances. You would have to guard every access with a mutex, which is going to regress the common case.
You can sidestep that by not adding uv_work_t requests to the list but, since said list is used to determine whether the event loop is "alive" (should keep running), you'll need to devise an additional mechanism to keep track of uv_work_t liveness - probably atomic counters, which also add overhead.
edit: and keep in mind you need to keep ABI and API compatibility, otherwise it can't land in the v1.x branch.
I did some testing and found that:
- the performance of libuv in the new benchmark
- the performance of libuv in the new `uv_queue_work` benchmark matches very closely (less than 10% slower) the performance of `perf bench sched pipe`, which is essentially the same test as written by the Linux kernel developers - a process waking up another process through a pipe - without the additional overhead of libuv, meaning that it is pointless to try to optimize this further
- `eventfd` is only very marginally faster than a pipe on a modern Linux kernel - I measure 17µs for a pipe vs 15µs for `eventfd` on my computer. This is the time that elapses between initiating the `write` in the sender and returning from the `epoll` in the receiver; `uv_queue_work` is limited to 1/x iterations per second, where x is this time, once one also factors in the time needed for the mutex + condition variable, which is quite a bit faster
- EDIT: locking an uncontested `mutex` is handled by glibc without a syscall, but the cost is only marginally lower than the syscall version (about 75ns); the time that elapses between one thread releasing a mutex and another thread waiting on that same mutex waking up to replace it on the same core is about 2µs
This issue has been automatically marked as stale because it has not had recent activity. Thank you for your contributions.
@bnoordhuis @vtjnash As you can see from #3429, this can be retrofitted with no performance loss at all - the only change required is that the tracking of the number of pending I/O requests must be atomic
However, if this change is to be implemented, then it also has to be supported - meaning that queuing from a worker thread should not come with any hidden pitfalls - which, AFAIK, is indeed the case
Also, my PR reshuffles some of the header files in a significant way, do you have any opinion on this?
@mmomtchev, I'm sorry to ask here, but do you know if the recent addition of C11 atomics (https://github.com/libuv/libuv/pull/3950) allowed for this, and subsequently https://github.com/nodejs/node/issues/41611 to go forward?
@realies Not really - the groundwork is there, as there are now portable atomics in libuv, but the counting of the open file handles does not use them yet. The feature is only halfway there.
FYI I started this work in https://github.com/libuv/libuv/pull/3938, but decided to split it up into smaller PRs. Plan on coming back to this soon.