TooManyCooks

Private work

Open tzcnt opened this issue 8 months ago • 0 comments

"private work" is defined as work that cannot be stolen by another thread. Once a task is marked private, even if it suspends and resumes, it should always resume on the thread that it originated from. There are multiple use cases for this:

I/O-bound systems: In a heavily loaded I/O-bound system it can be more efficient to use a share-nothing model, where a task can only resume on the thread it was suspended from. This can currently be emulated using a pre-fork worker model (SO_REUSEPORT) and spawning N processes. But it would be nice if that behavior could be encapsulated in a single process while still offering the possibility to share access to other data. Other libraries that behave like this include glommio and PhotonLibOS.

This would allow the user to create an I/O executor that contains 64 threads, have a socket accept function that runs on each thread (delegated specifically to that thread using an MPSC inbox?), mark that function as private, and have it automatically spawn all child operations on the same thread.

A tangent on ex_asio: There is a separate issue that ex_asio (currently the only I/O executor offered) is built on asio::io_context, which scales poorly to 64 threads internally and would perform better as 64 independent executors. That is a problem with a specific implementation, and it shouldn't block the overall design of this feature. There are potential solutions such as redesigning `tmc::ex_asio` with multiple threads as a wrapper over multiple single-threaded io_contexts, or implementing a custom executor which conforms to the `asio::executor concept`.

The "wrapper over multiple single-threaded io_contexts" wouldn't have work-stealing by default and would instead default to private work. Similarly, a custom executor for asio could be implemented with or without work-stealing, perhaps as a template parameter.

Performance improvements: Even in a fork-join context it may be desirable to run certain tasks on the same thread; even if they are logically parallel, ensuring synchronous execution may reduce overhead. It should be easy for the user to experiment with this using a simple API which can be toggled on a root task - no other code refactors required.

Interfacing with external libraries: Some libraries such as OpenGL require all calls to come from the same thread. Other libraries such as Vulkan may offer a subset of functions that are thread-safe, and others that are not. Currently a few options (ex_braid or a separate single-threaded ex_cpu) are available to work with these libraries; however in certain scenarios it may be more convenient to simply acquire the necessary resource, provide that resource to a task, and then mark that task as private. This reduces the overhead of context switching / marshaling data between threads.

Equivalents: .NET - ConfigureAwait(true)


TMC currently supports semi-private work queues (inboxes) which are shared between all threads that are part of a core group (currently defined as those threads that share an L3 cache). The current implementation of semi-private work queues does not pass its privacy on to child tasks, nor does it even guarantee that a single task will remain in the thread group after the next time it suspends. Thus, it is currently only useful as a hint to the runtime to group certain tasks together for performance reasons, when tasks have known cross-task shared memory dependencies.

An additional level of queue would need to be implemented which is fully private, one per thread. This queue could be implemented more efficiently if it can only be accessed by code already running on the owning thread (SPSC), rather than also accepting work sent from other threads (MPSC).
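To make the owner-only variant concrete, here is a minimal sketch of a queue that is touched exclusively by the thread that created it, so it needs no atomics or locks. `PrivateQueue`, `push`, and `run_one` are hypothetical names for illustration and are not part of TMC. The FIFO order is an arbitrary choice for the sketch.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <thread>
#include <utility>

// Hypothetical owner-only work queue (not a TMC type).
// Because only the owning thread ever pushes or pops, no synchronization is used.
class PrivateQueue {
  std::deque<std::function<void()>> items_;
  // Recorded at construction: the thread that is allowed to use this queue.
  std::thread::id owner_ = std::this_thread::get_id();

public:
  // Enqueue work. Only valid when called from the owning thread.
  void push(std::function<void()> work) {
    assert(std::this_thread::get_id() == owner_);
    items_.push_back(std::move(work));
  }

  // Run the oldest queued item, if any. Returns false when the queue is empty.
  bool run_one() {
    assert(std::this_thread::get_id() == owner_);
    if (items_.empty()) {
      return false;
    }
    std::function<void()> work = std::move(items_.front());
    items_.pop_front();
    work();
    return true;
  }
};
```

Supporting sends from other threads would mean replacing the plain `std::deque` with an MPSC structure, which is the cost the owner-only design avoids.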

The existing semi-private work queues can be accessed by setting the ThreadHint of the post() family of functions. If the MPSC implementation is chosen, this could be expanded by combining the ThreadHint (which uses only some of the low bits) with a flag that specifies the privacy level - with increasing privacy level being more constrained. e.g. (PRIVACY_FLAG_SELF | ThreadHint) would allow a thread to create a private task in another thread's queue.
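As a rough illustration of combining a privacy flag with the hint, the sketch below packs both into one 64-bit value. The bit layout (low 16 bits for the thread index, top two bits for privacy), `PRIVACY_FLAG_GROUP`, and the helper names are assumptions made for this example. Only `PRIVACY_FLAG_SELF` and `ThreadHint` appear in the proposal, and TMC's actual encoding may differ.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed layout for illustration only:
// low 16 bits = target thread index, top two bits = privacy level.
constexpr std::uint64_t THREAD_INDEX_MASK = 0xFFFF;
constexpr std::uint64_t PRIVACY_FLAG_GROUP = 1ULL << 62; // semi-private (hypothetical)
constexpr std::uint64_t PRIVACY_FLAG_SELF = 1ULL << 63;  // fully private

enum class Privacy { Public, SemiPrivate, Private };

// Combine a privacy flag with a thread index, e.g. (PRIVACY_FLAG_SELF | ThreadHint).
constexpr std::uint64_t make_hint(std::uint64_t flag, std::size_t thread_index) {
  return flag | (static_cast<std::uint64_t>(thread_index) & THREAD_INDEX_MASK);
}

// Recover the thread index from the low bits.
constexpr std::size_t thread_index_of(std::uint64_t hint) {
  return static_cast<std::size_t>(hint & THREAD_INDEX_MASK);
}

// Recover the privacy level; the more constrained flag wins if both are set.
constexpr Privacy privacy_of(std::uint64_t hint) {
  return (hint & PRIVACY_FLAG_SELF)    ? Privacy::Private
         : (hint & PRIVACY_FLAG_GROUP) ? Privacy::SemiPrivate
                                       : Privacy::Public;
}
```

With this layout, `make_hint(PRIVACY_FLAG_SELF, 5)` would describe a private task destined for thread 5's queue, while a plain index with no flag keeps today's public behavior.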

For awaitables, it would be nice to be able to simply specify that the awaitable must run on the same thread via fluent function, such as go_private() / go_public() or perhaps set_privacy_level() which could support private, semi-private, or back to public.
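One possible shape for such a fluent interface is sketched below. `SpawnBuilder` is a hypothetical stand-in for an awaitable wrapper and is not a real TMC type. The sketch only shows how `go_private()`, `go_public()`, and `set_privacy_level()` could chain on both named objects and temporaries.

```cpp
#include <utility>

enum class Privacy { Public, SemiPrivate, Private };

// Hypothetical wrapper around some unit of work (not a TMC type).
template <typename Work> class SpawnBuilder {
  Work work_;
  Privacy privacy_ = Privacy::Public;

public:
  explicit SpawnBuilder(Work work) : work_(std::move(work)) {}

  // General setter; the lvalue and rvalue overloads keep chaining cheap.
  SpawnBuilder& set_privacy_level(Privacy p) & {
    privacy_ = p;
    return *this;
  }
  SpawnBuilder&& set_privacy_level(Privacy p) && {
    privacy_ = p;
    return std::move(*this);
  }

  // Convenience shorthands for the two extremes.
  SpawnBuilder& go_private() & { return set_privacy_level(Privacy::Private); }
  SpawnBuilder&& go_private() && {
    return std::move(*this).set_privacy_level(Privacy::Private);
  }
  SpawnBuilder& go_public() & { return set_privacy_level(Privacy::Public); }
  SpawnBuilder&& go_public() && {
    return std::move(*this).set_privacy_level(Privacy::Public);
  }

  Privacy privacy() const { return privacy_; }
};
```

A caller could then write something like `SpawnBuilder<int>(42).go_private()` and later switch back with `go_public()`, without touching the wrapped work itself.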

Once the privacy level has been set, it needs to be inherited by all future tasks (and awaitables?). Trusted awaitables could read this value from a thread_local variable, but unknown awaitables would need it to be implemented using await_transform.
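The thread_local approach for trusted awaitables could look roughly like the sketch below. All names here (`current_privacy`, `TaskState`, `spawn_child`, `PrivacyScope`) are hypothetical. The idea is that the executor installs a task's privacy level before resuming it, and anything created while that task runs reads the level from the thread-local variable.

```cpp
enum class Privacy { Public, SemiPrivate, Private };

// Hypothetical per-thread state: the privacy level of the task currently running.
thread_local Privacy current_privacy = Privacy::Public;

// Minimal stand-in for whatever bookkeeping a task carries.
struct TaskState {
  Privacy privacy;
};

// A child inherits the privacy level that is active on the creating thread.
TaskState spawn_child() { return TaskState{current_privacy}; }

// RAII guard an executor could use around a resume: install the task's level,
// then restore the previous level when the task suspends or finishes.
class PrivacyScope {
  Privacy saved_;

public:
  explicit PrivacyScope(Privacy p) : saved_(current_privacy) {
    current_privacy = p;
  }
  ~PrivacyScope() { current_privacy = saved_; }
  PrivacyScope(const PrivacyScope&) = delete;
  PrivacyScope& operator=(const PrivacyScope&) = delete;
};
```

Unknown awaitables cannot be expected to read this variable, which is why the proposal falls back to await_transform for them.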

tzcnt, Apr 13 '25 21:04