A few notes on work_stealing
Hi all,
I wanted to implement a thread pool for executing fibers using the work_stealing algorithm from the library. While doing this I ran into a couple of problems. Maybe these are intended limitations of the library, but I had not anticipated them from just reading the documentation. So I thought I would post here what I noticed while using Boost.Fiber.
- You cannot use multiple instances of the work_stealing scheduler concurrently in an application: work_stealing maintains the list and the number of its instances in two static class members: https://github.com/boostorg/fiber/blob/1bc393b3ba8357787d045f6a13a5f7c0de1769d2/include/boost/fiber/algo/work_stealing.hpp#L39-L40 Therefore, we cannot have two disjoint groups of threads using work_stealing to distribute the fibers among themselves.
- You cannot use multiple instances of the work_stealing scheduler sequentially in an application: The static member counter_ is initialized with 0 (https://github.com/boostorg/fiber/blob/1bc393b3ba8357787d045f6a13a5f7c0de1769d2/src/algo/work_stealing.cpp#L26) and incremented as soon as a thread arrives (https://github.com/boostorg/fiber/blob/1bc393b3ba8357787d045f6a13a5f7c0de1769d2/src/algo/work_stealing.cpp#L36-L37). The value (saved in id_) is then used as the vector index of the current thread. However, there is no way to reset counter_ to 0. Hence, we cannot use work_stealing a second time.
- You need at least two threads to execute the work_stealing algorithm: Otherwise, the single thread may try to steal from a random thread that is different from itself and can never succeed. It may not make much sense to run the algorithm with a single thread, but this way you cannot simply change one parameter in order to get a single worker thread performing the tasks.
I have worked around the first two points by adapting the code of work_stealing as pooled_work_stealing and moving the static variables into a pool_ctx class (see lenerd/fiber_tp).
- You need to keep the "main" fibers running until all fibers managed by the scheduler are executed: The documentation says this under "Keep workers running":
"In order to keep the worker thread running, the fiber associated with the thread stack (which is called “main” fiber) is blocked. For instance the “main” fiber might wait on a condition_variable. For a graceful shutdown the condition_variable is signalled and the “main” fiber returns. The scheduler gets destructed if all fibers of the worker thread have been terminated."
Initially I understood this paragraph (and especially the last sentence) to mean that I need to prevent the "pool" from becoming empty (similar to using io_context::work in Boost.Asio). However, if I understand the behavior correctly, the "main" fiber associated with the thread stack needs to stay alive until all other fibers have completed.
This leads to the next point:
- Assume we have such a pool of worker threads and we have created all the fibers that we want to run in this pool. Moreover, let there be a fiber that is currently blocked, e.g., waiting for a future whose value is set by some thread outside of the pool. Before I may close the pool and let the "main" fibers terminate, I have to be sure that this fiber has also completed.
Is there a way to find out whether there are still (possibly blocked) fibers waiting to be completed?
That's it so far. Thanks for the nice library.
While benchmarking this library and running into a lot of segmentation violations, I think I just hit this static member problem...
There is also a similar problem in the NUMA version (https://github.com/boostorg/fiber/blob/develop/include/boost/fiber/numa/algo/work_stealing.hpp#L41) and the shared-work scheduler (https://github.com/boostorg/fiber/blob/develop/include/boost/fiber/algo/shared_work.hpp#L40).
Ran into this today:
"You need to have at least two threads to execute the work_stealing algorithm"
I tried to create work_stealing with one thread, but it hung in this_fiber::sleep_for(). This limitation should be documented.