Bulk task operations
While going through open issues (#4227, and this comment specifically: https://github.com/STEllAR-GROUP/hpx/issues/4227#issuecomment-555616764) I was reminded that the moodycamel queue supports batched operations. I think we should definitely start using them. We should also be able to emulate batched operations on the Boost.Lockfree queues, which might improve performance there too. Popping tasks is additionally protected by a lock, so any improvements we make here could have a significant effect on the minimum task times we can achieve.
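As a rough illustration of the idea (this is a hypothetical sketch, not HPX's actual scheduler code or the moodycamel API), emulating a bulk pop on a lock-protected queue amortizes one lock acquisition over many tasks, similar in spirit to moodycamel's `try_dequeue_bulk`:

```cpp
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

// Hypothetical task type and queue; HPX's real structures differ.
using task = int;

struct locked_queue {
    std::mutex mtx;
    std::deque<task> tasks;

    // One lock acquisition per task: the pattern batching avoids.
    bool pop_one(task& out) {
        std::lock_guard<std::mutex> lk(mtx);
        if (tasks.empty()) return false;
        out = tasks.front();
        tasks.pop_front();
        return true;
    }

    // Emulated bulk pop: one lock acquisition for up to max_count tasks.
    std::size_t pop_bulk(std::vector<task>& out, std::size_t max_count) {
        std::lock_guard<std::mutex> lk(mtx);
        std::size_t n = 0;
        while (n < max_count && !tasks.empty()) {
            out.push_back(tasks.front());
            tasks.pop_front();
            ++n;
        }
        return n;
    }
};
```

The win is that a worker grabbing, say, 32 tasks pays for one mutex round trip instead of 32, which matters most when tasks are very short.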
I have been thinking about this recently, but have not worked on it. In the shared-priority scheduler, I reduced the size of the chunks of tasks that are stolen and removed some of the checks/settings for options like min_tasks_to_steal (and others with similar names whose details I have forgotten). There are definitely opportunities to make use of the bulk operations and the producer-oriented API of the moodycamel queue, and they should be looked at. As GSoC develops, I hope we will have more containers to experiment with; that, combined with increased interest in the lockfree operations, will give me an incentive to look into this further.
You can use the work-stealing deque in Eigen, written by Dmitry Vyukov, which supports steal-half:
https://gitlab.com/libeigen/eigen/-/blob/cf7adf3a5d24355547548126599aeeb4b9ce5099/unsupported/Eigen/CXX11/src/ThreadPool/RunQueue.h#L115-146
It is used in Tensorflow as far as I'm aware.
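For illustration, a minimal sketch of the steal-half idea (this is not Eigen's RunQueue, which is lock-free on the fast path; the names and the mutex-based locking here are assumptions made for clarity): the thief takes half of the victim's pending tasks in a single operation instead of stealing one task at a time.

```cpp
#include <cstddef>
#include <deque>
#include <mutex>

// Hypothetical task and per-worker queue types.
using task = int;

struct worker_queue {
    std::mutex mtx;
    std::deque<task> tasks;
};

// Steal-half sketch: move the back half of the victim's queue to the
// thief in one operation, amortizing synchronization cost across tasks.
std::size_t steal_half(worker_queue& victim, worker_queue& thief) {
    // Lock both queues together (deadlock-free via std::scoped_lock).
    std::scoped_lock lk(victim.mtx, thief.mtx);
    std::size_t n = victim.tasks.size() / 2;
    for (std::size_t i = 0; i < n; ++i) {
        thief.tasks.push_back(victim.tasks.back());
        victim.tasks.pop_back();
    }
    return n;
}
```

Stealing half rather than one task keeps thieves from repeatedly contending on the same victim when queues are deep, which is the property the Eigen/Vyukov deque provides without taking a lock at all.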
@mratsim interesting... thanks.