TileDB For Comment Only: Lums/sc 25954/random scheduler

The random scheduler is intended to demonstrate the robustness of the task graph I/O - compute infrastructure, in preparation for full state management in the task graph.

This is probably the simplest possible finite thread pool scheduler, namely one that simply picks tasks at random and executes them. The nodes in the tasks are still stateful in that they have a "program counter" indicating the next step to be taken in their I/O - compute loop. However, the scheduler does not need to be aware of the node state. It simply picks a task and executes it (by invoking the resume method).

The tasks are kept in a thread-safe RandomizedQueue, which is similar to the BoundedBufferQueue except the items in the queue are kept in an underlying vector, which is shuffled on every pop() from the queue.

The random scheduler maintains all tasks in the single randomized queue. Resuming a task from any state is safe. In particular, if resume is invoked for a node that is waiting, it will simply continue to wait.

Preliminary performance results (from the sieve benchmark) show that the random scheduler is as effective as the full Duff's device scheduler (which puts tasks into running, waiting, or runnable queues) -- as long as the number of threads per core does not become too large. Otherwise, the overhead of locking the single job queue dominates execution time. There are likely a number of optimizations that can be made to reduce these overheads.

TYPE: FEATURE DESC: A scheduler that simply picks tasks at random to run.

Mar 19 '23 07:03 lums658

This pull request has been linked to Shortcut Story #25954: Implement random scheduler.

Mar 19 '23 07:03 shortcut-integration[bot]

On Apr 10, 2023, at 7:47 AM, Robert Bindar @.***> wrote:

@robertbindar approved this pull request.

This was a nice read, thanks a lot. It's not exactly clear to me how the random scheduler is supposed to prove the robustness of the system, I'd love if you could explain this a bit in more details. How is graph supposed to work overall if its nodes are executed in random order? Do nodes yield execution if they have nothing on the input ports and no matter in which order you execute the tasks wrapping the nodes, the nodes will eventually end up running according to how data flows within the graph?

That is exactly right. And the point is to show that the performance of doing it that way is as good as doing all the things of putting tasks to sleep and putting them in waiting queues, waking them up with notification signals etc. With random a node will execute if it has input and won’t execute if it doesn’t.

By robustness I mean that the nodes and edges should behave properly and flow data through the graph even if they are selected to run at random rather than in topological sorted order, for example. I expect that after a graph (that is initially topologically sorted) is executed a few times the order of execution of nodes will become scrambled anyway.

There is a bit of contention and other overhead (due to too much shuffling) at the moment on the random task queue but I expect we can reduce those without too much effort (use double-ended queue, be smarter about when to shuffle, &c.). Right now we re-shuffle on every pop.

Cheers, Andrew Lumsdaine

Apr 10 '23 17:04 lums658

Closing for now. We have tracking stories that link to this PR for when we resume that work.

Jul 02 '24 08:07 KiterLuc

TileDB TileDB copied to clipboard

For Comment Only: Lums/sc 25954/random scheduler

TileDB
TileDB copied to clipboard