
Profiling: Recursive and other Asynchronous Tasks

Open omor1 opened this issue 2 years ago • 5 comments

Description

PaRSEC recursive tasks are currently problematic to profile, especially in a distributed context. Recursive tasks are not modeled explicitly by the DSL; instead, they are set up by:

  1. The generating task creates a new taskpool that contains only local data.
  2. The generating task adds it to the current context and connects its own completion to that of the child taskpool via parsec_recursivecall.
  3. The generating task returns with PARSEC_HOOK_RETURN_ASYNC.
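
The three steps above can be mimicked with a small toy model. This is only a sketch in Python, not PaRSEC code: `HookReturn`, `ToyTaskpool`, `ToyContext`, and `recursive_body` are hypothetical stand-ins invented for illustration, and the real runtime is far more involved. The point it tries to show is that the generating task's body returns before any child task has run.

```python
from enum import Enum, auto


class HookReturn(Enum):
    """Toy stand-ins for PARSEC_HOOK_RETURN_DONE / PARSEC_HOOK_RETURN_ASYNC."""
    DONE = auto()
    ASYNC = auto()


class ToyTaskpool:
    """Hypothetical child taskpool: a list of callables plus a completion callback."""

    def __init__(self, tasks):
        self.tasks = list(tasks)
        self.on_complete = None

    def run(self):
        for task in self.tasks:
            task()
        if self.on_complete is not None:
            self.on_complete()


class ToyContext:
    """Hypothetical context that holds taskpools until the runtime gets to them."""

    def __init__(self):
        self.pending = []

    def add_taskpool(self, taskpool):
        self.pending.append(taskpool)

    def progress(self):
        while self.pending:
            self.pending.pop(0).run()


def recursive_body(context, log):
    # Step 1: create a child taskpool that works on local data only.
    child = ToyTaskpool([lambda: log.append("child task 0"),
                         lambda: log.append("child task 1")])
    # Step 2: tie the parent's completion to the child's, then enqueue the child
    # (the role the issue attributes to parsec_recursivecall).
    child.on_complete = lambda: log.append("parent completed")
    context.add_taskpool(child)
    # Step 3: tell the runtime that completion will be signalled later.
    log.append("parent body returned")
    return HookReturn.ASYNC


log = []
context = ToyContext()
status = recursive_body(context, log)
context.progress()
print(status, log)
```

In this model the only thing the "runtime" learns from the body is the `ASYNC` return value, which is the same thing any other asynchronous task would report.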

From the runtime's perspective, then, it is impossible to differentiate between a recursive task and any other asynchronous task. In terms of profiling, the time measured is that of the generating task's body, not that of the actual computation; this affects all asynchronous tasks. To determine the actual compute time, one needs to look at the execution time of the tasks in the child taskpool.

Connecting the child taskpool to its parent taskpool is nontrivial, however. Taskpool IDs (tpids) are not synced between nodes when generating recursive taskpools, so tpids are duplicated across nodes. For some computations, such as 3-flow HiCMA, it is possible to determine the relationship between the generating parent HiCMA_dpotrf_L_3flow::potrf_dpotrf task and the child dpotrf_L taskpool by virtue of knowing that all the dpotrf tasks are necessarily serial and generated in order, so that the order of tpids and the order of dpotrf tasks on each node is the same. For other computations, however, this order is not predictable a priori, so the matching becomes intractable.
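
As an illustration of the order-based matching described above, here is a minimal Python sketch. The trace layout (tuples of node, begin time, task name, and tpid) and the function `match_by_order` are assumptions made up for this example; they are not the format PaRSEC's profiling tools emit. It pairs the k-th generating task on each node with the k-th child tpid on that node, which is only meaningful when the generating tasks are serial and ordered.

```python
from collections import defaultdict


def match_by_order(parent_tasks, child_taskpools):
    """Pair the k-th generating task on a node with the k-th child tpid on that node.

    parent_tasks:    iterable of (node, begin_time, task_name)
    child_taskpools: iterable of (node, tpid)
    Only valid when generating tasks run serially and in a known order per node.
    """
    parents = defaultdict(list)
    children = defaultdict(list)
    for node, begin, name in parent_tasks:
        parents[node].append((begin, name))
    for node, tpid in child_taskpools:
        children[node].append(tpid)

    matches = {}
    for node, tasks in parents.items():
        tasks.sort()
        tpids = sorted(children[node])
        if len(tasks) != len(tpids):
            raise ValueError(
                f"node {node}: {len(tasks)} parents, {len(tpids)} child taskpools")
        for (_, name), tpid in zip(tasks, tpids):
            matches[(node, tpid)] = name
    return matches


# Made-up trace: tpid 2 appears on both nodes, so a tpid alone is ambiguous.
parent_tasks = [(0, 1.0, "potrf(0)"), (1, 2.0, "potrf(1)"), (0, 5.0, "potrf(2)")]
child_taskpools = [(0, 2), (0, 3), (1, 2)]
matches = match_by_order(parent_tasks, child_taskpools)
print(matches)
```

The key has to be the pair (node, tpid) because of the duplicate tpids. If generating tasks could overlap or be reordered on a node, the `zip` would silently pair the wrong entries, which is the failure mode this issue is about.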

Describe the solution you'd like

The best solution, in my opinion, is to record in the info of a generating recursive task the taskpool ID of the child taskpool—that allows one to both measure only the time of execution bodies and easily find all child tasks from the parent task. This may be annoying to implement, however, since recursive tasks appear identical to other asynchronous tasks. Suggestions of how to fix this are welcome. Integrating recursive tasks more concretely into the DSL rather than the current ad hoc implementation would be one way, but is much more invasive.
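
To make the benefit concrete, here is a hedged Python sketch of what post-processing could look like if the child tpid were available in the generating task's info. The field name `child_tpid`, the dictionary layout, and the function `child_compute_time` are hypothetical; no such field exists today, which is what this issue requests.

```python
def child_compute_time(parent_events, task_events):
    """Sum child-task execution time per generating task.

    parent_events: list of dicts with keys node, name, child_tpid
                   (child_tpid is the hypothetical info field proposed here)
    task_events:   list of dicts with keys node, tpid, begin, end
    """
    totals = {}
    for parent in parent_events:
        key = (parent["node"], parent["child_tpid"])
        totals[(parent["node"], parent["name"])] = sum(
            ev["end"] - ev["begin"]
            for ev in task_events
            if (ev["node"], ev["tpid"]) == key
        )
    return totals


# Made-up events: both nodes reuse tpid 2 for their child taskpool.
parent_events = [
    {"node": 0, "name": "potrf(0)", "child_tpid": 2},
    {"node": 1, "name": "potrf(1)", "child_tpid": 2},
]
task_events = [
    {"node": 0, "tpid": 2, "begin": 1.0, "end": 3.0},
    {"node": 0, "tpid": 2, "begin": 3.0, "end": 4.5},
    {"node": 1, "tpid": 2, "begin": 2.0, "end": 6.0},
]
totals = child_compute_time(parent_events, task_events)
print(totals)
```

With the link recorded explicitly, no assumption about task ordering is needed; the lookup works even when tpids collide across nodes, as long as it is keyed on both node and tpid.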

Describe alternatives you've considered

A workaround to the above might be to add a local that stores the tpid of the child taskpool—since it's a local, it will be recorded in the profiling info. This needs to be done for every recursive task though.

An alternative would be to move the triggering of the EXEC_END event into __parsec_complete_execution, rather than firing it after execution of the task body in __parsec_execute. This would have very little effect on the profiling of "regular" tasks, since __parsec_complete_execution is called right after __parsec_execute inside __parsec_task_progress when the task returns PARSEC_HOOK_RETURN_DONE. For asynchronous tasks, it would extend the interval during which a task is considered to be executing until the completion callback runs, since that callback should necessarily call __parsec_complete_execution to inform the runtime that the task is complete. This has a positive effect, in that it becomes trivial to determine the overall execution wall-time for recursive and asynchronous tasks, but also a negative one: the interval may include large chunks of time spent in the runtime or executing other tasks.
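
The effect of moving EXEC_END can be illustrated with a toy timeline. This is a sketch with invented timestamps, not measurements; `TaskTimeline` and the two duration functions are hypothetical names used only here.

```python
from dataclasses import dataclass


@dataclass
class TaskTimeline:
    """Hypothetical timestamps (microseconds) for one task."""
    exec_begin: int    # EXEC_BEGIN is recorded
    body_return: int   # task body returns (DONE or ASYNC)
    completion: int    # __parsec_complete_execution runs


def duration_current(t):
    """EXEC_END recorded right after the body returns (current behavior)."""
    return t.body_return - t.exec_begin


def duration_proposed(t):
    """EXEC_END recorded at completion instead (the alternative above)."""
    return t.completion - t.exec_begin


# Regular task: completion follows the body almost immediately.
regular = TaskTimeline(exec_begin=0, body_return=2000, completion=2001)
# Recursive task: the body only sets up the child taskpool and returns ASYNC.
recursive = TaskTimeline(exec_begin=0, body_return=500, completion=8000)

for name, t in [("regular", regular), ("recursive", recursive)]:
    print(name, duration_current(t), duration_proposed(t))
```

For the regular task the two placements differ by a single microsecond in this made-up data, while for the recursive task the measured interval grows from 500 to 8000. Nothing in the longer interval says how much of it was child computation and how much was runtime overhead or unrelated tasks, which is the drawback noted above.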

Another alternative is to keep the current EXEC_END event and add a new event that has the behavior I describe above. The upside here is that we don't lose any information we currently gather; however, this doubles the number of events per task, which may be problematic.

Additional context

This capability would be quite useful for Daniel Mishler's visualization work, as well as for other uses of PaRSEC profiling information.

omor1 · Jul 08 '22 18:07