Add nowait and required barriers to satisfy dependencies.

Open LonelyCat124 opened this issue 10 months ago • 2 comments

Things to be careful of: control flow jumps, e.g. loops with break clauses. Barriers are also more complex inside if statements.

When adding a nowait, look forward for a dependency and add an appropriate barrier if the dependency cannot otherwise be satisfied. Some requirements are similar to #2499 in terms of dependency finding, and some behaviours are similar to OMPTaskwaitTrans.

LonelyCat124, Apr 18 '24 11:04

N.B. #2039 and #2040 were known issues with the previous implementation. We need to be smarter with control flow for this implementation (the easy option is to give up if we aren't sure about the control flow).

LonelyCat124, Apr 30 '24 10:04

Summary of today's discussion (and the OpenACC version tracked in #1664): to make it resilient to further IR mutations, the nowait could be a flag attribute expressing the intent to make the directive asynchronous. The lowering should then introduce the necessary waits/barriers to avoid breaking any dependencies.

For OMP this means that

omp target loop nowait
   ...
   A = 1
omp target loop nowait
   ...
   B = 1
...
A

the directive lowering will use the dataflow graph connections to look forward and put a wait before "A". Note that this needs to be done conservatively, e.g.:

  • Wait before a CodeBlock
  • Wait at the end of the subroutine
  • Wait before a control flow jump (returns, breaks, calls - we can later refine this e.g. see if the call is pure, the async only operates on local symbols, ...)
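A rough sketch of the conservative forward-looking rule described above, in Python on a toy linear IR. The `Stmt` class and `lower_with_waits` function are hypothetical stand-ins, not PSyclone classes, and the sketch assumes straight-line code where each statement lists the symbols it reads and writes. CodeBlocks, calls, returns and breaks are all modelled by a single `opaque` flag.

```python
from dataclasses import dataclass


@dataclass
class Stmt:
    """Toy stand-in for an IR node (hypothetical, not a PSyclone class)."""
    name: str
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()
    nowait: bool = False   # intent flag: "make this region asynchronous"
    opaque: bool = False   # CodeBlock, call, return, break, ...


def lower_with_waits(stmts):
    """Return the statement names in order, with "wait" inserted where needed."""
    out = []
    in_flight = False                              # any async region not yet waited on?
    pending_reads, pending_writes = set(), set()   # symbols used by in-flight regions
    for stmt in stmts:
        touched = stmt.reads | stmt.writes
        # Conflict: read-after-write, write-after-write or write-after-read.
        conflict = (touched & pending_writes) or (stmt.writes & pending_reads)
        if in_flight and (stmt.opaque or conflict):
            out.append("wait")
            in_flight = False
            pending_reads.clear()
            pending_writes.clear()
        out.append(stmt.name)
        if stmt.nowait:
            in_flight = True
            pending_reads |= stmt.reads
            pending_writes |= stmt.writes
    if in_flight:
        out.append("wait")   # conservative wait at the end of the subroutine
    return out


example = [
    Stmt("loop_A", writes=frozenset({"A"}), nowait=True),
    Stmt("loop_B", writes=frozenset({"B"}), nowait=True),
    Stmt("unrelated", reads=frozenset({"C"})),
    Stmt("use_A", reads=frozenset({"A"})),
]
print(lower_with_waits(example))
```

Because the wait is only materialised at lowering time, later IR mutations that move or remove "use_A" would simply change where the wait ends up on the next lowering.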

The same logic can be applied for OpenACC to add waits/barriers, but in addition we need to look backwards to the previous acc directive to choose the right queue number to satisfy dependencies. For example:

acc parallel async(1)
   ...
   A = 1
acc parallel async(2)
   ...
   B = 1
wait
A

Here the second "acc" has to look backwards to the previous "acc" and use queue 1 if there is a dependency on that directive body, or a new queue number if there isn't one or if it finds a wait (which restarts all queues).
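The backward look for queue numbers could be sketched roughly as below. `Region`, `depends_on` and `assign_queues` are hypothetical names for illustration only. The sketch generalises "the previous acc" to every async region issued since the last wait, assumes that a wait lets the numbering restart from 1, and takes the easy option of giving up when the dependencies span more than one queue.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Toy async region or wait (hypothetical, not a PSyclone class)."""
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()
    is_wait: bool = False


def depends_on(later, earlier):
    """True if `later` must run after `earlier` (RAW, WAR or WAW on a symbol)."""
    return bool(
        (later.reads & earlier.writes)
        or (later.writes & earlier.reads)
        or (later.writes & earlier.writes)
    )


def assign_queues(regions):
    """Return one queue number per region, in order (None for a wait)."""
    queues = []
    live = []          # (region, queue) pairs issued since the last wait
    next_queue = 1
    for region in regions:
        if region.is_wait:
            queues.append(None)
            live.clear()       # everything before the wait has completed
            next_queue = 1     # assumption: numbering restarts after a wait
            continue
        needed = {q for earlier, q in live if depends_on(region, earlier)}
        if len(needed) > 1:
            # One queue cannot satisfy both dependencies; give up here.
            raise ValueError("dependencies span several queues")
        if needed:
            queue = needed.pop()   # same queue keeps the two regions ordered
        else:
            queue = next_queue     # independent work gets a fresh queue
            next_queue += 1
        live.append((region, queue))
        queues.append(queue)
    return queues


independent = [
    Region(writes=frozenset({"A"})),
    Region(writes=frozenset({"B"})),
    Region(is_wait=True),
]
print(assign_queues(independent))
```

If the second region also read "A", it would be placed on queue 1 instead, since operations on the same queue execute in order.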

The task depends implementation currently needs "depend" clauses as nodes (not strings generated at the end) because the parent "OMPSingle" lowering needs to analyse them to insert the proper taskwait. Currently this cannot use the forward dataflow check to add taskwaits because this implementation marks sections of arrays as dependencies.
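To illustrate why structured clauses are easier to analyse than strings, here is a small sketch that compares two array-section dependencies. The `DependClause` class, its fields and both helper functions are hypothetical, and the sketch assumes contiguous 1-D sections with inclusive integer bounds. The rule in `needs_taskwait` is only one conservative choice, motivated by the OpenMP requirement that depend list items of sibling tasks refer to identical or disjoint storage; it is not a description of the existing implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DependClause:
    """Toy depend clause on a 1-D array section (hypothetical class)."""
    kind: str      # "in", "out" or "inout"
    symbol: str
    lower: int
    upper: int     # inclusive upper bound


def relation(a, b):
    """Classify two sections as "disjoint", "identical" or "partial"."""
    if a.symbol != b.symbol or a.upper < b.lower or b.upper < a.lower:
        return "disjoint"
    if (a.lower, a.upper) == (b.lower, b.upper):
        return "identical"
    return "partial"


def needs_taskwait(earlier, later):
    """Assumed rule: wait when a write is involved and the overlap is partial."""
    involves_write = earlier.kind != "in" or later.kind != "in"
    return involves_write and relation(earlier, later) == "partial"


first = DependClause("out", "A", 1, 10)
second = DependClause("in", "A", 5, 15)
print(relation(first, second), needs_taskwait(first, second))
```

With string clauses the bounds would have to be parsed back out before any such comparison could be made.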

sergisiso, Jun 20 '24 09:06