PSyclone
Add nowait and required barriers to satisfy dependencies.
Things to be careful of: control flow jumps (e.g. loops with break clauses), where placing barriers is more complex, and barriers inside if statements.
When adding a nowait, look forward for dependencies and add an appropriate barrier if a dependency cannot otherwise be satisfied. This has some similar requirements to #2499 in terms of dependency finding, and some similar behaviour to OMPTaskwaitTrans.
N.B. #2039 and #2040 were known issues with the previous implementation; we need to be smarter with control flow for this implementation (the easy option is to give up if we aren't sure about the control flow).
Summary of today's discussion (and of the OpenACC version tracked in #1664): to make it resistant to further IR mutations, the nowait could be a flag attribute expressing the intent to make the directive asynchronous. The lowering should then introduce the necessary waits/barriers to avoid breaking any dependencies.
For OMP this means that, given:
omp target loop nowait
...
A = 1
omp target loop nowait
...
B = 1
...
A
the directive lowering will use the dataflow graph connections to look forward and put a wait before the access to "A". Note that this needs to be done conservatively, e.g.:
- Wait before a CodeBlock
- Wait at the end of the subroutine
- Wait before a control flow jump (returns, breaks, calls - we can later refine this e.g. see if the call is pure, the async only operates on local symbols, ...)
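The conservative rules above can be sketched on a toy statement list. This is only an illustration and does not use the PSyclone IR or its dataflow API: `Stmt`, `WAIT`, `CONSERVATIVE_KINDS` and `insert_waits` are hypothetical names, and each statement is assumed to already carry its read and write symbol sets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stmt:
    # Hypothetical toy node, not a PSyclone class.
    # kind is one of: "async", "assign", "codeblock", "call",
    # "return", "break", "wait".
    kind: str
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()


WAIT = Stmt("wait")

# Statement kinds we cannot reason about, so we always wait before them.
CONSERVATIVE_KINDS = {"codeblock", "call", "return", "break"}


def conflicts(stmt, pend_reads, pend_writes):
    """True if stmt has a read/write or write/write overlap with pending work."""
    return bool(stmt.writes & (pend_reads | pend_writes)) or bool(
        stmt.reads & pend_writes
    )


def insert_waits(stmts):
    """Return a new statement list with the waits the async regions need."""
    out = []
    in_flight = False          # is any async region still outstanding?
    pend_reads, pend_writes = set(), set()
    for stmt in stmts:
        if in_flight and (
            stmt.kind in CONSERVATIVE_KINDS
            or conflicts(stmt, pend_reads, pend_writes)
        ):
            out.append(WAIT)
            in_flight = False
            pend_reads.clear()
            pend_writes.clear()
        out.append(stmt)
        if stmt.kind == "async":
            in_flight = True
            pend_reads |= stmt.reads
            pend_writes |= stmt.writes
    if in_flight:
        out.append(WAIT)       # wait at the end of the subroutine
    return out


def kinds(stmts):
    """Helper for inspecting the result."""
    return [s.kind for s in stmts]
```

For the OMP example above (two independent nowait regions writing A and B, followed by a read of A), `insert_waits` places a single wait just before the read of A. A real implementation would also have to handle nested control flow, which this flat list ignores.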
The same logic can be applied for OpenACC to add waits/barriers, but in addition we need to look backwards to the previous acc directives to choose the right queue number to satisfy dependencies. For example:
acc parallel async(1)
...
A = 1
acc parallel async(2)
...
B = 1
wait
A
Here the second "acc" has to look backwards to the previous "acc" and use queue 1 if there is a dependency on that directive's body, or a new queue number if there isn't one or if it finds a wait first (which restarts all queues).
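A minimal sketch of that backward-looking queue choice, again on a toy model rather than the PSyclone IR. `Region`, `WAIT`, `depends` and `assign_queues` are hypothetical names. The case where a region depends on earlier regions sitting on more than one queue is not covered by the discussion above; the sketch assumes the conservative answer of emitting a wait and restarting the queues.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    # Hypothetical stand-in for the body of one "acc parallel" directive.
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()


WAIT = "wait"


def depends(later, earlier):
    """True if 'later' must run after 'earlier' (any overlap involving a write)."""
    return bool(later.writes & (earlier.reads | earlier.writes)) or bool(
        later.reads & earlier.writes
    )


def assign_queues(items):
    """Map each Region to an async queue number; WAIT entries pass through."""
    out = []
    active = []  # (region, queue) pairs issued since the last wait
    for item in items:
        if not isinstance(item, Region):
            out.append(WAIT)
            active = []            # a wait restarts all queues
            continue
        queues = {q for r, q in active if depends(item, r)}
        if len(queues) > 1:
            # Assumption: depending on several queues is resolved by waiting.
            out.append(WAIT)
            active = []
            queues = set()
        if queues:
            queue = queues.pop()   # reuse the queue we depend on
        else:
            queue = len({q for _, q in active}) + 1  # open a new queue
        active.append((item, queue))
        out.append(queue)
    return out
```

With the two independent regions from the example (one writing A, one writing B) this yields queues 1 and 2; if the second region instead read A, it would reuse queue 1.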
The task depends implementation currently needs the "depend" clauses as nodes (not strings generated at the end) because the parent "OMPSingle" lowering needs to analyse them to insert the proper taskwaits. Currently this cannot use the forward dataflow check to add taskwaits because in this implementation we mark sections of arrays as dependencies.