PSyclone
PSyclone copied to clipboard
hoist_expressions in GPU script can prevent collapsing loops (I think?)
I'm looking at a section of the newer socrates code, where there is a loop:
DO k_inner=1, n_k_term_inner
k = (k_outer-1)*nd_k_term_inner + k_inner
! Set the combination of terms for this iteration of the loop
iex_major(k_inner) = iex_major_all(i_term_reduced(k))
DO j=1, n_abs
iex_minor(j, k_inner) = iex_minor_all(j, i_term_reduced(k))
i_abs = index_abs(j)
! Set the ESFT term for this absorber
iex = i_esft_pointer(j, i_term_reduced(k))
k_esft(1:n_profile, 1:n_layer, i_abs, k_inner) = &
k_abs_layer(1:n_profile, 1:n_layer, iex, i_abs)
END DO
! The product of the ESFT weights
product_weight(k_inner) = product_weight_all(i_term_reduced(k))
END DO ! k_inner
I wanted to (hopefully) collapse the k_inner
and j
loops, so I refactored the code as follows (and removed the iex_major writes to a separate loop):
DO k_inner=1, n_k_term_inner
DO j=1, n_abs
k = (k_outer-1)*nd_k_term_inner + k_inner
...
This should result in the same behaviour, not contain any race conditions etc.
However, the default script (that should have good general behaviour) prevents this as it hoists this statement (since its loop independent of the outer loop) up to the outer loop, preventing collapse:
Do k_inner=1, n_k_term_inner
k = (k_outer-1)*nd_k_term_inner + k_inner
Do j=1, n_abs
...
I think for GPUs we should be more careful with this hoisting behaviour, and use the collapse logic as well, e.g.
- If the outer loop is currently collapsible, we should not hoist a statement if it breaks the ability to collapse those loops (i.e. it remains as a child of the outer loop). We only hoist it if it can be hoisted to outside a collapsible loop.
@sergisiso @arporter what do you think? Would this be a reasonable general rule to use instead of the current one?
Makes sense, note that the hoisting happens in the script call to normalise, so you can turn it off:
normalise_loops(
invoke.schedule,
hoist_local_arrays=True,
convert_array_notation=True,
loopify_array_intrinsics=True,
convert_range_loops=True,
hoist_expressions=False
)
or make it subroutine specific with hoist_expressions=False if invoke.name = "mysubroutine" else True
The current issue for making the hoisting aware of collapse, it that the collapse logic still lives outside src
(in examples/nemo/utils.py
). This was temporal, at some point I wanted to bring it somewhere inside source. At least as a transformation option for the omp transformations, but maybe even in a more general place (collapse if basically and extension of proving the iteration independence but for more than one contiguous loops). Then we can have an option in the hoisting transformation to enable or disable hoisting this statements.
Oh and btw the OpenMP standard does not specify that the collapsed loops have to be perfectly nested, this was usually the case but some latest versions of compilers support simple statements in the middle of the loops like the one you shown, but I unsure if we should let this happen because still some compilers will fail.
re: collapse + compilers maybe we should test with: Recent-ish gcc (9 or 10?) Recent-ish intel (2021? 2022?) Recent-ish Nvidia Recent-ish cray (if we have access somewhere?) MO current preferred compiler(s)
If those all accept the new collapse then I think we're probably ok to move towards the 5.1 standard with how we handle collapse.
I'll try to check that file in detail and check if there's any other things that should be hoisted, but if not I'll try disabling hoist for that file.