Loops can be readily condensed via fypp
do concurrent is normally used to invoke standard, language-level parallelism, including GPU offloading. If the compiler flag that enables this is not set, it does little beyond, perhaps, some multithreading.
It does not seem to clash with OpenACC in my experimentation so far (https://fortran-lang.discourse.group/t/how-does-openacc-collapse-interact-with-do-concurrent/6887).
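For readers who have not used the construct, here is a minimal sketch of do concurrent on its own, without OpenACC. The program name, array, and sizes are made up for illustration. Which flag turns on parallel execution depends on the compiler; with NVHPC's nvfortran, for example, it is -stdpar=gpu or -stdpar=multicore.

```fortran
program do_concurrent_demo
    implicit none
    ! Illustrative sizes only; nothing here comes from the real code base.
    integer, parameter :: n = 4, m = 3
    integer :: i, j
    real(kind(0d0)) :: a(n, m)

    ! One header carries both index ranges. The iterations must be
    ! independent: each (i, j) pair writes only its own element.
    do concurrent (i = 1:n, j = 1:m)
        a(i, j) = real(i, kind(0d0)) + 10d0*real(j, kind(0d0))
    end do

    print *, sum(a)
end program do_concurrent_demo
```

Without a parallelization flag this compiles and runs as an ordinary loop nest, which is why the construct is safe to adopt for readability alone.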
With it, we can do this:
!$acc parallel loop collapse(4) gang vector default(present)
do concurrent (j = 1:sys_size, q = 0:p, l = 0:n, k = 0:m)
    rhs_vf(j)%sf(k, l, q) = 1d0/dx(k)* &
        (flux_n(1)%vf(j)%sf(k - 1, l, q) &
         - flux_n(1)%vf(j)%sf(k, l, q))
end do
instead of this:
!$acc parallel loop collapse(4) gang vector default(present)
do j = 1, sys_size
    do q = 0, p
        do l = 0, n
            do k = 0, m
                rhs_vf(j)%sf(k, l, q) = 1d0/dx(k)* &
                    (flux_n(1)%vf(j)%sf(k - 1, l, q) &
                     - flux_n(1)%vf(j)%sf(k, l, q))
            end do
        end do
    end do
end do
I think we can still pull out a sequential loop as needed, like this:
!$acc parallel loop collapse(3) gang vector default(present)
do concurrent (j = 1:sys_size, q = 0:p, l = 0:n)
    ! The inner loop runs sequentially within each (j, q, l) iteration
    !$acc loop seq
    do k = 0, m
        rhs_vf(j)%sf(k, l, q) = 1d0/dx(k)* &
            (flux_n(1)%vf(j)%sf(k - 1, l, q) &
             - flux_n(1)%vf(j)%sf(k, l, q))
    end do
end do
While not a code improvement per se, it does seem quite helpful for readability. The loop scaffolding goes from 8 lines (four do statements and four end do statements) to 2.
The do concurrent version works with NVHPC but not with the CCE compilers in the GPU case (the error is something like "collapse requires perfectly nested do loops") [FYI @abbotts].
I reproduced the CCE failure on a minimal example.
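The reproducer itself is not included here. As a rough guess at its shape, and not the actual test case, a program along these lines puts an OpenACC collapse clause on a multi-index do concurrent header. The program name, array, and sizes are hypothetical.

```fortran
program collapse_do_concurrent
    implicit none
    ! Hypothetical sizes and names, chosen only to keep the sketch small.
    integer, parameter :: n = 8, m = 8
    integer :: i, j
    real(kind(0d0)) :: a(n, m)

    a = 0d0

    ! collapse(2) is applied to a single do concurrent header with two
    ! indices, rather than to two explicitly nested do loops.
    !$acc parallel loop collapse(2) gang vector copy(a)
    do concurrent (i = 1:n, j = 1:m)
        a(i, j) = real(i + j, kind(0d0))
    end do

    print *, sum(a)
end program collapse_do_concurrent
```

Building this with OpenACC enabled under each compiler is one way to check whether the collapse clause accepts the do concurrent header.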
@henryleberre created this fypp macro, which does the trick:
#:def forall(*args)
#:for loop in args[:-1]
do ${loop}$
#:endfor
$:args[-1]
#:for _ in range(len(args)-1)
end do
#:endfor
#:enddef
program forall_example
implicit none
integer :: n = 2
integer :: m = 3
integer :: i, j
integer , dimension(1:2,1:2) :: x
x(1,1) = 0
x(1,2) = n
x(2,1) = 1
x(2,2) = m
#:call forall('i=x(1,1),x(1,2)', 'j=x(2,1),x(2,2)')
print*, i, j
#:endcall
end program forall_example
The generated code is:
program forall_example
implicit none
integer :: n = 2
integer :: m = 3
integer :: i, j
integer , dimension(1:2,1:2) :: x
x(1,1) = 0
x(1,2) = n
x(2,1) = 1
x(2,2) = m
do i=x(1,1),x(1,2)
do j=x(2,1),x(2,2)
print*, i, j
end do
end do
end program forall_example
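To connect the macro back to the OpenACC loops above, here is a sketch of the same forall macro wrapping a simplified two-index version of the flux-difference stencil. The arrays dx, flux, and rhs, their bounds, and the data clauses are stand-ins invented for this example, not the real rhs_vf and flux_n structures. Because the macro expands to explicit nested do loops, the collapse clause sees an ordinary loop nest. The file has to go through fypp before it reaches the Fortran compiler.

```fortran
#:def forall(*args)
#:for loop in args[:-1]
do ${loop}$
#:endfor
$:args[-1]
#:for _ in range(len(args)-1)
end do
#:endfor
#:enddef

program forall_acc_example
    implicit none
    ! Stand-in sizes and arrays; flux has one extra cell at k = -1
    ! so that the k - 1 access stays in bounds.
    integer, parameter :: m = 7, n = 3
    integer :: k, l
    real(kind(0d0)) :: dx(0:m), flux(-1:m, 0:n), rhs(0:m, 0:n)

    dx = 0.5d0
    do l = 0, n
        do k = -1, m
            flux(k, l) = real(k*k, kind(0d0))
        end do
    end do

    ! The macro call below expands to "do l = 0, n" / "do k = 0, m"
    ! around the body, followed by two "end do" lines.
    !$acc parallel loop collapse(2) gang vector copyin(dx, flux) copyout(rhs)
#:call forall('l = 0, n', 'k = 0, m')
    rhs(k, l) = 1d0/dx(k)*(flux(k - 1, l) - flux(k, l))
#:endcall

    print *, rhs(0, 0), rhs(m, n)
end program forall_acc_example
```

Each loop is passed as a single string argument, so the commas in the loop bounds do not get split into separate macro arguments.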