
Loops can be readily condensed via fypp

Open sbryngelson opened this issue 1 year ago • 2 comments

do concurrent is usually used to express standard, language-level parallelism, which compilers that support it can offload to GPUs (for example, NVHPC with -stdpar=gpu). If the flag that enables this is not set, though, it does not do much beyond, perhaps, some multithreading.

It does not seem to clash with OpenACC in my experimentation so far (https://fortran-lang.discourse.group/t/how-does-openacc-collapse-interact-with-do-concurrent/6887).

With it, we can do this:

!$acc parallel loop collapse(4) gang vector default(present)
do concurrent (j = 1:sys_size, q = 0:p, l = 0:n, k = 0:m)
    rhs_vf(j)%sf(k, l, q) = 1d0/dx(k)* &
                            (flux_n(1)%vf(j)%sf(k - 1, l, q) &
                             - flux_n(1)%vf(j)%sf(k, l, q))
end do

instead of this:

!$acc parallel loop collapse(4) gang vector default(present)
do j = 1, sys_size
    do q = 0, p
        do l = 0, n
            do k = 0, m
                rhs_vf(j)%sf(k, l, q) = 1d0/dx(k)* &
                                        (flux_n(1)%vf(j)%sf(k - 1, l, q) &
                                         - flux_n(1)%vf(j)%sf(k, l, q))
            end do
        end do
    end do
end do

I think we can still pull out a sequential loop as needed, like this:

!$acc parallel loop collapse(3) gang vector default(present)
do concurrent (j = 1:sys_size, q = 0:p, l = 0:n)
    !$acc loop seq
    do k = 0, m
        rhs_vf(j)%sf(k, l, q) = 1d0/dx(k)* &
                                (flux_n(1)%vf(j)%sf(k - 1, l, q) &
                                 - flux_n(1)%vf(j)%sf(k, l, q))
    end do
end do

While this is not a code improvement per se, it does help readability: the loop control for this kernel shrinks from 8 lines (four do and four end do statements) to 2.
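To check that these three loop forms are interchangeable, here is a minimal standalone sketch. It assumes plain rank-4 arrays (dx, flux, rhs_*) with small, hypothetical sizes in place of MFC's derived types rhs_vf(j)%sf and flux_n(1)%vf(j)%sf, and it leaves out the OpenACC directives so it builds with any Fortran 2008 compiler. The synthetic flux is chosen so that every result is exactly -j/dx(k), so the reported mismatch should be zero.

```fortran
! Hypothetical stand-ins for MFC's arrays: plain rank-4 arrays with the
! equation index j first, and small illustrative sizes.
module loop_forms
  implicit none
  integer, parameter :: dp = kind(1d0)
  integer, parameter :: sys_size = 2, m = 4, n = 3, p = 2
contains

  ! Synthetic data chosen so that flux(j,k-1,l,q) - flux(j,k,l,q) = -j.
  subroutine init(dx, flux)
    real(dp), intent(out) :: dx(0:m), flux(1:sys_size, -1:m, 0:n, 0:p)
    integer :: j, k, l, q
    dx = 0.5_dp
    do concurrent (j = 1:sys_size, k = -1:m, l = 0:n, q = 0:p)
      flux(j, k, l, q) = real(j*k + l + q, dp)
    end do
  end subroutine init

  ! Largest difference between the three loop forms from the issue.
  function max_mismatch() result(err)
    real(dp) :: err
    real(dp) :: dx(0:m), flux(1:sys_size, -1:m, 0:n, 0:p)
    real(dp), dimension(1:sys_size, 0:m, 0:n, 0:p) :: rhs_dc4, rhs_nested, rhs_dc3
    integer :: j, k, l, q

    call init(dx, flux)

    ! Form 1: a single do concurrent header over all four indices.
    do concurrent (j = 1:sys_size, q = 0:p, l = 0:n, k = 0:m)
      rhs_dc4(j, k, l, q) = 1d0/dx(k)*(flux(j, k - 1, l, q) - flux(j, k, l, q))
    end do

    ! Form 2: explicit, perfectly nested do loops.
    do j = 1, sys_size
      do q = 0, p
        do l = 0, n
          do k = 0, m
            rhs_nested(j, k, l, q) = 1d0/dx(k)*(flux(j, k - 1, l, q) - flux(j, k, l, q))
          end do
        end do
      end do
    end do

    ! Form 3: three concurrent indices with an ordinary inner k loop
    ! (the loop that would carry !$acc loop seq on the GPU).
    do concurrent (j = 1:sys_size, q = 0:p, l = 0:n)
      do k = 0, m
        rhs_dc3(j, k, l, q) = 1d0/dx(k)*(flux(j, k - 1, l, q) - flux(j, k, l, q))
      end do
    end do

    err = max(maxval(abs(rhs_dc4 - rhs_nested)), maxval(abs(rhs_dc4 - rhs_dc3)))
  end function max_mismatch
end module loop_forms

program check_loop_forms
  use loop_forms, only: max_mismatch
  implicit none
  print '(a, es10.3)', 'max mismatch: ', max_mismatch()
end program check_loop_forms
```

Note that do concurrent only promises the iterations are independent; it does not fix an execution order, which is why the header's index order does not have to match the nesting order of the explicit loops.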

sbryngelson, Oct 19 '24 20:10

This works with NVHPC but not with CCE in the GPU case; CCE reports an error along the lines of "collapse requires perfectly nested do loops" [FYI @abbotts].

I reproduced it on a minimal example.

sbryngelson, Oct 20 '24 02:10

@henryleberre created this fypp macro, which does the trick:

#:def forall(*args)
#:for loop in args[:-1]
do ${loop}$
#:endfor
$:args[-1]
#:for _ in range(len(args)-1)
end do
#:endfor
#:enddef

program forall_example
  implicit none
  integer :: n = 2
  integer :: m = 3
  integer :: i, j
  integer , dimension(1:2,1:2) :: x

  x(1,1) = 0
  x(1,2) = n

  x(2,1) = 1
  x(2,2) = m

  #:call forall('i=x(1,1),x(1,2)', 'j=x(2,1),x(2,2)')
    print*, i, j 
  #:endcall

end program forall_example

The generated code is:

program forall_example
  implicit none
  integer :: n = 2
  integer :: m = 3
  integer :: i, j
  integer , dimension(1:2,1:2) :: x

  x(1,1) = 0
  x(1,2) = n

  x(2,1) = 1
  x(2,2) = m

do i=x(1,1),x(1,2)
do j=x(2,1),x(2,2)
    print*, i, j 
end do
end do

end program forall_example
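Applied to the kernel from the top of this issue, the macro keeps the condensed syntax while fypp still emits four perfectly nested do loops, which is the shape CCE's collapse requires. The sketch below is a minimal, self-contained fypp source; the plain rank-4 arrays and small sizes are hypothetical stand-ins for MFC's rhs_vf(j)%sf and flux_n(1)%vf(j)%sf, and the file names in the first comment are only examples. It relies on the behaviour shown in the example above, where the #:call header arguments come first and the body arrives as the last argument. It has to be run through fypp before compiling; the !$acc line is an ordinary comment unless OpenACC is enabled.

```fortran
! forall_kernel.fpp: preprocess first, e.g. fypp forall_kernel.fpp forall_kernel.f90
#:def forall(*args)
#:for loop in args[:-1]
do ${loop}$
#:endfor
$:args[-1]
#:for _ in range(len(args)-1)
end do
#:endfor
#:enddef

! Hypothetical stand-ins for MFC's arrays: plain rank-4 arrays with the
! equation index j first, and small illustrative sizes.
module forall_kernel
  implicit none
  integer, parameter :: dp = kind(1d0)
  integer, parameter :: sys_size = 2, m = 4, n = 3, p = 2
contains

  ! Largest deviation of the generated kernel from the exact answer -j/dx(k).
  function max_error() result(err)
    real(dp) :: err
    real(dp) :: dx(0:m), flux(1:sys_size, -1:m, 0:n, 0:p)
    real(dp) :: rhs(1:sys_size, 0:m, 0:n, 0:p)
    integer :: j, k, l, q

    ! Synthetic data chosen so that flux(j,k-1,l,q) - flux(j,k,l,q) = -j.
    dx = 0.5_dp
    #:call forall('j = 1, sys_size', 'k = -1, m', 'l = 0, n', 'q = 0, p')
      flux(j, k, l, q) = real(j*k + l + q, dp)
    #:endcall

    ! fypp expands this call into four perfectly nested do loops,
    ! directly under the directive, as collapse(4) requires.
    !$acc parallel loop collapse(4) gang vector copyin(dx, flux) copyout(rhs)
    #:call forall('j = 1, sys_size', 'q = 0, p', 'l = 0, n', 'k = 0, m')
      rhs(j, k, l, q) = 1d0/dx(k)*(flux(j, k - 1, l, q) - flux(j, k, l, q))
    #:endcall

    ! Compare against the exact value on the host.
    err = 0.0_dp
    #:call forall('j = 1, sys_size', 'q = 0, p', 'l = 0, n', 'k = 0, m')
      err = max(err, abs(rhs(j, k, l, q) + real(j, dp)/dx(k)))
    #:endcall
  end function max_error
end module forall_kernel

program forall_kernel_check
  use forall_kernel, only: max_error
  implicit none
  print '(a, es10.3)', 'max error: ', max_error()
end program forall_kernel_check
```

Because the expansion is plain nested do loops, the same source should work for both NVHPC and CCE in principle, at the cost of relying on the preprocessor rather than on do concurrent itself.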

sbryngelson, Oct 22 '24 20:10