Cray compiler on Crusher fails to create device array when using OpenACC declare device_resident

Open philipwjones opened this issue 3 years ago • 3 comments

In a few MPAS routines, arrays were declared device_resident. At run time on Crusher, this resulted in an invalid dope vector. With CRAY_ACC_DEBUG=3, it looks like these arrays are never allocated on the device, so the device pointer is 0. The workaround is to use the usual data create/delete directives instead; I will push a PR with that replacement soon.

@sarats, @mattdturner, @twhite-cray, and @abbotts: already reported via OLCF help, but reproduced here for tracking. A simple reproducer is below. It gives a different error message, but for the same reason:

program deviceResidentBug

  implicit none

  ! This is a bug reproducer for an issue with the use of the
  ! OpenACC declare device_resident directive

  integer :: i,j,nx,ny,ierr

  double precision, dimension(:), allocatable :: &
     XX, YY

  double precision, dimension(:,:), allocatable :: &
     Ar
  !$acc declare device_resident(Ar)

  call MPI_INIT(ierr)
  nx = 512
  ny = 512

  allocate (XX(nx), YY(ny))
  allocate (Ar(nx,ny))

  do i=1,nx
     XX(i) = i
  end do
  YY(:) = 0.0d0
  !$acc enter data copyin(XX,YY)

  !*** The device_resident version returns errors; according to the
  !*** debug output, there is no device pointer for Ar

  ! Ar was declared device_resident so it shouldn't need a data directive
  ! or present clause, though adding the latter made no difference

  !$acc parallel loop collapse(2)
  do j=1,ny
  do i=1,nx
     Ar(i,j) = i+j
  end do
  end do

  !$acc parallel loop present(XX, YY)
  do j=1,ny
  do i=1,nx
     YY(j) = Ar(i,j)*XX(i)
  end do
  end do

  !$acc exit data copyout(YY) delete(XX)

  call MPI_FINALIZE(ierr)

end program deviceResidentBug
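
For comparison, here is a minimal sketch of the workaround mentioned above, assuming it amounts to dropping the declare device_resident directive and managing Ar with explicit enter data create / exit data delete directives. This is not the code from the pending PR. The program name is hypothetical, the MPI calls are omitted to keep the example self-contained, and the inner loop is marked seq (an addition, not in the reproducer) so the final write to YY(j) is well defined.

```fortran
program deviceResidentWorkaround

  implicit none

  ! Sketch only: Ar is managed with explicit unstructured data
  ! directives instead of the OpenACC declare device_resident directive

  integer :: i, j, nx, ny

  double precision, dimension(:), allocatable :: XX, YY
  double precision, dimension(:,:), allocatable :: Ar

  nx = 512
  ny = 512

  allocate (XX(nx), YY(ny))
  allocate (Ar(nx,ny))

  do i = 1, nx
     XX(i) = i
  end do
  YY(:) = 0.0d0

  ! Create Ar on the device explicitly, after the host allocation
  !$acc enter data copyin(XX, YY) create(Ar)

  !$acc parallel loop collapse(2) present(Ar)
  do j = 1, ny
     do i = 1, nx
        Ar(i,j) = i + j
     end do
  end do

  ! Inner loop runs sequentially per j, so the last i wins (assumption:
  ! this matches the intent of the original loop nest)
  !$acc parallel loop present(Ar, XX, YY)
  do j = 1, ny
     !$acc loop seq
     do i = 1, nx
        YY(j) = Ar(i,j)*XX(i)
     end do
  end do

  ! Copy the result back and release all device copies, including Ar
  !$acc exit data copyout(YY) delete(XX, Ar)

  print *, 'YY(1) = ', YY(1)

  deallocate (XX, YY, Ar)

end program deviceResidentWorkaround
```

Running this with CRAY_ACC_DEBUG=3, as in the original report, should show whether a device allocation for Ar now appears at the enter data directive.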

philipwjones avatar Jun 02 '22 20:06 philipwjones

cc @abbotts

sarats avatar Jun 14 '22 18:06 sarats

The automatic mapping of allocatables for the device_resident and create clauses of the declare directive was introduced in OpenACC 2.5, which unfortunately fell in the time frame when Cray wasn't actively supporting OpenACC. That means CCE doesn't currently support the usage here.

We're working on getting current on OpenACC in Fortran, but it may take some time. I've added a note to an internal ticket that we have a user code waiting for this feature, which should help prioritize it. The issue you opened through the OLCF will help us set priorities too.

The enter data create workaround (WAR) should work and be stable, but if you hit other issues, please let me know, especially with OpenACC features added since 2.0.

abbotts avatar Jun 14 '22 21:06 abbotts

Thanks, @abbotts. Yeah, always safer to stick to the standard approaches. I have the WAR PR almost ready but stumbled into another bug, probably on our end. I'm now leaving on vacation, but should be able to close this out shortly after I get back.

philipwjones avatar Jun 16 '22 15:06 philipwjones