Cray Fortran: incorrect index return value with -h zero (known workaround, Cray ticket open)

Open xyuan opened this issue 3 years ago • 11 comments

There is a Cray Fortran compiler issue associated with the use of return in a Fortran function; see below.

The example code can be found at https://github.com/E3SM-Project/E3SM/blob/master/components/eam/src/physics/rrtmgp/radconstants.F90

  character(len=gasnamelength), public, parameter :: gaslist(nradgas) &
     = (/'H2O  ','O3   ', 'O2   ', 'CO2  ', 'N2O  ', 'CH4  ', 'CFC11', 'CFC12'/)

  integer function rad_gas_index(gasname)

     ! return the index in the gaslist array of the specified gasname

     character(len=*),intent(in) :: gasname
     integer :: igas

     rad_gas_index = -1
     do igas = 1, nradgas
        if (trim(gaslist(igas)).eq.trim(gasname)) then
           rad_gas_index = igas
           return
        endif
     enddo
     call endrun ("rad_gas_index: can not find gas with name "//gasname)
  end function rad_gas_index

For any gasname given as input, the returned rad_gas_index is 0; for example, index = rad_gas_index("CO2") should return 4. This bug affects a lot of code in E3SM, and it is hard to work around in every affected function, so I strongly recommend that the Cray Fortran compiler be fixed to handle this pattern correctly.
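For anyone who wants to try this outside of E3SM, here is a minimal self-contained sketch of the same pattern. The module name, gasnamelength = 5, nradgas = 8, the print_indices helper, and the use of error stop in place of E3SM's endrun are all assumptions made for illustration; only gaslist and the body of rad_gas_index follow the code above.

```fortran
! Reproducer sketch (assumptions: module name, gasnamelength = 5, nradgas = 8,
! print_indices helper, and error stop standing in for E3SM's endrun).
module radconstants_sketch
  implicit none
  private
  public :: rad_gas_index, print_indices

  integer, parameter :: gasnamelength = 5
  integer, parameter :: nradgas = 8
  character(len=gasnamelength), parameter :: gaslist(nradgas) = &
       (/'H2O  ', 'O3   ', 'O2   ', 'CO2  ', 'N2O  ', 'CH4  ', 'CFC11', 'CFC12'/)

contains

  integer function rad_gas_index(gasname)
    ! Return the index in the gaslist array of the specified gasname.
    character(len=*), intent(in) :: gasname
    integer :: igas

    rad_gas_index = -1
    do igas = 1, nradgas
      if (trim(gaslist(igas)) .eq. trim(gasname)) then
        rad_gas_index = igas
        return   ! early return: the pattern reported to misbehave
      endif
    enddo
    error stop "rad_gas_index: gas name not found"
  end function rad_gas_index

  subroutine print_indices()
    ! Look up a few gases, then print. The results are stored first so the
    ! function is never invoked from inside an I/O statement.
    integer :: i_ch4, i_o3, i_cfc12
    i_ch4   = rad_gas_index('CH4')
    i_o3    = rad_gas_index('O3')
    i_cfc12 = rad_gas_index('CFC12')
    write(*,*) 'CH4 integer = ', i_ch4
    write(*,*) 'O3 integer = ', i_o3
    write(*,*) 'CFC12 integer = ', i_cfc12
  end subroutine print_indices

end module radconstants_sketch
```

A short driver program that uses radconstants_sketch and calls print_indices() completes the test case. A correct compiler should report 6, 2 and 8 for CH4, O3 and CFC12, following the order of gaslist.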

xyuan avatar Jun 05 '22 15:06 xyuan

Hi @xyuan, I am curious whether this problem also affects other gases?

@keziming

keziming avatar Jun 05 '22 19:06 keziming

yeah, it affects all the gases

xyuan avatar Jun 05 '22 22:06 xyuan

cc @twhite-cray @abbotts @mattdturner

sarats avatar Jun 06 '22 17:06 sarats

The modules loaded are:

Currently Loaded Modules:

  1) craype/2.7.15       5) xalt/1.3.0           9) craype-accel-amd-gfx90a  13) subversion/1.14.0  17) cray-libsci/21.08.1.2
  2) cray-dsmml/0.2.2    6) DefApps/default     10) rocm/4.5.2               14) git/2.31.1         18) cray-hdf5-parallel/1.12.0.7
  3) PrgEnv-cray/8.3.3   7) libfabric/1.15.0.0  11) cray-mpich/8.1.16        15) cmake/3.22.2       19) cray-netcdf-hdf5parallel/4.7.4.7
  4) cce/14.0.0          8) craype-network-ofi  12) cray-python/3.9.4.2      16) zlib/1.2.11        20) cray-parallel-netcdf/1.12.1.7

xyuan avatar Jun 06 '22 18:06 xyuan

and the command is: cd /gpfs/alpine/cli115/scratch/yuanx/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.crayclanggpu.1x1/bld/cmake-bld/cmake/atm && python3 /gpfs/alpine/cli115/scratch/yuanx/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.crayclanggpu.1x1/Tools/e3sm_compile_wrap.py /opt/cray/pe/craype/2.7.15/bin/ftn -DBIT64 -DCAM -DCNL -DCO2A -DCPRCRAY -DCRM_DT=10 -DCRM_DX=2000 -DCRM_NX=64 -DCRM_NX_RAD=4 -DCRM_NY=1 -DCRM_NY_RAD=1 -DCRM_NZ=50 -DFORTRANUNDERSCORE -DHAVE_COMM_F2C -DHAVE_F2003_PTR_BND_REMAP -DHAVE_GETTIMEOFDAY -DHAVE_MPI -DHAVE_NANOTIME -DHAVE_SLASHPROC -DHAVE_TIMES -DHAVE_VPRINTF -DLINUX -DLSMLAT=1 -DLSMLON=1 -DMAXPATCH_PFT=numpft+1 -DMCT_INTERFACE -DMMF_SAMXX -DNC=4 -DNDEBUG -DNO_R16 -DNP=4 -DNPG=2 -DN_RAD_CNST=30 -DPCNST=9 -DPCOLS=16 -DPLAT=1 -DPLEV=60 -DPLON=384 -DPSUBCOLS=1 -DPTRK=1 -DPTRM=1 -DPTRN=1 -DSPDLOG_COMPILED_LIB -DSPMD -DYES3DVAL=0 -D_MPDATA -D_MPI -D_PNETCDF -D_PRIM -D__HIP_ROCclr__ -I/gpfs/alpine/cli115/scratch/yuanx/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.crayclanggpu.1x1/bld/cmake-bld/cmake/atm/yakl -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/cmake/atm/. 
-I/gpfs/alpine/cli115/scratch/yuanx/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.crayclanggpu.1x1/bld/crayclanggpu/mpich/nodebug/nothreads/mct/include -I/gpfs/alpine/cli115/scratch/yuanx/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.crayclanggpu.1x1/bld/crayclanggpu/mpich/nodebug/nothreads/mct/mct/noesmf/c1a1l1i1o1r1g1w1i1e1/include -I/opt/cray/pe/netcdf-hdf5parallel/4.7.4.7/crayclang/10.0/include -I/opt/cray/pe/mpich/8.1.16/ofi/crayclang/10.0/include -I/opt/cray/pe/parallel-netcdf/1.12.1.7/crayclang/10.0/include -I/gpfs/alpine/cli115/scratch/yuanx/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.crayclanggpu.1x1/SourceMods/src.eam -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/crm -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/crm/rrtmgp -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/chemistry/pp_none -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/chemistry/bulk_aero -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/chemistry/aerosol -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/chemistry/mozart -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/chemistry/utils -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/rrtmgp -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/cam -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/dynamics/se -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/homme/src/share -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/homme/src/preqx -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/homme/src/preqx/share -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/cpl -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/control -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/utils 
-I/gpfs/alpine/cli115/scratch/yuanx/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.crayclanggpu.1x1/bld/lnd/obj -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/externals/YAKL -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/externals/YAKL/gptl -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/externals/YAKL/hipCUB/hipcub/include -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/externals/YAKL/rocPRIM/rocprim/include -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/externals/ekat/src -I/gpfs/alpine/cli115/scratch/yuanx/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.crayclanggpu.1x1/bld/cmake-bld/externals/ekat/src -I/gpfs/alpine/cli115/scratch/yuanx/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.crayclanggpu.1x1/bld/cmake-bld/externals/ekat/src/ekat/ekat_f90_modules -I/gpfs/alpine/cli115/scratch/yuanx/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.crayclanggpu.1x1/bld/cmake-bld/externals/kokkos -I/gpfs/alpine/cli115/scratch/yuanx/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.crayclanggpu.1x1/bld/cmake-bld/externals/kokkos/core/src -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/externals/ekat/extern/kokkos/core/src -I/gpfs/alpine/cli115/scratch/yuanx/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.crayclanggpu.1x1/bld/cmake-bld/externals/kokkos/containers/src -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/externals/ekat/extern/kokkos/containers/src -I/gpfs/alpine/cli115/scratch/yuanx/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.crayclanggpu.1x1/bld/cmake-bld/externals/kokkos/algorithms/src -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/externals/ekat/extern/kokkos/algorithms/src -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/externals/ekat/extern/yaml-cpp/include -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/externals/ekat/extern/spdlog/include -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/crm/scream/src/physics/p3/../share 
-I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/crm/scream/src -I/gpfs/alpine/cli115/scratch/yuanx/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.crayclanggpu.1x1/bld/cmake-bld/scream/src -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/crm/scream/src/physics/shoc/../share -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/cmake/atm/../../../externals/YAKL -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/rrtmgp/external/cpp/. -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/rrtmgp/external/cpp/rte -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/rrtmgp/external/cpp/rte/kernels -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/rrtmgp/external/cpp/rrtmgp -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/rrtmgp/external/cpp/rrtmgp/kernels -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/rrtmgp/cpp/../external/cpp -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/rrtmgp/cpp/../external/cpp/rte -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/rrtmgp/cpp/../external/cpp/rrtmgp -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/rrtmgp/cpp/../external/cpp/extensions/cloud_optics -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/rrtmgp/cpp/../external/cpp/extensions/fluxes_byband -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/rrtmgp/cpp/../external/cpp/examples -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/rrtmgp/cpp/../external/cpp/examples/all-sky -em -J. 
-cpp -s default32 -eZ -O2 -h noacc -h zero -hfp0 -I/opt/cray/pe/mpich/8.1.16/ofi/crayclang/10.0/include -I/opt/rocm-4.5.2/include -f free -N 255 -h byteswapio -em -M1077 -DUSE_CONTIGUOUS=contiguous, -c /gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/dynamics/se/dyn_grid.F90 -o CMakeFiles/atm.dir///eam/src/dynamics/se/dyn_grid.F90.o

xyuan avatar Jun 06 '22 18:06 xyuan

@xyuan Thank you for the information about modules and compilation flags, as well as the reproducer you provided.

I was able to replicate this issue on Crusher, and narrowed it down to the use of -h zero. Without -h zero, the correct indices are returned from the function.

This is a bug in the Cray Fortran compiler, and an internal ticket has been opened.

In my testing, it looks like the correct results are given if using Cray Fortran version 13.0.2, and the incorrect ones when using Cray Fortran version 14.0.0.

Workaround

While not ideal, I was able to determine a workaround: adding a write statement immediately before the return. For example, with this loop

  rad_gas_index = -1
  do igas = 1, 8
    write(*,*) 'checking igas = ', igas
    if (trim(gaslist(igas)).eq.trim(gasname)) then
      rad_gas_index = igas
      return
    endif
  enddo

the results are 0 with Cray Fortran version 14.0.0

> ftn --version
Cray Fortran : Version 14.0.0
> ftn -h zero main.F90
> ./a.out
 CH4 integer =  0
 O3 integer =  0
 CFC12 integer =  0

If I change the loop to

  rad_gas_index = -1
  do igas = 1, 8
    write(*,*) 'checking igas = ', igas
    if (trim(gaslist(igas)).eq.trim(gasname)) then
      rad_gas_index = igas
      write(*,*) ''
      return
    endif
  enddo

then I get the correct results:

> ftn -h zero main.F90
> ./a.out

 CH4 integer =  6

 O3 integer =  2

 CFC12 integer =  8
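Since the extra write prints a blank line on every successful lookup, one option is to limit it to Cray builds. The sketch below guards it with CPRCRAY, which appears as -DCPRCRAY on the compile line earlier in this thread. The module name and the two constants are assumptions for illustration, error stop stands in for E3SM's endrun, and the file has to be preprocessed (for example by using a .F90 extension).

```fortran
! Sketch: keep the write-before-return workaround, but only when CPRCRAY is
! defined. Module name, gasnamelength and nradgas are illustrative assumptions.
module radconstants_workaround_sketch
  implicit none
  private
  public :: rad_gas_index

  integer, parameter :: gasnamelength = 5
  integer, parameter :: nradgas = 8
  character(len=gasnamelength), parameter :: gaslist(nradgas) = &
       (/'H2O  ', 'O3   ', 'O2   ', 'CO2  ', 'N2O  ', 'CH4  ', 'CFC11', 'CFC12'/)

contains

  integer function rad_gas_index(gasname)
    ! Return the index in the gaslist array of the specified gasname.
    character(len=*), intent(in) :: gasname
    integer :: igas

    rad_gas_index = -1
    do igas = 1, nradgas
      if (trim(gaslist(igas)) .eq. trim(gasname)) then
        rad_gas_index = igas
#ifdef CPRCRAY
        ! Workaround from this thread for Cray Fortran 14.0.0 with -h zero:
        ! an I/O statement before the early return.
        write(*,*) ''
#endif
        return
      endif
    enddo
    error stop "rad_gas_index: gas name not found"
  end function rad_gas_index

end module radconstants_workaround_sketch
```

Because the function now performs I/O on Cray builds, callers should store the result in a variable before printing it rather than calling rad_gas_index inside a write statement. The guard has not been checked against CCE 14.0.0 here; it only narrows where the workaround shown above is compiled in.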

mattdturner avatar Jun 06 '22 20:06 mattdturner

Another workaround is to compile the impacted files (e.g., radconstants.F90) with -O0, or keep the current flags but add -hipa0 to the options for the impacted files.

That could really hurt performance, though, depending on what other routines are in the impacted files.

mattdturner avatar Jun 07 '22 15:06 mattdturner

@mattdturner Thanks very much, let me implement the workaround and try a case on crusher

xyuan avatar Jun 07 '22 16:06 xyuan

The workaround fixes the issue in the interim. We are waiting on a Cray compiler fix for the root cause, and then we can close this.

sarats avatar Jun 14 '22 18:06 sarats

@mattdturner 's reproducer is fixed in CCE 14.0.2 and CCE 14.0.3. Hopefully we can confirm it fixes the real code too, then close this.
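When confirming the fix in the real code, it may help to have the executable report which compiler built it. The sketch below uses the standard compiler_version() and compiler_options() inquiry functions from iso_fortran_env; the module and procedure names are hypothetical, and the exact strings that Cray Fortran returns (for example, whether the options string contains "-h zero") are an assumption to check on the target system.

```fortran
! Sketch: report the compiler version and options recorded at build time.
! Module and procedure names are hypothetical; the format of the returned
! strings is compiler-dependent.
module compiler_report_sketch
  use, intrinsic :: iso_fortran_env, only: compiler_version, compiler_options
  implicit none
  private
  public :: version_mentions, options_mention, print_compiler_report

contains

  logical function version_mentions(tag)
    ! True if the compiler version string contains tag (e.g. '14.0.0').
    character(len=*), intent(in) :: tag
    version_mentions = index(compiler_version(), tag) > 0
  end function version_mentions

  logical function options_mention(flag)
    ! True if the recorded compiler options contain flag.
    character(len=*), intent(in) :: flag
    options_mention = index(compiler_options(), flag) > 0
  end function options_mention

  subroutine print_compiler_report()
    ! Print both strings so a run log shows which build produced it.
    write(*,'(a)') 'compiler: ' // compiler_version()
    write(*,'(a)') 'options : ' // compiler_options()
  end subroutine print_compiler_report

end module compiler_report_sketch
```

Calling print_compiler_report() once at startup, or checking version_mentions('14.0.0') before trusting a result, gives a simple record of whether a run used the affected compiler release.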

abbotts avatar Sep 06 '22 17:09 abbotts

Yes, please close this issue. Thanks for your help on this issue.

xyuan avatar Sep 06 '22 19:09 xyuan