clubb_mf.F90 fails to build with nvhpc/25.5 on Derecho

Open gdicker1 opened this issue 5 months ago • 3 comments

NOTE: Incorporating a future ccs_config tag that uses the nvhpc/25.7 software stack will likely close this issue.

What happened?

I just want to make people aware of an incompatibility I noticed. I don't currently think CAM SEs need to do any work to address it.

There seems to be an issue with how the get_alhl function is compiled with NVHPC v25.5. I get build errors like:

NVFORTRAN-S-0450-Argument number 1 to get_alhl: kind mismatch (/path/to/SRCROOT/components/cam/src/physics/cam/clubb_mf.F90: 534)
NVFORTRAN-S-0450-Argument number 1 to get_alhl: kind mismatch (/path/to/SRCROOT/components/cam/src/physics/cam/clubb_mf.F90: 544)
NVFORTRAN-S-0450-Argument number 1 to get_alhl: kind mismatch (/path/to/SRCROOT/components/cam/src/physics/cam/clubb_mf.F90: 548)
NVFORTRAN-S-0450-Argument number 1 to get_alhl: kind mismatch (/path/to/SRCROOT/components/cam/src/physics/cam/clubb_mf.F90: 549)
  0 inform,   0 warnings,   4 severes, 0 fatal for condensation_mf
gmake: *** [/path/to/CASE_WithCAM_DIR/Tools/Makefile:978: clubb_mf.o] Error 2

These errors refer to code structured like the following block (see also the source code). Note that the compiler raised no complaints about the get_watf function.

module clubb_mf
  ! ... skipping lines ...
  use shr_kind_mod,  only: r8=>shr_kind_r8
  ! ... skip

contains

  subroutine condensation_mf(...)
     ! ... skip

     !local variables
     integer  :: niter,i
     real(r8) :: diff,t,qstmp,qcold,es,wf

     ! ... skip

     ! These are lines 532 to 534
     do i=1,niter
       wf = get_watf(t)
       t = thl/iex+get_alhl(wf)/cpair*qc   !as in (4)
       !                                 ^--- error!

       ! ... skip
     end do
     ! ... skip

     ! Lines 554 to 586
     contains

     function get_watf(t)
       real(r8)            :: t,get_watf,tc
       real(r8), parameter :: &
                              tmax=-10._r8, &
                              tmin=-40._r8

       tc=t-h2otrip

       if (tc>tmax) then
         get_watf=1._r8
       else if (tc<tmin) then
         get_watf=0._r8
       else
         get_watf=(tc-tmin)/(tmax-tmin);
       end if

     end function get_watf


     function get_alhl(wf)
     !latent heat of the mixture based on water fraction
       use physconst,        only : latvap , latice
       real(r8) :: get_alhl,wf

       get_alhl = wf*latvap+(1._r8-wf)*(latvap+latice)

     end function get_alhl

  end subroutine condensation_mf

  ! ... skip

end module clubb_mf

What are the steps to reproduce the bug?

  1. Download a recent version of CESM (tag cesm3_0_beta06 or newer) or EarthWorks (ewm-2.5.004 or newer).
  2. Modify the Derecho entry in config_machines to use the nvhpc/25.5 stack and not to use the *-debug modules with nvhpc.
    • With these changes, the following Derecho modules are loaded: cesmdev/1.0 ncarenv/24.12 conda/latest nco craype cmake nvhpc/25.5 cray-libsci/24.03.0 ncarcompilers/1.0.0 cray-mpich/8.1.29 netcdf-mpi/4.9.3 parallel-netcdf/1.14.0 parallelio/2.6.6 esmf/8.8.1
  3. Create a case that uses CAM. I tested with the F2000dev and FKESSLER compsets on the mpasa120 grid.
  4. The build fails on clubb_mf.F90 with NVFORTRAN-S errors.

What CAM tag were you using?

cam6_4_089 plus some EarthWorks changes

What machine were you running CAM on?

CISL machine (e.g. cheyenne)

What compiler were you using?

NVHPC

Path to a case directory, if applicable

/glade/derecho/scratch/gdicker/2025Jul08_EW_CheckDerechoNVHPC2505/oneoff_nv255_mpasa120_FKESSER

Will you be addressing this bug yourself?

No

Extra info

I observed this in an EarthWorks sandbox based on tag ewm-2.5.004.

  • The externals used in ewm-2.5.004 are based on those in ESCOMP/CESM tag cesm3_0_beta06.
    • Uses EarthWorksOrg/CAM tag cam-ew2.5.001. This is cam6_4_089 with some EW-specific changes.
  • cime and ccs_config were updated to the tags used in cesm3_0_alpha07b, and I made the changes described in step 2.
    • Merged tag ccs_config_cesm1.0.48 from ESMCI/ccs_config_cesm into tag ccs_config-ew2.5.003 from EarthWorksOrg/ccs_config_cesm
    • Merged tag cime6.1.105 from ESMCI/cime into tag cime-ew2.5.001 from EarthWorksOrg/cime

gdicker1, Jul 09 '25 16:07

Jim Edwards tracked down this error a few weeks ago; it is likely a compiler bug. The workaround is to import r8 again inside the get_alhl function, even though r8 is already imported at the top of the module:

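      function get_alhl(wf)
      !latent heat of the mixture based on water fraction
        use shr_kind_mod,  only: r8=>shr_kind_r8  ! workaround: re-import r8 locally
        use physconst,        only : latvap , latice
        real(r8) :: get_alhl,wf

        get_alhl = wf*latvap+(1._r8-wf)*(latvap+latice)

      end function get_alhl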
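For anyone who wants to try the workaround pattern outside a full CESM build, here is a minimal standalone sketch: the kind parameter is imported at module level and then redundantly re-imported inside the internal function, as in the workaround. The modules `kinds_sketch`, `physconst_sketch`, and `clubb_mf_sketch` and the wrapper `mixture_latent_heat` are hypothetical stand-ins, not CESM code, and the constant values are illustrative assumptions rather than values taken from physconst. This only shows the structure of the workaround; it is not known to reproduce the NVHPC 25.5 failure.

```fortran
! Hypothetical stand-in for shr_kind_mod (assumption, not the CESM module)
module kinds_sketch
  implicit none
  integer, parameter :: shr_kind_r8 = selected_real_kind(12)
end module kinds_sketch

! Hypothetical stand-in for physconst; values are illustrative assumptions
module physconst_sketch
  use kinds_sketch, only: r8 => shr_kind_r8
  implicit none
  real(r8), parameter :: latvap = 2.501e6_r8   ! latent heat of vaporization [J/kg]
  real(r8), parameter :: latice = 3.337e5_r8   ! latent heat of fusion [J/kg]
end module physconst_sketch

module clubb_mf_sketch
  use kinds_sketch, only: r8 => shr_kind_r8    ! module-level import, as in clubb_mf.F90
  implicit none
  private
  public :: mixture_latent_heat

contains

  ! Hypothetical wrapper: internal procedures cannot be called from outside their host
  function mixture_latent_heat(wf) result(alhl)
    real(r8), intent(in) :: wf
    real(r8) :: alhl

    alhl = get_alhl(wf)

  contains

    function get_alhl(wf)
      ! Workaround pattern: re-import r8 locally even though the host module already has it
      use kinds_sketch, only: r8 => shr_kind_r8
      use physconst_sketch, only: latvap, latice
      real(r8) :: get_alhl, wf

      ! Latent heat of the mixture based on water fraction wf
      get_alhl = wf*latvap + (1._r8 - wf)*(latvap + latice)
    end function get_alhl

  end function mixture_latent_heat

end module clubb_mf_sketch
```

The local `use` inside `get_alhl` makes `r8` a local name that hides the host-associated one; both refer to the same kind, so the redundant import should not change the result on a compiler without the bug.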

sjsprecious, Jul 22 '25 16:07

Ah, gotcha. Do you know of any plans around this? Are CESM/CAM folks waiting for a new NVHPC version?

gdicker1, Jul 22 '25 17:07

Collaborators at NVIDIA stated that the new compiler version, v25.7, should address this issue without the workaround.

Since that release was very recent (I think today), it may take a while before the NCAR system administrators install the software stack on Derecho.

gdicker1, Jul 23 '25 22:07