
For maint-2.1, need similar work-around already on main to avoid HDF error

Open ndkeen opened this issue 8 months ago • 2 comments

As in PR https://github.com/E3SM-Project/E3SM/pull/7059, it looks like we need to add the same workaround to maint-2.1:

<env name="LD_LIBRARY_PATH">$ENV{CRAY_LD_LIBRARY_PATH}:$ENV{LD_LIBRARY_PATH}</env>
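The effect of this entry can be sketched in Python: it puts the directories from `CRAY_LD_LIBRARY_PATH` ahead of whatever is already in `LD_LIBRARY_PATH`, so the dynamic linker searches the Cray-provided directories first. This is only an illustration; the helper name `merged_library_path` is hypothetical and not part of E3SM or CIME. Unlike the plain string concatenation in the XML entry, the sketch also drops empty path entries.

```python
import os


def merged_library_path(env):
    """Return CRAY_LD_LIBRARY_PATH entries followed by LD_LIBRARY_PATH entries.

    Hypothetical helper for illustration only; it mirrors the ordering used by
    the <env name="LD_LIBRARY_PATH"> entry and skips empty components.
    """
    parts = []
    # Order matters: the dynamic linker searches directories left to right.
    for name in ("CRAY_LD_LIBRARY_PATH", "LD_LIBRARY_PATH"):
        value = env.get(name, "")
        parts.extend(p for p in value.split(":") if p)
    return ":".join(parts)


if __name__ == "__main__":
    # Show the search path that the workaround would produce in this shell.
    print(merged_library_path(os.environ))
```

Running the script in the job environment prints the resulting search order, which makes it easy to see whether the Cray directories come first.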

With maint-2.1 and the test SMS_Ld1.ne30pg2_EC30to60E2r2.WCYCL1850.pm-cpu_intel.allactive-wcprod, the run aborts:

  0: Warning! ***HDF5 library version mismatched error***
...

 0: Bye...
  0: forrtl: error (76): Abort trap signal
  0: Image              PC                Routine            Line        Source             
  0: libpthread-2.31.s  0000153767CA7910  Unknown               Unknown  Unknown
  0: libc-2.31.so       0000153763B1DCDB  gsignal               Unknown  Unknown
  0: libc-2.31.so       0000153763B1F395  abort                 Unknown  Unknown
  0: libhdf5_parallel_  000015377178EBEB  Unknown               Unknown  Unknown
  0: libnetcdf_paralle  0000153770DD4F7A  NC4_open              Unknown  Unknown
  0: libnetcdf_paralle  0000153770D558D0  NC_open               Unknown  Unknown
  0: libnetcdf_paralle  0000153770D556A4  nc_open               Unknown  Unknown
  0: libnetcdff_parall  0000153770AA0A92  nf_open_              Unknown  Unknown
  0: libnetcdff_parall  0000153770AE6819  netcdf_mp_nf90_op     Unknown  Unknown
  0: e3sm.exe           0000000001016B0B  chem_surfvals_mp_         309  chem_surfvals.F90
  0: e3sm.exe           0000000001017333  chem_surfvals_mp_         206  chem_surfvals.F90
  0: e3sm.exe           00000000014034F5  inital_mp_cam_ini          58  inital.F90
  0: e3sm.exe           000000000052CF44  cam_comp_mp_cam_i         159  cam_comp.F90
  0: e3sm.exe           00000000005242D4  atm_comp_mct_mp_a         318  atm_comp_mct.F90
  0: e3sm.exe           0000000000448578  component_mod_mp_         257  component_mod.F90
  0: e3sm.exe           0000000000436451  cime_comp_mod_mp_        1440  cime_comp_mod.F90
  0: e3sm.exe           00000000004454F2  MAIN__                    122  cime_driver.F90
  0: e3sm.exe           0000000000422C1D  Unknown               Unknown  Unknown
  0: libc-2.31.so       0000153763B081FD  __libc_start_main     Unknown  Unknown
  0: e3sm.exe           0000000000422B4A  Unknown               Unknown  Unknown

ndkeen, May 09 '25 14:05

I bumped into the same issue on pm-gpu when testing the F2010-EAMxx-MAM4xx compset:

FATAL ERROR: NetCDF: HDF error (file = /global/cfs/cdirs/e3sm/inputdata/atm/scream/mam4xx/linoz/ne30pg2/linoz1850-2015_2010JPL_CMIP6_10deg_58km_ne30pg2_c20240724.nc)

The env setup above works. We might also need to integrate this workaround into the master branch for pm-gpu.

meng630, May 09 '25 16:05

Thanks for reporting, @meng630. Since this issue is specific to maint-2.1 and pm-cpu, I opened a new issue for pm-gpu (and master?) here: https://github.com/E3SM-Project/E3SM/issues/7341

ndkeen, May 09 '25 16:05

Fixed in https://github.com/E3SM-Project/E3SM/pull/7627 and https://github.com/E3SM-Project/E3SM/pull/7671

ndkeen, Sep 12 '25 19:09