Driver dies with a seg-fault rather than a graceful abort if DRV_RESTART_POINTER file does not exist
If the file pointed to by DRV_RESTART_POINTER does not exist, the driver fails with a seg-fault rather than exiting gracefully with a message that the file was not found.
This is in what will be ctsm5.3.016, with cime6.1.49 and cmeps1.0.32.
The full description is here:
https://github.com/ESCOMP/CTSM/issues/2914
The tests that fail are:
ERP_P64x2_Ld765.f10_f10_mg37.I2000Clm60BgcCrop.derecho_intel.clm-monthly
ERS_P128x1_Ld765.f10_f10_mg37.I2000Clm60Fates.derecho_intel.clm-FatesColdNoComp
For the first of these tests, only the cesm.log file is generated.
cesm.log:
cat /glade/derecho/scratch/erik/tests_ctsm5316acl/ERP_P64x2_Ld765.f10_f10_mg37.I2000Clm60BgcCrop.derecho_intel.clm-monthly.GC.ctsm5316acl_int/run/case2run/cesm.log.7269007.desched1.241218-154730
dec2343.hsn.de.hpc.ucar.edu 0: (t_initf) Read in prof_inparm namelist from: drv_in
dec2343.hsn.de.hpc.ucar.edu 0: (t_initf) Using profile_disable= F
dec2343.hsn.de.hpc.ucar.edu 0: (t_initf) profile_timer= 4
dec2343.hsn.de.hpc.ucar.edu 0: (t_initf) profile_depth_limit= 12
dec2343.hsn.de.hpc.ucar.edu 0: (t_initf) profile_detail_limit= 2
dec2343.hsn.de.hpc.ucar.edu 0: (t_initf) profile_barrier= F
dec2343.hsn.de.hpc.ucar.edu 0: (t_initf) profile_outpe_num= 1
dec2343.hsn.de.hpc.ucar.edu 0: (t_initf) profile_outpe_stride= 0
dec2343.hsn.de.hpc.ucar.edu 0: (t_initf) profile_single_file= F
dec2343.hsn.de.hpc.ucar.edu 0: (t_initf) profile_global_stats= T
dec2343.hsn.de.hpc.ucar.edu 0: (t_initf) profile_ovhd_measurement= F
dec2343.hsn.de.hpc.ucar.edu 0: (t_initf) profile_add_detail= F
dec2343.hsn.de.hpc.ucar.edu 0: (t_initf) profile_papi_enable= F
dec2343.hsn.de.hpc.ucar.edu 0: ESMF_Finalize: Error closing trace stream
dec2343.hsn.de.hpc.ucar.edu 0: MPICH ERROR [Rank 0] [job id 2dd16cc6-e949-427e-bb59-48726c16f9fa] [Wed Dec 18 15:47:41 2024] [dec2343] - Abort(1) (rank 0 in comm 496): application called MPI_Abort(comm=0x84000002, 1) - process 0
dec2343.hsn.de.hpc.ucar.edu 0:
dec2343.hsn.de.hpc.ucar.edu 0: forrtl: severe (174): SIGSEGV, segmentation fault occurred
dec2343.hsn.de.hpc.ucar.edu 0: Image PC Routine Line Source
dec2343.hsn.de.hpc.ucar.edu 0: libpthread-2.31.s 000015004133C8C0 Unknown Unknown Unknown
dec2343.hsn.de.hpc.ucar.edu 0: libmpi_intel.so.1 000015003F2FBE7E Unknown Unknown Unknown
dec2343.hsn.de.hpc.ucar.edu 0: libmpi_intel.so.1 000015003F10A22F Unknown Unknown Unknown
dec2343.hsn.de.hpc.ucar.edu 0: libmpi_intel.so.1 000015003D7376A8 MPI_Abort Unknown Unknown
dec2343.hsn.de.hpc.ucar.edu 0: libesmf.so 0000150049332277 _ZN5ESMCI3VMK5abo Unknown Unknown
dec2343.hsn.de.hpc.ucar.edu 0: libesmf.so 0000150049330814 _ZN5ESMCI2VM5abor Unknown Unknown
dec2343.hsn.de.hpc.ucar.edu 0: libesmf.so 00001500493476E5 c_esmc_vmabort_ Unknown Unknown
dec2343.hsn.de.hpc.ucar.edu 0: libesmf.so 0000150049B5C7A8 esmf_vmmod_mp_esm Unknown Unknown
dec2343.hsn.de.hpc.ucar.edu 0: libesmf.so 00001500499CC1EE esmf_initmod_mp_e Unknown Unknown
dec2343.hsn.de.hpc.ucar.edu 0: cesm.exe 0000000000433ADA MAIN__ 132 esmApp.F90
dec2343.hsn.de.hpc.ucar.edu 0: cesm.exe 00000000004230FD Unknown Unknown Unknown
dec2343.hsn.de.hpc.ucar.edu 0: libc-2.31.so 000015003C7E129D __libc_start_main Unknown Unknown
drv.log:
cat /glade/derecho/scratch/erik/tests_ctsm5316acl/ERP_P64x2_Ld765.f10_f10_mg37.I2000Clm60BgcCrop.derecho_intel.clm-monthly.GC.ctsm5316acl_int/run/case2run/drv.log.7269007.desched1.241218-154730
read rpointer file = rpointer.cpl.2001-01-18-00000
Looking at the code, there is error handling for this as follows:
cesm/driver/esm_time_mod.F90:
    call NUOPC_CompAttributeGet(instance_driver, name='drv_restart_pointer', value=restart_pfile, rc=rc)
    if (ChkErr(rc,__LINE__,u_FILE_u)) return
    if (trim(restart_pfile) /= 'none') then
       if (maintask) then
          write(logunit,*) " read rpointer file = "//trim(restart_pfile)
          inquire( file=trim(restart_pfile), exist=exists)
          if (.not. exists) then
             rc = ESMF_FAILURE
             call ESMF_LogWrite(trim(subname)//' ERROR rpointer file '//trim(restart_pfile)//' not found', &
                  ESMF_LOGMSG_ERROR, line=__LINE__, file=__FILE__)
             return
          endif
So the error message is written only to the ESMF PET log files, but no PET files were created for this case, which means the message never shows up anywhere.
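One possible direction, sketched here as a standalone program rather than as a patch to CMEPS, is to also write the "not found" message to the driver log unit and to stderr before returning the failure, so it is visible even when no PET files exist. Everything in this sketch is an assumption for illustration: check_rpointer is a hypothetical helper, a plain integer rc stands in for ESMF_FAILURE, output_unit stands in for the driver's logunit, and error stop stands in for whatever abort path the driver would actually use. It makes no ESMF or NUOPC calls.

```fortran
program rpointer_check_sketch
  use, intrinsic :: iso_fortran_env, only : error_unit, output_unit
  implicit none
  integer :: rc

  ! File name taken from the drv.log output above; output_unit stands in
  ! for the driver's logunit (assumption).
  call check_rpointer('rpointer.cpl.2001-01-18-00000', output_unit, rc)

  ! Stand-in for the driver's abort path (assumption): stop with a
  ! nonzero exit code after the message has been written.
  if (rc /= 0) error stop 1

contains

  ! Hypothetical helper, not part of CMEPS: checks that the restart
  ! pointer file exists and reports a readable error if it does not.
  subroutine check_rpointer(restart_pfile, logunit, rc)
    character(len=*), intent(in)  :: restart_pfile
    integer,          intent(in)  :: logunit
    integer,          intent(out) :: rc
    logical :: exists

    rc = 0
    inquire(file=trim(restart_pfile), exist=exists)
    if (.not. exists) then
       ! Write to both the log unit and stderr so the message does not
       ! depend on any other log file being created.
       write(logunit,    '(a)') ' ERROR rpointer file '//trim(restart_pfile)//' not found'
       write(error_unit, '(a)') ' ERROR rpointer file '//trim(restart_pfile)//' not found'
       rc = 1
    end if
  end subroutine check_rpointer

end program rpointer_check_sketch
```

Run in a directory without that rpointer file, this should report the missing file on both units and exit with a nonzero status instead of a seg-fault. In the real driver the equivalent change would presumably sit next to the existing ESMF_LogWrite call, but that placement is not confirmed here.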