Exact restart tests failing because of an rpointer file issue
There seem to be problems on Derecho with the Exact Restart tests (ER*) because of an rpointer file issue.
For example, these two tests fail:
ERI.nldas2_nldas2_rHDMA_mnldas2.I2000Clm60SpMizGs.derecho_intel.mizuroute-default (RUN)
ERS.f09_f09_mg17.I2000Clm60SpMizGs.derecho_intel.mizuroute-default (RUN)
Looking at the cesm.log file, the problem is with reading the rpointer.rof file.
rof.log:
---- Read river network data ---
Reading hruArea into structure HRU
Reading hruID into structure HRU2SEG
Reading hru_seg into structure HRU2SEG
Reading rlen into structure SEG
Reading rslp into structure SEG
Reading segID into structure NTOPO
Reading toSegID into structure NTOPO
Reading restart pointer file....
(OPNFIL): Successfully opened file ./rpointer.rof on unit= -132
cesm.log:
dec1217.hsn.de.hpc.ucar.edu 128: ROF: PIO numiotasks= 14
dec1217.hsn.de.hpc.ucar.edu 128: ROF: PIO stride= 128
dec1217.hsn.de.hpc.ucar.edu 128: ROF: PIO rearranger= 2
dec1217.hsn.de.hpc.ucar.edu 128: ROF: PIO root= 1
dec1225.hsn.de.hpc.ucar.edu 394: forrtl: severe (24): end-of-file during read, unit -129, file /glade/derecho/scratch/erik/tests_mizuccpln2v22ctsm5231m/ERS.f09_f09_mg17.I2000Clm60SpMizGs.derecho_intel.mizuroute-default.GC.mizuccpln2v22ctsm5231m_int/run/./rpointer.rof
dec1225.hsn.de.hpc.ucar.edu 394: Image PC Routine Line Source
dec2323.hsn.de.hpc.ucar.edu 1755: forrtl: severe (24): end-of-file during read, unit -129, file /glade/derecho/scratch/erik/tests_mizuccpln2v22ctsm5231m/ERS.f09_f09_mg17.I2000Clm60SpMizGs.derecho_intel.mizuroute-default.GC.mizuccpln2v22ctsm5231m_int/run/./rpointer.rof
dec1225.hsn.de.hpc.ucar.edu 394: cesm.exe 00000000014A0056 Unknown Unknown Unknown
dec1225.hsn.de.hpc.ucar.edu 394: cesm.exe 00000000010F122A rtmmod_mp_restfil 844 RtmMod.F90
dec1225.hsn.de.hpc.ucar.edu 394: cesm.exe 00000000010EE975 rtmmod_mp_route_i 776 RtmMod.F90
dec2323.hsn.de.hpc.ucar.edu 1755: Image PC Routine Line Source
dec1225.hsn.de.hpc.ucar.edu 394: cesm.exe 00000000010DB217 rof_comp_nuopc_mp 534 rof_comp_nuopc.F90
The problem is that it's reading in rpointer.rof (which is an empty file) rather than rpointer.rof.2000-01-06-00000, the file with the datestamp in its name.
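For reference, here is a minimal sketch of how a datestamped rpointer name like the one above could be assembled. It assumes the suffix follows a YYYY-MM-DD-SSSSS pattern (SSSSS being seconds into the day), as the file name suggests; the variable names are hypothetical and this is not mizuRoute's actual code.

```fortran
program rpointer_name_demo
  implicit none
  character(len=64) :: fname
  ! Hypothetical date components; the values match the file name quoted above
  integer :: year, month, day, sec

  year = 2000; month = 1; day = 6; sec = 0

  ! Assumed suffix pattern: YYYY-MM-DD-SSSSS, zero-padded
  write(fname, '(a,i4.4,"-",i2.2,"-",i2.2,"-",i5.5)') 'rpointer.rof.', year, month, day, sec
  print '(a)', trim(fname)
end program rpointer_name_demo
```

With these values the program prints rpointer.rof.2000-01-06-00000.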
This line is the rpointer file read. Actually, this has not been tested so far, so we may need to look into why it fails.
One thing I see is that there is a call to io_rpfile that needs curDateTime sent into it.
https://github.com/ESCOMP/mizuRoute/blob/cesm-coupling/route/build/src/write_simoutput_pio.f90#L427
That's all I see that needs to be done so far.
The io_rpfile subroutine could also check that curDateTime is always passed in when runMode is cesm-coupling. That's probably worth doing: since it is called so infrequently, it might as well have more error checking.
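A rough sketch of what such a check could look like, using the PRESENT intrinsic on the optional argument. The helper name check_rpfile_args and the error code are hypothetical, and a character string stands in for the real datetime type, so this only illustrates the idea rather than the actual io_rpfile interface.

```fortran
module rpfile_check_sketch
  implicit none
contains
  ! Hypothetical helper, not part of mizuRoute: sets ierr/=0 when the run
  ! mode requires a datestamp but the caller did not supply one.
  subroutine check_rpfile_args(runMode, ierr, message, curDatetime)
    character(len=*), intent(in)           :: runMode
    integer,          intent(out)          :: ierr
    character(len=*), intent(out)          :: message
    character(len=*), intent(in), optional :: curDatetime  ! stand-in for the real datetime type

    ierr = 0; message = 'check_rpfile_args/'
    if (trim(runMode) == 'cesm-coupling' .and. .not. present(curDatetime)) then
      ierr = 10
      message = trim(message)//'curDatetime must be passed when runMode is cesm-coupling'
    end if
  end subroutine check_rpfile_args
end module rpfile_check_sketch

program check_demo
  use rpfile_check_sketch
  implicit none
  integer :: ierr
  character(len=256) :: msg

  ! Missing datestamp in coupled mode: flagged as an error
  call check_rpfile_args('cesm-coupling', ierr, msg)
  if (ierr /= 0) print '(a)', trim(msg)

  ! Datestamp supplied: passes
  call check_rpfile_args('cesm-coupling', ierr, msg, curDatetime='2000-01-06-00000')
  if (ierr == 0) print '(a)', 'ok'
end program check_demo
```

Returning ierr and message instead of stopping follows the error-handling pattern visible in the calls to io_rpfile.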
Yeah, I think we should add more messaging around the read, so that when it tries to find the file with the timestamp, it says that it failed and that it's going to try without the timestamp. And then it should fail gracefully if that file doesn't exist either.
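Something along these lines might work as a starting point. This is only a sketch under assumptions: read_rpointer, its arguments, and the error codes are hypothetical and do not reflect the real io_rpfile. It uses INQUIRE to test for each file and iostat= on the read, so an empty file returns an error instead of aborting with an end-of-file.

```fortran
module rpointer_read_sketch
  implicit none
contains
  ! Hypothetical helper, not mizuRoute's io_rpfile: read the first line of the
  ! datestamped rpointer file, falling back to the plain one with a message.
  subroutine read_rpointer(baseName, stamp, line, ierr, message)
    character(len=*), intent(in)  :: baseName  ! e.g. './rpointer.rof'
    character(len=*), intent(in)  :: stamp     ! e.g. '2000-01-06-00000'
    character(len=*), intent(out) :: line      ! first line of the file
    integer,          intent(out) :: ierr
    character(len=*), intent(out) :: message
    character(len=512) :: fname
    logical :: found
    integer :: iunit, ios

    ierr = 0; message = 'read_rpointer/'; line = ''
    fname = trim(baseName)//'.'//trim(stamp)
    inquire(file=trim(fname), exist=found)
    if (.not. found) then
      print '(a)', 'rpointer file '//trim(fname)//' not found; trying '//trim(baseName)
      fname = baseName
      inquire(file=trim(fname), exist=found)
    end if
    if (.not. found) then
      ierr = 20; message = trim(message)//'no rpointer file found for '//trim(baseName)
      return
    end if

    open(newunit=iunit, file=trim(fname), status='old', action='read', iostat=ios)
    if (ios /= 0) then
      ierr = 30; message = trim(message)//'cannot open '//trim(fname)
      return
    end if
    read(iunit, '(a)', iostat=ios) line
    close(iunit)
    ! iostat catches end-of-file on an empty file instead of aborting the run
    if (ios /= 0) then
      ierr = 40; message = trim(message)//'cannot read '//trim(fname)//' (empty file?)'
    end if
  end subroutine read_rpointer
end module rpointer_read_sketch

program read_demo
  use rpointer_read_sketch
  implicit none
  character(len=512) :: line, msg
  integer :: ierr

  call read_rpointer('./rpointer.rof', '2000-01-06-00000', line, ierr, msg)
  if (ierr /= 0) then
    print '(a)', trim(msg)
  else
    print '(a)', 'restart file: '//trim(line)
  end if
end program read_demo
```

The empty-file case matters here because the failing tests found an rpointer.rof that existed but had nothing in it.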
OK, I got it working. The core changes are these:
diff --git a/route/build/src/write_restart_pio.f90 b/route/build/src/write_restart_pio.f90
index 45a244fb..2727e8a1 100644
--- a/route/build/src/write_restart_pio.f90
+++ b/route/build/src/write_restart_pio.f90
@@ -191,7 +191,7 @@ SUBROUTINE restart_output(ierr, message)
if(ierr/=0)then; message=trim(message)//trim(cmessage); return; endif
if (trim(runMode)=='cesm-coupling') then
- call io_rpfile('w', ierr, cmessage, curDatetime=simDatetime(1))
+ call io_rpfile('w', ierr, cmessage, curDatetime=simDatetime(2))
else
call io_rpfile('w', ierr, cmessage)
end if
diff --git a/route/build/src/write_simoutput_pio.f90 b/route/build/src/write_simoutput_pio.f90
index 44ba033c..4c7a6b63 100644
--- a/route/build/src/write_simoutput_pio.f90
+++ b/route/build/src/write_simoutput_pio.f90
@@ -407,6 +407,7 @@ SUBROUTINE close_all(ierr, message)
! *********************************************************************
SUBROUTINE init_histFile(ierr, message)
+ USE globalData, ONLY: simDatetime ! previous and current model time
USE public_var, ONLY: outputAtGage ! ascii containing last restart and history files
implicit none
@@ -419,7 +420,7 @@ SUBROUTINE init_histFile(ierr, message)
ierr=0; message='init_histFile/'
! get history file names to append and assign it to hfileout
- call io_rpfile('r', ierr, cmessage)
+ call io_rpfile('r', ierr, cmessage, curDatetime=simDatetime(1))
if(ierr/=0)then; message=trim(message)//trim(cmessage); return; endif
@nmizukami it looks like this is the solution
Hi Erik, yes, these changes resolve the rpointer issue.