
Exact restart tests failing because of an rpointer file issue

Open · ekluzek opened this issue 9 months ago · 5 comments

There seem to be problems on Derecho with the Exact Restart tests (ER*), caused by an rpointer file issue.

For example, these two tests fail:

ERI.nldas2_nldas2_rHDMA_mnldas2.I2000Clm60SpMizGs.derecho_intel.mizuroute-default (RUN)
ERS.f09_f09_mg17.I2000Clm60SpMizGs.derecho_intel.mizuroute-default (RUN)

Looking at the cesm.log file, the problem is in reading the rpointer.rof file.

rof.log:

---- Read river network data --- 
  Reading hruArea into structure HRU
  Reading hruID into structure HRU2SEG
  Reading hru_seg into structure HRU2SEG
  Reading rlen into structure SEG
  Reading rslp into structure SEG
  Reading segID into structure NTOPO
  Reading toSegID into structure NTOPO
 Reading restart pointer file....
 (OPNFIL): Successfully opened file ./rpointer.rof on unit=         -132

cesm.log:

dec1217.hsn.de.hpc.ucar.edu 128:  ROF: PIO numiotasks=          14
dec1217.hsn.de.hpc.ucar.edu 128:  ROF: PIO stride=         128
dec1217.hsn.de.hpc.ucar.edu 128:  ROF: PIO rearranger=           2
dec1217.hsn.de.hpc.ucar.edu 128:  ROF: PIO root=           1
dec1225.hsn.de.hpc.ucar.edu 394: forrtl: severe (24): end-of-file during read, unit -129, file /glade/derecho/scratch/erik/tests_mizuccpln2v22ctsm5231m/ERS.f09_f09_mg17.I2000Clm60SpMizGs.derecho_intel.mizuroute-default.GC.mizuccpln2v22ctsm5231m_int/run/./rpointer.rof
dec1225.hsn.de.hpc.ucar.edu 394: Image              PC                Routine            Line        Source
dec2323.hsn.de.hpc.ucar.edu 1755: forrtl: severe (24): end-of-file during read, unit -129, file /glade/derecho/scratch/erik/tests_mizuccpln2v22ctsm5231m/ERS.f09_f09_mg17.I2000Clm60SpMizGs.derecho_intel.mizuroute-default.GC.mizuccpln2v22ctsm5231m_int/run/./rpointer.rof
dec1225.hsn.de.hpc.ucar.edu 394: cesm.exe           00000000014A0056  Unknown               Unknown  Unknown
dec1225.hsn.de.hpc.ucar.edu 394: cesm.exe           00000000010F122A  rtmmod_mp_restfil         844  RtmMod.F90
dec1225.hsn.de.hpc.ucar.edu 394: cesm.exe           00000000010EE975  rtmmod_mp_route_i         776  RtmMod.F90
dec2323.hsn.de.hpc.ucar.edu 1755: Image              PC                Routine            Line        Source
dec1225.hsn.de.hpc.ucar.edu 394: cesm.exe           00000000010DB217  rof_comp_nuopc_mp         534  rof_comp_nuopc.F90

ekluzek, Mar 18 '25 16:03

The problem is that it reads rpointer.rof (which is an empty file) rather than rpointer.rof.2000-01-06-00000, the file with the datestamp in its name.

ekluzek, Mar 18 '25 16:03
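To make the mismatch concrete, here is a minimal Fortran sketch of how a datestamped pointer name such as rpointer.rof.2000-01-06-00000 can be assembled from the model date. The module and function names (rpointer_name_sketch, stamped_rpointer_name) are hypothetical and not mizuRoute's code, and the sketch assumes the trailing five digits are seconds into the day, as in other CESM datestamps.

```fortran
! Hypothetical helper, not mizuRoute's actual code: build a CESM-style
! datestamped rpointer file name of the form <base>.YYYY-MM-DD-SSSSS,
! assuming SSSSS is the number of seconds since midnight.
module rpointer_name_sketch
  implicit none
  private
  public :: stamped_rpointer_name

contains

  function stamped_rpointer_name(base, yr, mo, dy, tod) result(fname)
    character(len=*), intent(in) :: base   ! e.g. 'rpointer.rof'
    integer,          intent(in) :: yr, mo, dy
    integer,          intent(in) :: tod    ! seconds since midnight (assumed)
    character(len=:), allocatable :: fname
    ! '.YYYY-MM-DD-SSSSS' adds 1+4+1+2+1+2+1+5 = 17 characters to base
    character(len=len(base)+17) :: buf

    ! Zero-padded fields give e.g. rpointer.rof.2000-01-06-00000
    write(buf, '(a,".",i4.4,"-",i2.2,"-",i2.2,"-",i5.5)') base, yr, mo, dy, tod
    fname = trim(buf)
  end function stamped_rpointer_name

end module rpointer_name_sketch
```

Under these assumptions, stamped_rpointer_name('rpointer.rof', 2000, 1, 6, 0) returns rpointer.rof.2000-01-06-00000. A reader that is never given the model time has nothing to build this name from, which is consistent with it ending up on the bare rpointer.rof.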

This line is where the rpointer file is read. However, this has not actually been tested so far, so we may need to look into why.

nmizukami, Mar 18 '25 17:03

One thing I see is that there is a call to io_rpfile that needs curDateTime passed to it.

https://github.com/ESCOMP/mizuRoute/blob/cesm-coupling/route/build/src/write_simoutput_pio.f90#L427

That's all I see that needs to be done so far.

The io_rpfile subroutine could also check that curDateTime is always passed in when runMode is cesm-coupling. That's probably worth doing: since it is called so infrequently, the extra error checking costs almost nothing.

ekluzek, Mar 18 '25 22:03
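The check suggested above could look something like the following sketch. It is not mizuRoute's io_rpfile: check_rpfile_args is a hypothetical name, and the model time is represented as an optional character string instead of mizuRoute's own datetime type. The Fortran present() intrinsic is what lets a routine detect that an optional dummy argument such as curDatetime was omitted.

```fortran
! Hypothetical sketch, not mizuRoute's io_rpfile: reject a call that omits
! curDatetime when runMode is 'cesm-coupling'. The timestamp is modelled
! as a plain string here; mizuRoute's real datetime type differs.
module rpfile_check_sketch
  implicit none
  private
  public :: check_rpfile_args

contains

  subroutine check_rpfile_args(runMode, ierr, message, curDatetime)
    character(len=*),           intent(in)  :: runMode
    integer,                    intent(out) :: ierr
    character(len=*),           intent(out) :: message
    character(len=*), optional, intent(in)  :: curDatetime

    ierr = 0; message = 'check_rpfile_args/'

    ! In coupled mode the rpointer name carries a datestamp, so the caller
    ! must say which model time the pointer file belongs to.
    if (trim(runMode) == 'cesm-coupling' .and. .not. present(curDatetime)) then
      ierr = 1
      message = trim(message)//'curDatetime must be passed when runMode is cesm-coupling'
      return
    end if
  end subroutine check_rpfile_args

end module rpfile_check_sketch
```

Called at the top of the reading and writing paths, a check like this would, under these assumptions, turn a silent fall-back to the undated file into an explicit error message in coupled runs.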

> This line is where the rpointer file is read. However, this has not actually been tested so far, so we may need to look into why.

Yeah, I think we should add more messaging around the read, so that when it fails to find the file with the timestamp it says so and reports that it is going to try without the timestamp. It should then fail gracefully if that file doesn't exist either.

ekluzek, Mar 18 '25 22:03
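A minimal sketch of that read logic follows, assuming the first line of the pointer file holds the restart file name (the real file may hold more). read_rpointer and its argument names are hypothetical. inquire(exist=) covers the missing-file cases, and iostat= on the read turns the end-of-file on an empty pointer file, which is what crashed the ERS test above with forrtl severe (24), into an error code the caller can report.

```fortran
! Hypothetical sketch of the suggested messaging: try the datestamped
! rpointer file first, say so if it is missing and fall back to the plain
! name, and return an error (instead of a runtime end-of-file crash) if
! neither file exists or the chosen file is empty.
module rpointer_read_sketch
  implicit none
  private
  public :: read_rpointer

contains

  subroutine read_rpointer(stamped, plain, restart_file, ierr, message)
    character(len=*), intent(in)  :: stamped   ! e.g. 'rpointer.rof.2000-01-06-00000'
    character(len=*), intent(in)  :: plain     ! e.g. 'rpointer.rof'
    character(len=*), intent(out) :: restart_file
    integer,          intent(out) :: ierr
    character(len=*), intent(out) :: message
    character(len=:), allocatable :: fname
    logical :: exists
    integer :: unit, ios

    ierr = 0; message = 'read_rpointer/'; restart_file = ''

    inquire(file=stamped, exist=exists)
    if (exists) then
      fname = stamped
    else
      print '(a)', 'rpointer file '//trim(stamped)//' not found; trying '//trim(plain)
      inquire(file=plain, exist=exists)
      if (.not. exists) then
        ierr = 1
        message = trim(message)//'neither '//trim(stamped)//' nor '//trim(plain)//' exists'
        return
      end if
      fname = plain
    end if

    open(newunit=unit, file=fname, status='old', action='read', iostat=ios)
    if (ios /= 0) then
      ierr = 2; message = trim(message)//'cannot open '//fname; return
    end if

    ! iostat catches end-of-file on an empty pointer file instead of aborting
    read(unit, '(a)', iostat=ios) restart_file
    close(unit)
    if (ios /= 0) then
      ierr = 3; message = trim(message)//'could not read a restart file name from '//fname
      restart_file = ''
    end if
  end subroutine read_rpointer

end module rpointer_read_sketch
```

Returning distinct error codes keeps the decision about aborting with the caller, which matches how the routines in the diff below propagate ierr and message.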

OK, I got it working. The core changes are these:

diff --git a/route/build/src/write_restart_pio.f90 b/route/build/src/write_restart_pio.f90
index 45a244fb..2727e8a1 100644
--- a/route/build/src/write_restart_pio.f90
+++ b/route/build/src/write_restart_pio.f90
@@ -191,7 +191,7 @@ SUBROUTINE restart_output(ierr, message)
   if(ierr/=0)then; message=trim(message)//trim(cmessage); return; endif
 
   if (trim(runMode)=='cesm-coupling') then
-    call io_rpfile('w', ierr, cmessage, curDatetime=simDatetime(1))
+    call io_rpfile('w', ierr, cmessage, curDatetime=simDatetime(2))
   else
     call io_rpfile('w', ierr, cmessage)
   end if
diff --git a/route/build/src/write_simoutput_pio.f90 b/route/build/src/write_simoutput_pio.f90
index 44ba033c..4c7a6b63 100644
--- a/route/build/src/write_simoutput_pio.f90
+++ b/route/build/src/write_simoutput_pio.f90
@@ -407,6 +407,7 @@ SUBROUTINE close_all(ierr, message)
  ! *********************************************************************
  SUBROUTINE init_histFile(ierr, message)
 
+   USE globalData,  ONLY: simDatetime       ! previous and current model time
    USE public_var,  ONLY: outputAtGage      ! ascii containing last restart and history files
 
    implicit none
@@ -419,7 +420,7 @@ SUBROUTINE init_histFile(ierr, message)
    ierr=0; message='init_histFile/'
 
    ! get history file names to append and assign it to hfileout
-   call io_rpfile('r', ierr, cmessage)
+   call io_rpfile('r', ierr, cmessage, curDatetime=simDatetime(1))
    if(ierr/=0)then; message=trim(message)//trim(cmessage); return; endif
 

ekluzek, Mar 19 '25 17:03

@nmizukami, it looks like this is the solution.

ekluzek, Jul 30 '25 21:07

Hi Erik, yes, these changes resolve the rpointer issue.

nmizukami, Jul 31 '25 11:07