New unintuitive behavior for RUN_TYPE=branch simulations
With the rpointer updates, a RUN_TYPE=branch simulation now requires setting DRV_RESTART_POINTER in addition to the other branch settings. The default of drv.rpointer is most likely going to be wrong when branching from a case whose rpointer files have timestamps in their names. Coupled with #524, this means you can set up a case and not get any clear error message about what's wrong.
Here's a sample case to replicate what I mean, using ctsm5.3.020, which has cmeps1.0.33, ccs_config_cesm1.0.16, and cime6.1.59 (I'm using an mpi-serial single-gridcell case to make a simpler, smaller test that can also be run interactively without going into the queue):
# for cshell (see the use of cshell set below)
cd cime/scripts
# First the control case to branch from:
./create_newcase --res 1x1_brazil --compset I2000Clm60SpRs --machine derecho --case teststartup --mpilib mpi-serial --run-unsupported
cd teststartup
# Turn DEBUG compiling on
./xmlchange DEBUG=TRUE
./case.setup
./case.build
./case.submit --no-batch
# Save the ARCHIVE directory
set DOUT_S_ROOT_BRANCHFROM=`./xmlquery --value DOUT_S_ROOT`
cd ..
# Now the branch case after the first one completes and saves the restart files to the archive directory
./create_clone --clone teststartup --case testbranch --keepexe
cd testbranch
set REFDATE=2000-01-06
set REFTOD=00000
./xmlchange RUN_REFCASE=teststartup,RUN_REFDATE=$REFDATE,RUN_TYPE=branch,RUN_STARTDATE=$REFDATE
./case.setup
# Copy the restart files over to the run directory
set RUNDIR=`./xmlquery --value RUNDIR`
cp $DOUT_S_ROOT_BRANCHFROM/rest/${REFDATE}-${REFTOD}/* $RUNDIR
./case.build
./case.submit --no-batch
It fails at runtime because it can't find the drv.rpointer file, but as noted above the error messaging is insufficient. I have a list of ideas that I'll add in the next comment.
Pinging maintainers. I know Jim will want to weigh in on this, but I figure he might still be traveling, so he might not see this for a while.
@briandobbins @billsacks @jedwards4b @fischer-ncar
Thank you @ekluzek for laying this out clearly. I agree that we should fix this in some way.
A list of ideas I have:
1. For RUN_TYPE==branch, check for the existence of the $DRV_RESTART_POINTER file in $RUNDIR at preview_namelists time (but after the data-staging phase), and abort if it's not found.
2. For RUN_TYPE==branch, set the default of DRV_RESTART_POINTER to drv.rpointer.$RUN_STARTDATE-$RUN_STARTTOD.
3. Abort in preview_namelists if RUN_TYPE is branch and $RUN_REFDATE is set but DRV_RESTART_POINTER isn't.
4. For branch cases, have the default of DRV_RESTART_POINTER be UNSET, and abort in preview_namelists if it is still UNSET.
5. Same as 4, but for all cases.
6. Maybe just always set the default to rpointer.cpl.$RUN_STARTDATE-$RUN_STARTTOD?
I think most of the above should be done all at once. I'd like to hear what others think about setting it to UNSET. I'm actually thinking that 6 may be a nice, simple solution that gets the naive simple case to work. But I would also like some file-existence checking in place when the file is going to be used, so 1 should be done as well.
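To make the combination of idea 6 (a timestamped default name) and idea 1 (an existence check for branch runs) concrete, here is a rough Python sketch. It is not CIME code: the function names `default_restart_pointer` and `check_restart_pointer` are hypothetical, the `rpointer.cpl` prefix is simply taken from idea 6 (idea 2 suggests `drv.rpointer` instead), and the `UNSET` sentinel follows idea 4. Where such a check would actually hook into preview_namelists is left open.

```python
from pathlib import Path


def default_restart_pointer(start_date: str, start_tod: str,
                            prefix: str = "rpointer.cpl") -> str:
    """Build a timestamped pointer file name (idea 6).

    The prefix is an assumption taken from the idea list, not a confirmed
    CIME default.
    """
    return f"{prefix}.{start_date}-{start_tod}"


def check_restart_pointer(run_type: str, rundir: str, pointer_name: str) -> None:
    """Fail early if a branch run's pointer file is unset or missing (idea 1).

    Hypothetical helper: only branch runs are checked, other run types pass.
    """
    if run_type != "branch":
        return
    if not pointer_name or pointer_name == "UNSET":
        raise ValueError(
            "RUN_TYPE=branch requires DRV_RESTART_POINTER to be set"
        )
    pointer_path = Path(rundir) / pointer_name
    if not pointer_path.is_file():
        raise FileNotFoundError(
            f"RUN_TYPE=branch but {pointer_path} does not exist; "
            "copy the restart and rpointer files into RUNDIR "
            "or correct DRV_RESTART_POINTER"
        )
```

With the values from the reproduction above, `default_restart_pointer("2000-01-06", "00000")` gives `rpointer.cpl.2000-01-06-00000`, and `check_restart_pointer("branch", rundir, name)` would raise a `FileNotFoundError` naming the missing file instead of leaving the failure to be discovered at runtime.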
I made an error in the RUN_REFDATE setting in the original post, which I have just corrected.