virtual_test_bed
Changes to MOOSE checkpoints prevent MRAD model restarts
Bug Description
The checkpoint = true in HPMR_thermo_ss.i used to write a directory called HPMR_dfem_griffin_ss_out_bison0_cp with checkpoint files. With the recent changes to MOOSE checkpoints, it appears that only the master input app can write checkpoint files.
> mpiexec -n 48 ~/sawtooth-projects/dire_wolf/dire_wolf-opt -i HPMR_dfem_griffin_tr.i
*** ERROR ***
No checkpoint file found!
Steps to Reproduce
Obtain the latest version of Dire Wolf and MOOSE. Run the steady state case, HPMR_dfem_griffin_ss.i, as normal. When the simulation finishes, look in HPMR_dfem_griffin_ss_out_bison0_cp and notice that it contains 0 files. You will be unable to run the null transient, or any other simulation relying on restarts.
[hartjack][~/projects/virtual_test_bed/microreactors/mrad/steady/HPMR_dfem_griffin_ss_out_bison0_cp] (devel)> l
total 160K
drwx------ 2 hartjack hartjack 0 Mar 15 13:12 .
drwxrwxr-x 7 hartjack hartjack 14K Mar 15 13:51 ..
[hartjack][~/projects/virtual_test_bed/microreactors/mrad/steady/HPMR_dfem_griffin_ss_out_bison0_cp] (devel)>
As a comparison to show the new functionality, add checkpoint = true to [Outputs] in HPMR_dfem_griffin_ss.i. Run the steady state case. Look at the two output directories, HPMR_dfem_griffin_ss_out_cp and HPMR_dfem_griffin_ss_out_bison0_cp. You will see the checkpoints in the former directory, and not the latter.
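For reference, the comparison step amounts to something like the fragment below. This is a minimal sketch, assuming the main input's existing [Outputs] block only needs the one extra line; the other parameters already in that block are not shown here and stay as they are.

```
[Outputs]
  # Shorthand that enables checkpoint output for the main (neutronics) app.
  # Existing output parameters in HPMR_dfem_griffin_ss.i are omitted from this sketch.
  checkpoint = true
[]
```

With this in place, checkpoints appear in HPMR_dfem_griffin_ss_out_cp, while the sub-app directory HPMR_dfem_griffin_ss_out_bison0_cp stays empty.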
It is not as trivial as just using the master app checkpoint files. The neutronics and conduction meshes are different, with different BCs. I did not create the mesh, but I'm guessing some changes might need to be made, hence tagging the model creators.
Impact
Trying to use a copy of MRAD for another application. I can't progress much further right now.
Tagging: @miaoyinb @nstauff @GiudGiud @markdehart
@GiudGiud Is this the expected behavior of the new app version? I recently used the INL HPC blue_crab-opt compiled on 3/15/2024 for a different problem and the checkpoint files of the child app are generated as usual.
We should still be able to generate sub-app checkpoints with checkpoint = true in the subapp.
I need to look into this
Tag @YaqiWang
I believe we've been moving towards having the main app handling all the restart, even if the main input file is "changing" from before-restart to after-restart.
@loganharbour
BTW, @GiudGiud helped me get this running, but obviously this is not the ideal fix. This is the [Outputs] block in HPMR_thermo_ss.i:
[Outputs]
  perf_graph = true
  exodus = true
  color = true
  csv = true
  [check]
    type = Checkpoint
    execute_on = FINAL
    num_files = 1e5
  []
[]
This will generate checkpoint files for the SubApp.