Checkpointer not supported for coupled ClimaOcean workflows
I am trying to add a Checkpointer to the one_degree_simulation.jl example so that I can run longer simulations. My implementation is:
using Glob  # provides glob(), used below to search for checkpoint files

output_dir = "/g/data/v46/txs156/ClimaOcean.jl/examples/"
prefix = "one_deg_tripolar_checkpoint"

ocean.output_writers[:checkpoint] = Checkpointer(ocean.model;
                                                 schedule = TimeInterval(1days),
                                                 prefix = prefix,
                                                 cleanup = true,
                                                 dir = output_dir,
                                                 verbose = true,
                                                 overwrite_existing = true)

# Check whether a checkpoint file already exists; if not, run the initial spin-up
pattern = prefix * "*"
checkpoint_file = glob(pattern, output_dir)

if !isempty(checkpoint_file)
    # A checkpoint exists, so load the simulation state from it
    println("Checkpoint found, resuming the simulation from the checkpoint.")
    simulation.Δt = 20minutes
    simulation.stop_time = 360days
    run!(simulation, pickup=true)
else
    println("Checkpoint not found, spinning up simulation from scratch.")
    run!(simulation)
    simulation.Δt = 20minutes
    simulation.stop_time = 360days
    run!(simulation)
end
However, the run!(simulation, pickup=true) line fails with the error:
ERROR: LoadError: No checkpointers found: cannot pickup simulation!
This happens even though a checkpoint file named one_deg_tripolar_checkpoint_iteration1656.jld2 was saved successfully in the output_dir (which is the same as "." in this case). The docs suggest that pickup=true should look in the directory for checkpoints, but it does not appear to do so.
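For reference, here is a minimal sketch of how the most recent checkpoint file could be located by hand, based only on the file naming seen above (prefix, then _iteration, then a number, then .jld2). The helper name latest_checkpoint is hypothetical and is not part of Oceananigans or ClimaOcean; it uses only Julia's standard library (chopprefix and chopsuffix need Julia 1.8 or later).

```julia
# Hypothetical helper, not part of Oceananigans or ClimaOcean.
# Assumes checkpoint files are named "<prefix>_iteration<N>.jld2".
# Returns (path, iteration) for the highest N, or (nothing, -1) if none match.
function latest_checkpoint(dir, prefix)
    head = prefix * "_iteration"
    tail = ".jld2"
    best_file, best_iteration = nothing, -1

    for name in readdir(dir)
        # Skip anything that does not look like a checkpoint for this prefix
        (startswith(name, head) && endswith(name, tail)) || continue

        # Whatever sits between head and tail should be the iteration number
        number = chopsuffix(chopprefix(name, head), tail)
        iteration = tryparse(Int, number)
        iteration === nothing && continue

        if iteration > best_iteration
            best_file, best_iteration = joinpath(dir, name), iteration
        end
    end

    return best_file, best_iteration
end
```

For example, latest_checkpoint(output_dir, prefix) would return the full path of one_deg_tripolar_checkpoint_iteration1656.jld2 together with 1656 if that is the newest file in the directory.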
Hmm right, I don't think pickup=true works with ClimaOcean yet. What you can do now is manually restore the state. We have to think about how this should work. Somehow when we are picking up, the coupled model needs to know to look for checkpoints for all of its components?
I think we need to design a Checkpointer for the coupled simulation which checkpoints all component models at the same time.
Until then, one can use a Checkpointer for just one component (like the ocean). To restore from a checkpoint in that case, you need to use JLD2 to open the checkpoint file and load the data by hand. I can prototype this workflow and come up with some sample code (if you figure it out @taimoorsohail please post here!)
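In the meantime, a rough sketch of what "loading the data by hand" could look like with JLD2. This is a toy round trip, not the actual ClimaOcean workflow: the function name restore_by_hand! is hypothetical, and the "<name>/data" and "clock" key layout is an assumption made for illustration. Inspect keys(file) on a real checkpoint file before relying on any particular layout.

```julia
using JLD2

# Hypothetical helper: copy arrays stored under "<name>/data" in a JLD2 file
# into preallocated arrays, and return the stored "clock" entry if present.
# ASSUMPTION: this key layout is illustrative, not a documented file format.
function restore_by_hand!(state, path)
    jldopen(path, "r") do file
        for (name, array) in state
            copyto!(array, file["$name/data"])
        end
        return haskey(file, "clock") ? file["clock"] : nothing
    end
end

# Write a small stand-in for a checkpoint file so the example is self-contained.
path = joinpath(mktempdir(), "toy_checkpoint_iteration10.jld2")
jldopen(path, "w") do file
    file["T/data"] = fill(20.0, 4, 4, 2)
    file["S/data"] = fill(35.0, 4, 4, 2)
    file["clock"] = (time = 3600.0, iteration = 10)
end

# Preallocated arrays standing in for the model's prognostic fields.
state = Dict("T" => zeros(4, 4, 2), "S" => zeros(4, 4, 2))
clock = restore_by_hand!(state, path)
```

With a real ocean model, the destination arrays would be the underlying data of the model's fields rather than plain arrays, and the stored arrays may include halo points, so the sizes need to be checked before copying. The clock (time and iteration) would also have to be restored on both the ocean and the coupled model for the run to continue from the right time.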
Thanks!
Noting that this seems similar to, or a duplicate of, #303.