Oceananigans.jl
Are `WindowedTimeAverage`s working properly when picking up simulations?
I've been using `WindowedTimeAverage`s for my simulations (by setting `schedule = AveragedTimeInterval(...)` in a `NetCDFOutputWriter`). I noticed that whenever I run out of walltime and have to checkpoint my simulations, I get the following warning for each of the time-averaged outputs when I pick them up again:
┌ Warning: Returning a WindowedTimeAverage before the collection period is complete.
└ @ Oceananigans.OutputWriters /glade/work/tomasc/.julia/packages/Oceananigans/3LHMs/src/OutputWriters/windowed_time_average.jl:201
(which comes from this call.)
Does this mean that the time averages aren't being correctly calculated after picking up? I tried following the trail to figure it out but couldn't determine the answer...
I don't think this is done correctly right now. Somehow the Checkpointer needs to know about the simulation for this to work. But right now it only saves model properties.
Ah, I see. Sounds like it wouldn't be trivial to add that support.
I guess a workaround to avoid partially-averaged results when picking up would be to set the `Checkpointer` to only write checkpoints when the `TimeAveraged` results are also written. I'm not sure what that would do to other (more frequent) outputs though, since it'd potentially try to write some time steps twice (and not in monotonic ordering)...
There are two things. One is to fix the flow of information... that's probably pretty easy because we can either 1) make `Checkpointer` a callback or 2) change `write_output!` to have the syntax `write_output!(writer, simulation)` here:
https://github.com/CliMA/Oceananigans.jl/blob/643b484e81e0aeb038b3038266912ad051bce9b8/src/Simulations/run.jl#L147
then with a fallback write_output!(writer, sim) = write_output!(writer, sim.model), very little has to change...
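To illustrate option 2, here is a minimal sketch of the dispatch pattern using toy stand-in types. Nothing below is Oceananigans code: `ToyModel`, `ToySimulation`, `ModelOnlyWriter`, `SimulationAwareWriter`, and the `output_state` field are all hypothetical names made up for the example. It only shows that a generic fallback lets existing model-only methods keep working while a writer that needs the simulation can specialize.

```julia
# Toy stand-ins; none of these are Oceananigans types.
struct ToyModel
    time::Float64
end

struct ToySimulation
    model::ToyModel
    output_state::Dict{Symbol, Any}  # hypothetical home for writer-level state
end

struct ModelOnlyWriter end        # a writer that only needs the model
struct SimulationAwareWriter end  # a writer that also needs simulation-level info

# Existing-style method: takes the model directly.
write_output!(::ModelOnlyWriter, model::ToyModel) =
    "model state at t = $(model.time)"

# Generic fallback: any writer handed a simulation forwards to its model method.
write_output!(writer, sim::ToySimulation) = write_output!(writer, sim.model)

# A writer that needs more specializes on the simulation signature.
write_output!(::SimulationAwareWriter, sim::ToySimulation) =
    "model state at t = $(sim.model.time) plus $(length(sim.output_state)) writer state entries"

sim = ToySimulation(ToyModel(2.0), Dict{Symbol, Any}(:window_start_time => 1.5))

println(write_output!(ModelOnlyWriter(), sim))        # goes through the fallback
println(write_output!(SimulationAwareWriter(), sim))  # uses the specialized method
```

With this layout the calling code can always pass the simulation, and only writers that care about simulation-level state need a new method.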
The other task is to figure out how to save down the "state" of the time-averaging apparatus so that it can be restored correctly. That's maybe the harder part but of course unavoidable to make checkpointing work with it.
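As a rough picture of what "state" means here, the sketch below uses a toy running average rather than the real `WindowedTimeAverage`. The type `ToyWindowedAverage`, its fields, and the helpers `accumulate_sample!`, `checkpoint_state`, and `restore_average` are assumptions invented for illustration. The point is only that a partially collected window resumes correctly if the window start time, the last sample time, and the running result are all saved and restored.

```julia
# Toy running time-average; not the Oceananigans WindowedTimeAverage type.
mutable struct ToyWindowedAverage
    window_start_time::Float64  # when the current averaging window began
    previous_time::Float64      # time of the last accumulated sample
    result::Float64             # running average over the window so far
end

ToyWindowedAverage(t0) = ToyWindowedAverage(t0, t0, 0.0)

# Fold one sample into the running average, weighted by the time step.
function accumulate_sample!(avg::ToyWindowedAverage, value, time)
    Δt = time - avg.previous_time
    elapsed = time - avg.window_start_time
    if elapsed > 0
        previous_elapsed = elapsed - Δt
        avg.result = (avg.result * previous_elapsed + value * Δt) / elapsed
    end
    avg.previous_time = time
    return avg
end

# Hypothetical helpers: the minimal state a checkpoint would have to carry.
checkpoint_state(avg) = (window_start_time = avg.window_start_time,
                         previous_time = avg.previous_time,
                         result = avg.result)

restore_average(state) = ToyWindowedAverage(state.window_start_time,
                                            state.previous_time,
                                            state.result)

function run_samples!(avg, times)
    for t in times
        accumulate_sample!(avg, t, t)  # the sampled value equals the time, for illustration
    end
    return avg
end

uninterrupted = run_samples!(ToyWindowedAverage(0.0), 1:4)

first_leg = run_samples!(ToyWindowedAverage(0.0), 1:2)
saved = checkpoint_state(first_leg)                  # what the checkpoint would store
resumed = run_samples!(restore_average(saved), 3:4)  # pickup with the saved state

naive = run_samples!(ToyWindowedAverage(2.0), 3:4)   # pickup without the saved state

println((uninterrupted.result, resumed.result, naive.result))
```

In this toy setup the resumed average matches the uninterrupted one, while the naive pickup only averages over the part of the window that falls after the restart.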
This might be also relevant for when output files are split? See #3506.
Hmm yes, perhaps the output writers need to be re-initialized when picking up as well? That would require extending what we do when we pick up here:
https://github.com/CliMA/Oceananigans.jl/blob/3bb62a647a55a7dadf5f37331321bf0020a78c4d/src/Simulations/run.jl#L87-L90