Oceananigans.jl Are `WindowedTimeAverage`s working properly when picking up simulations?

I've been using `WindowedTimeAverage`s in my simulations (by setting `schedule = AveragedTimeInterval(...)` in a `NetCDFOutputWriter`). I noticed that whenever I run out of walltime and have to restart from a checkpoint, I get the following warning for each of the time-averaged outputs when I pick the simulation up again:

┌ Warning: Returning a WindowedTimeAverage before the collection period is complete.
└ @ Oceananigans.OutputWriters /glade/work/tomasc/.julia/packages/Oceananigans/3LHMs/src/OutputWriters/windowed_time_average.jl:201

(which comes from this call.)

Does this mean that the time averages aren't being correctly calculated after picking up? I tried following the trail to figure it out but couldn't determine the answer...

Feb 27 '24 18:02 tomchor

I don't think this is done correctly right now. Somehow the Checkpointer needs to know about the simulation for this to work. But right now it only saves model properties.

Feb 27 '24 19:02 glwagner

Ah, I see. Sounds like it wouldn't be trivial to add that support.

I guess a workaround to avoid partially-averaged results when picking up would be to set the Checkpointer to only write checkpoints when the TimeAveraged results are also written. I'm not sure what that would do to other (more frequent) outputs though, since it'd potentially try to write some time steps twice (and not in monotonic ordering)...
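To make the alignment idea concrete, here is a rough, language-agnostic sketch in Python (not Oceananigans code). It assumes each averaging window covers the last `window` time units of every output `interval`, and the helper name `inside_open_window` is made up for illustration. A checkpoint is only "safe" in this toy picture if it does not land in the middle of a window.

```python
def inside_open_window(t, interval, window, tol=1e-9):
    """Return True if time t falls strictly inside an averaging window.

    Assumption (for illustration only): windows occupy the last `window`
    time units of every `interval`, i.e. (n*interval - window, n*interval).
    """
    phase = t % interval  # time elapsed since the last scheduled output
    if phase < tol or interval - phase < tol:
        # t coincides with an output time: the average has just been written
        return False
    return phase > interval - window + tol


# Averaged output every 10 time units, averaging over the last 4.
interval, window = 10.0, 4.0

# Checkpointing every 30 time units (a multiple of the output interval)
# never interrupts a window in this toy model.
checkpoint_times = [30.0 * n for n in range(1, 5)]
all_safe = not any(inside_open_window(t, interval, window) for t in checkpoint_times)

# Checkpointing at t = 18 would interrupt the window spanning 16 to 20.
interrupted = inside_open_window(18.0, interval, window)
```

This says nothing about the concern with the more frequent outputs; it only illustrates which checkpoint times avoid a half-finished average.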

Feb 28 '24 15:02 tomchor

> Ah, I see. Sounds like it wouldn't be trivial to add that support.
>
> I guess a workaround to avoid partially-averaged results when picking up would be to set the Checkpointer to only write checkpoints when the TimeAveraged results are also written. I'm not sure what that would do to other (more frequent) outputs though, since it'd potentially try to write some time steps twice (and not in monotonic ordering)...

There are two things. One is to fix the flow of information... that's probably pretty easy because we can either 1) make Checkpointer a callback or 2) change write_output! to have the syntax write_output!(writer, simulation) here:

https://github.com/CliMA/Oceananigans.jl/blob/643b484e81e0aeb038b3038266912ad051bce9b8/src/Simulations/run.jl#L147

then with a fallback write_output!(writer, sim) = write_output!(writer, sim.model), very little has to change...

The other task is to figure out how to save down the "state" of the time-averaging apparatus so that it can be restored correctly. That's maybe the harder part but of course unavoidable to make checkpointing work with it.
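As a rough illustration of why that state matters, here is a toy sketch in Python. It is not the Oceananigans implementation: `RunningAverage` and its fields are hypothetical stand-ins, assuming the averaging apparatus boils down to an accumulated integral plus the time collected so far in the current window.

```python
from dataclasses import dataclass, asdict


@dataclass
class RunningAverage:
    """Toy stand-in for a windowed time average (hypothetical, for illustration)."""
    total: float = 0.0    # time-integral of the samples collected so far
    elapsed: float = 0.0  # time collected so far in the current window

    def accumulate(self, value, dt):
        self.total += value * dt
        self.elapsed += dt

    def result(self):
        return self.total / self.elapsed


samples = [(1.0, 1.0), (2.0, 1.0), (3.0, 1.0), (4.0, 1.0)]  # (value, dt) pairs

# Uninterrupted run over the whole window.
reference = RunningAverage()
for value, dt in samples:
    reference.accumulate(value, dt)

# Run that stops halfway through the window and writes a checkpoint.
first_half = RunningAverage()
for value, dt in samples[:2]:
    first_half.accumulate(value, dt)
saved_state = asdict(first_half)  # what the checkpoint would need to store

# Pickup WITHOUT the saved state: the average only sees the second half.
naive = RunningAverage()
for value, dt in samples[2:]:
    naive.accumulate(value, dt)

# Pickup WITH the saved state: the average matches the uninterrupted run.
restored = RunningAverage(**saved_state)
for value, dt in samples[2:]:
    restored.accumulate(value, dt)
```

In this toy case the restored average agrees with the uninterrupted one, while the naive pickup is biased toward the samples taken after the restart.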

Feb 28 '24 15:02 glwagner

This might be also relevant for when output files are split? See #3506.

Mar 14 '24 07:03 navidcy

Hmm yes, perhaps the output writers need to be re-initialized when picking up as well? That would require extending what we do when we pick up here:

https://github.com/CliMA/Oceananigans.jl/blob/3bb62a647a55a7dadf5f37331321bf0020a78c4d/src/Simulations/run.jl#L87-L90

Mar 14 '24 14:03 glwagner
