IO requires unneeded previous output to restart
Consider the following code block
https://github.com/E3SM-Project/scream/blob/27b21e21088e52cf6febe4a9b1102b8df203f30c/components/eamxx/src/share/io/scream_output_manager.cpp#L240-L256
We are always inside this block when continuing or restarting a run:
m_resume_output_file = last_output_filename!="" and not restart_pl.get("force_new_file",false);
if (m_resume_output_file) {
and that's problematic.
When restarting from an arbitrary point, the model assumes that the files from all previous output steps are present, and it needs them. However, common practice is that the model should have all it needs from the .r. and .rhist. files; the model should NEVER open previous output files (except to append to them if so desired).
For now, a simple workaround is to set force_new_file to true. Still, I am reporting this here so that it gets fixed.
I recall we debated this feature and decided that we wanted to open up an old file and continue to fill it for those cases where we were running many short jobs. We didn't want to store dozens of files that never met their Max Snapshots per File. The use case I am thinking about is the ne1024 runs, where we sometimes run for a week at a time but want each file to store a full month of data.
That being said, we may not want this to be the default behavior. We could switch the default for force_new_file to be true and let the user set it to false.
The output stream definition alone should already give us enough info to tell whether or not we should be filling an already existing file. The max snapshot is there to decide exactly that. Is there a case where the max snapshot keyword coupled with frequency won't tell us everything we need?
The max snapshots value is not enough to decide whether the previous file is full or not. Granted, it may be a corner case, but if the user has already forced a new file before, then you can't know if the last file is full based solely on case/run t0 and max_snapshots. You can infer whether the previous file is full only if you know that force_new_file was never used in the previous runs, so that you can compute the current snap number based on case/run t0 and max_snaps.
Edit: the force_new_file feature is not just to start a new file for an existing stream, but also to allow adding new streams in the middle of a simulation. Either way, if force_new_file gets used at any point of the case, the snap arithmetic is no longer possible.
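To make the snap arithmetic concrete, here is a minimal sketch of why it breaks. None of these names exist in EAMxx: `snaps_in_last_file_by_arithmetic`, `RunSegment`, and `snaps_in_last_file_actual` are hypothetical helpers, and time is measured in whole model steps for simplicity. The first function is what you could infer from case t0, frequency, and max snaps alone; the second replays the actual history of runs, including any use of force_new_file.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not EAMxx code): snaps in the partially filled last
// file, inferred only from elapsed steps, output frequency, and max snaps.
// Returns 0 when the last file is exactly full.
std::int64_t snaps_in_last_file_by_arithmetic(std::int64_t steps_since_case_t0,
                                              std::int64_t freq_in_steps,
                                              std::int64_t max_snaps) {
  const std::int64_t total_snaps = steps_since_case_t0 / freq_in_steps;
  return total_snaps % max_snaps;
}

// Hypothetical description of one run (job) of the case.
struct RunSegment {
  std::int64_t snaps_written;  // snaps written during this run
  bool force_new_file;         // whether the run started with force_new_file
};

// Replays the run history to find how many snaps the last file really holds.
// Returns 0 when the last file is exactly full, to match the function above.
std::int64_t snaps_in_last_file_actual(const std::vector<RunSegment>& runs,
                                       std::int64_t max_snaps) {
  std::int64_t in_file = 0;
  for (const auto& run : runs) {
    if (run.force_new_file) {
      in_file = 0;  // a fresh file is opened regardless of remaining room
    }
    for (std::int64_t i = 0; i < run.snaps_written; ++i) {
      if (in_file == max_snaps) {
        in_file = 0;  // current file is full, open the next one
      }
      ++in_file;
    }
  }
  return in_file == max_snaps ? 0 : in_file;
}
```

With max snaps 4 and one snap per step, a first run writing 6 snaps followed by a second run writing 3 snaps gives 1 snap in the last file by either method. If the second run instead starts with force_new_file, the last file really holds 3 snaps, while the arithmetic still says 1.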
We could simply add something in the rhist file though. E.g., we could store the number of snaps already stored in the last file (if storage type is defined by max snaps per file), so that we can query rhist to see if there's still room.
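A rough sketch of what that decision could look like, assuming the snap count is saved alongside the last output filename. `RhistInfo`, `num_snaps_in_last_file`, and `should_resume_output_file` are hypothetical names for illustration, not the existing EAMxx API; the point is only that the check needs nothing beyond what rhist provides, so no previous output file has to be opened just to decide.

```cpp
#include <string>

// Hypothetical snapshot of what the rhist file could carry for one stream.
struct RhistInfo {
  std::string last_output_filename;  // empty if no output file was written yet
  int num_snaps_in_last_file = 0;    // assumed new entry stored in rhist
};

// Decide whether to keep filling the last output file, using rhist data only.
bool should_resume_output_file(const RhistInfo& rhist,
                               int max_snaps_per_file,
                               bool force_new_file) {
  if (force_new_file || rhist.last_output_filename.empty()) {
    return false;  // nothing to resume, or the user asked for a new file
  }
  // Resume only if the last file still has room for at least one snap.
  return rhist.num_snaps_in_last_file < max_snaps_per_file;
}
```

If the last file is already full, the run starts a new file and never needs the old one to be present on disk.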
For the storage type Monthly and Yearly, we can do something similar assuming we do not allow dt to grow across restarts. I think we always have dt such that dt divides 86400 (a full day), b/c of how ATM_NCPL is defined. However, I think the user may technically change it across restarts. It is then possible to write a restart at 23:00, but then change dt to 2h, so that the next step could be on a different month. I think this is a very weird use case, so I'm ok allowing it. Besides, I think at run time we still check if the next timestamp will fit in the file, so we would still get the correct behavior.
tl;dr: we can achieve what you want by storing in the rhist file whether the last hist file is full or not.
> we can achieve what you want by storing in the rhist file whether the last hist file is full or not.
Yes, I strongly think that the model should restart without needing previous output files by default.