Test reproducibility across restarts

Open nichannah opened this issue 9 years ago • 4 comments

There have been reports that the 0.25 degree test case does not have bit reproducibility across restarts. This issue will create a test for this.

nichannah avatar Jul 31 '15 15:07 nichannah

Hi Nic,

I had this problem with the 0.25 degree model because large restart files were split into parts and were not read properly when the model restarted. I solved it by increasing the maximum file size in mpp_parameter.F90.

yvikhlya avatar Jul 31 '15 15:07 yvikhlya

Hi Yuri, thank you, this is very helpful. Can you elaborate on what you mean by the restart files not being read properly? How do you know this? Was there a warning or error? Can you point me to any particular bit of code? If the model is not doing this properly and not throwing an error, then it's a bug we should fix.

nichannah avatar Jul 31 '15 20:07 nichannah

Unfortunately, I don't have a version with this problem on hand. As far as I remember, restarts larger than 4 GB were split into several pieces, with one variable in each, named something like ocean_density.res.nc, ocean_density.res.nc01, ocean_density.res.nc02, ... When the model restarted, only the first file was read and the other variables were bootstrapped. I suspect this is because of how FMS treats these suffixes in file names (I did not actually investigate this). The model did not crash. It did print a warning, but I don't have it now.

yvikhlya avatar Jul 31 '15 21:07 yvikhlya
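
To check whether a restart directory contains split files of the kind described above, a short script can group file names by their base name. This is only a minimal sketch: the naming pattern (a `.nc` file followed by pieces with a numeric suffix such as `.nc01`) is taken from the recollection in the comment above, and the function name `find_split_restarts` and the `RESTART` directory in the usage note are hypothetical, not part of MOM or FMS.

```python
import re
from collections import defaultdict

# Assumed pattern, based on the file names recalled in the thread:
# "<name>.nc" optionally followed by a numeric piece suffix ("01", "02", ...).
SPLIT_RE = re.compile(r"^(?P<base>.+\.nc)(?P<part>\d+)?$")


def find_split_restarts(filenames):
    """Return {base_name: [pieces]} for restarts that appear in several pieces."""
    groups = defaultdict(list)
    for name in filenames:
        match = SPLIT_RE.match(name)
        if match:
            groups[match.group("base")].append(name)
    # Keep only the base names that have more than one piece on disk.
    return {base: sorted(parts) for base, parts in groups.items() if len(parts) > 1}
```

For example, it could be called as `find_split_restarts(p.name for p in pathlib.Path("RESTART").iterdir())`. A non-empty result would suggest the run is exposed to the split-restart behaviour described here, although it does not show whether the pieces are actually read back on restart.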

We may not be seeing the issue because we usually use the layout parameters, which tend to produce much smaller restart files (typically one per PE), and whose bit reproducibility is probably handled correctly.

This particular problem might be solvable (or at least avoidable) by removing the -Duse_netcdf3 preprocessor flag, which would allow large restarts.

But I guess it's a problem if MOM's automatic file-splitting of 32-bit netCDF files is not reproducible?

marshallward avatar Jul 31 '15 21:07 marshallward