"custom" restart leads to init time doubling (restart file reads slow down tenfold)

Open mahf708 opened this issue 1 year ago • 2 comments

Usually, a restart happens from within the same run dir as the original run. However, consider an alternative setup where we create a new run dir, move all needed files there, and then continue the run from there. One would expect both to be identical in terms of perf, but that's not the case!

In the second ("custom") case, the atm init time doubles. A more detailed reading of the timings (thanks to @ndkeen) shows that the time spent reading restart files (take the main scream restart file as an example) actually increases tenfold, a full order of magnitude.
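One way to narrow this down outside the model is to time a plain sequential read of the same restart file in both run dirs. The following is only a minimal sketch: the helper names `time_raw_read` and `compare_dirs` are hypothetical, and the file and directory names are whatever your setup uses. It bypasses the model's own I/O layer, and filesystem caching can skew repeated reads, so treat the numbers as a rough sanity check rather than a reproduction of the atm init timers.

```python
import time
from pathlib import Path


def time_raw_read(path, chunk_size=8 * 1024 * 1024):
    """Read the file once, sequentially; return (bytes_read, seconds)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start


def compare_dirs(filename, original_dir, custom_dir):
    """Time a raw read of `filename` in both run dirs (hypothetical helper)."""
    results = {}
    for label, run_dir in (("original", original_dir), ("custom", custom_dir)):
        results[label] = time_raw_read(Path(run_dir) / filename)
    return results
```

If the raw read times are similar in both dirs, the slowdown is more likely in how the restart file is opened and read by the model than in the filesystem itself.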

I am filing this issue and labeling it as "bug" (as it doesn't match expectation). The binary NetCDF files were sent to HPSS archives and then recalled, but I am inclined to think that doesn't change anything. Additionally, I am filing this issue because I think it is related to my other IO issues, so maybe it will help us narrow our search for the elusive bug...

xref #2892 #2891 #2890 #2889

mahf708 avatar Jul 10 '24 03:07 mahf708

Does the atm.log file show that you are reading the file in the new run dir? Also, just to avoid the obvious, is the new run dir on the same filesystem as the original one?
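The second question can be checked quickly from Python. This is a small sketch, with `same_filesystem` as a hypothetical helper name: it compares the device IDs reported by `os.stat` for the two run dirs. Matching device IDs only mean the paths sit on the same mount; they say nothing about per-directory settings on that filesystem.

```python
import os


def same_filesystem(path_a, path_b):
    """Return True if both paths report the same device ID (st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

Call it with the original and new run dir paths; `False` would mean the two dirs are on different mounts.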

bartgol avatar Jul 25 '24 00:07 bartgol

> Does the atm.log file show that you are reading the file in the new run dir? Also, just to avoid the obvious, is the new run dir on the same filesystem as the original one?

Yes and yes. This is actually my automated way to recover the missing files reported in #2890, and it worked.

mahf708 avatar Jul 25 '24 01:07 mahf708

I am going to close this as unimportant (wontfix) for now. If it comes up later, we can deal with it. It's a minor nuisance anyway.

mahf708 avatar Aug 07 '24 21:08 mahf708