ENERGY, keep open: mass, energy diagnostics
This issue is to discuss the current implementation of the mass and energy diagnostics, along with possible changes.
I see a couple of issues with the current implementation:
- it is column-based, not globally integrated
- it requires an implementation at each package's interface level
This is by no means a good way to do it, but the debug statements in branch https://github.com/E3SM-Project/scream/compare/oksanaguba/scream/mass-debug use FM("Physics") fields to access the necessary fields and compute diagnostics, independently of whether the package has such an implementation. I think this will also allow us to use field_sum() to get global integrals. The current implementation could be modified to compute global sums, but I do not see why we would want to do it that way.
Instead, I would propose redesigning the mass/energy checks to move them out of the packages and into the AD, using FM fields (I can see that if the FM is not used directly, a package could potentially "hide" its leaks).
A minor point: I would also "shrink" the current mass/energy diagnostics message.
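To make the column-versus-global distinction concrete, here is a minimal Python sketch (not SCREAM code). It assumes a plain dict standing in for the field manager, hypothetical field names `qv` and `pseudo_density` (layer pressure thickness in Pa), and hypothetical per-column area weights. In a real MPI run the global sum would also need a reduction across ranks, which is omitted here.

```python
GRAVITY = 9.80616  # m/s^2, assumed value for this sketch


def column_water_mass(qv_col, dp_col, gravity=GRAVITY):
    """Vertically integrated vapor mass (kg/m^2) for one column."""
    return sum(q * dp for q, dp in zip(qv_col, dp_col)) / gravity


def column_and_global_mass(fields, area_weights):
    """Return (per-column masses, area-weighted global sum).

    `fields` is a hypothetical stand-in for field-manager access:
    fields["qv"][icol][ilev] and fields["pseudo_density"][icol][ilev].
    """
    qv = fields["qv"]
    dp = fields["pseudo_density"]
    col_mass = [column_water_mass(q, d) for q, d in zip(qv, dp)]
    # Rank-local weighted sum; an all-reduce would follow in parallel runs.
    global_mass = sum(m * w for m, w in zip(col_mass, area_weights))
    return col_mass, global_mass
```

Because the check reads only the shared fields, it does not depend on any package providing its own diagnostic.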
@oksanaguba should we support both column and global integrals? It seems to me we should.
Yes, we can have messages about both column and global integrals.
I thought column-based was enough, since physics treats columns independently of each other.
Luca, looking at the column output, it is hard to judge whether there is a big leak or not. Eventually we also want diagnostics that compare delta(mass) against the fluxes over the whole physics step; at least I think that would be convenient.
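As a rough illustration of comparing delta(mass) against fluxes over a physics step, the Python sketch below computes a budget residual. The function name, argument names, and units (mass in kg/m^2, fluxes in kg/m^2/s) are assumptions made for this example only, not the existing diagnostics.

```python
def mass_budget_residual(mass_before, mass_after, flux_in, flux_out, dt):
    """Residual of the water budget over one physics step.

    residual = (mass_after - mass_before) - dt * (flux_in - flux_out)
    A conserving step gives a residual near zero; the sign shows
    whether mass was spuriously gained (+) or lost (-).
    """
    expected_change = dt * (flux_in - flux_out)
    actual_change = mass_after - mass_before
    return actual_change - expected_change


def relative_residual(residual, mass_before):
    """Residual scaled by the initial mass, to judge how big a leak is."""
    return residual / mass_before if mass_before != 0.0 else float("inf")
```

The same function can be applied per column or to globally integrated values, since it only needs the integrated mass before and after the step and the step-integrated fluxes.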
I see. That makes sense.
Tagging @tcclevenger, @AaronDonahue explicitly in case they want to voice their opinion (and anyone else, please).
If there are no objections, I will work on this new infrastructure with everyone's help.
Random thought - it seems to me that Conrad's check didn't alert us to our water leak because it just tells us the per-step conservation violation. I had thought "well, conservation isn't great but it probably all averages out" when instead the violation seems to be of the same sign most of the time, causing a leak rather than just noise. I don't think there's any sensible way to automate the checking for non-conservation of a certain sign over lots of steps, but making sure we print out the sign of the non-conservation on each step would allow humans to get a sense of whether the non-conservation is zero-mean or not. Though once we have good conservation on the timestep level it won't really matter whether errors are all one sign or not...
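A small Python sketch of what printing the sign could look like, plus a simple tally of positive and negative steps that a human could inspect. The message format and function names are hypothetical, not the current diagnostics output.

```python
def format_step_message(step, error):
    """Format one per-step message with an explicit sign on the error."""
    return f"step {step}: water mass error = {error:+.3e} kg/m^2"


def summarize_signed_errors(errors):
    """Return (mean error, count of positive steps, count of negative steps).

    A mean close to zero with balanced counts suggests noise; a mean of
    consistent sign suggests a systematic leak.
    """
    n_pos = sum(1 for e in errors if e > 0.0)
    n_neg = sum(1 for e in errors if e < 0.0)
    mean = sum(errors) / len(errors) if errors else 0.0
    return mean, n_pos, n_neg
```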
Another option could be to accumulate leaks on each column over time, and perhaps start printing warnings when some values in that view become large. It would require an extra ncols-sized view and an extra (rank-local) reduction at each time step, but it might give more insight into how much the leaks are accumulating over time.
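The accumulation idea could be sketched as follows in Python. A plain list stands in for the ncols-sized view, and the class name, method names, and threshold are hypothetical choices for illustration.

```python
class LeakAccumulator:
    """Accumulate signed per-column conservation errors across time steps."""

    def __init__(self, ncols, threshold):
        self.accumulated = [0.0] * ncols  # stand-in for an ncols-sized view
        self.threshold = threshold

    def update(self, step_errors):
        """Add this step's signed errors; return columns over the threshold."""
        for icol, err in enumerate(step_errors):
            self.accumulated[icol] += err
        return [icol for icol, total in enumerate(self.accumulated)
                if abs(total) > self.threshold]

    def max_abs_leak(self):
        """Rank-local reduction: largest accumulated leak in magnitude."""
        return max(abs(total) for total in self.accumulated)
```

Errors that alternate in sign cancel in the accumulator, while errors of a consistent sign grow until they cross the threshold, which is the distinction raised in the previous comment.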