Potential enhancement to Mask interface to help avoid FPEs
As I'm cleaning up FPEs in SHOC, I'm encountering this pattern over and over (this pattern is widespread in P3 too):
Spack pack1(data1), pack2(data2);
pack2.set(pack1 != 0, val / pack1);
... which needs to be changed in over to avoid the FPE (if they are on) to:
Spack pack1(data1), pack2(data2);
const auto mask = pack1 != 0;
if (mask.any()) {
pack2.set(mask, val / pack1);
}
Because, even if pack2 is protected from getting infs by the mask, the divide-by-zero still occurs when val/pack1 gets evaluated. The code that needed to protect against this makes the code uglier and may impact performance since it introduces another branch.
I was thinking a macro like this would be nice:
#define FPE_SAFE_SET(pack, mask, val)
#if defined(SCREAM_FPE)
if (mask.any()) {
pack.set(mask, val);
}
#else
pack.set(mask, val);
#endif
Which would be used like this:
Spack pack1(data1), pack2(data2);
FPE_SAFE_SET(pack2, pack1 != 0, val / pack1);
Thoughts?
Jim, would you point to an example of this pattern in the code? Thanks.
@ambrad , sure. Here's a change I had to make in my dev branch:
--- a/components/scream/src/physics/shoc/atmosphere_macrophysics.hpp
+++ b/components/scream/src/physics/shoc/atmosphere_macrophysics.hpp
@@ -298,10 +298,12 @@ public:
inv_qc_relvar(i,k) = 1;
const auto condition = (qc(i,k) != 0 && qc2(i,k) != 0);
- inv_qc_relvar(i,k).set(condition,
- ekat::min(inv_qc_relvar_max,
- ekat::max(inv_qc_relvar_min,
- ekat::square(qc(i,k))/qc2(i,k))));
+ if (condition.any()) {
+ inv_qc_relvar(i,k).set(condition,
+ ekat::min(inv_qc_relvar_max,
+ ekat::max(inv_qc_relvar_min,
+ ekat::square(qc(i,k))/qc2(i,k))));
+ }
This was one of a dozen or so similar changes I needed to make.
Jim, would you point me to your branch? I'm curious if all of the cases involve clipping with min and max.
Here's a crazy idea, which you guys might deem too "bulky": we could wrap binary ops results into a "pack expression", which is only evaluated when it's accessed. Something along the line of:
// Does not eval the expression.
PackExpression operator / (const Pack& p1, const Pack& p2);
// Only evaluates the expression if m[i] is true.
Pack& Pack::set(const Mask& m, const PackExpression& expr);
// Evaluates the expression everywhere
Pack& Pack::operator=(const PackExpression& expr);
The struct PackExpression can simply hold references to the args.
I think Sacado in Trilinos had a similar issue, which is due to expressions in function calls being evaluated before passing control to the fcn.
@ambrad , I am about to make a PR; my branch is jgfouca/fix_fpe_testing
One of my concerns here is it's not clear to me the original F90 is quite what we want. Then again, I might just be confused. Here is the original F90 corresponding to the C++ above:
relvar(:,:) = 1.0_r8
relvarmax = 10.0_r8
where (rcm(:ncol,:pver) /= 0.0 .and. rcm2(:ncol,:pver) /= 0.0) &
relvar(:ncol,:pver) = min(relvarmax,max(0.001_r8,rcm(:ncol,:pver)**2.0/rcm2(:ncol,:pver)))
For simplicity, assume these are all scalars, not arrays. Suppose rcm2 = 0 and rcm = 1. Then rcm/rcm2 = +Inf. In that case, one would think relvar should be set to relvarmax = 10. But, instead, the above code makes relvar = 1. Yet if rcm2 were just a bit above 0, say rcm2 = 1e-6, the above code would indeed make it 10. So that makes the overall function discontinuous. A similar situation occurs when rcm = 0 but rcm2 != 0. relvar is set to 1 but if rcm were just a bit above 0, it would instead be 0.001.
Someone please correct me if I'm just confused.
I'd like to look at the other cases to see if similar strange things are happening.
@ambrad what you say seems right to me, but it has to do with the algorithm, rather than the impl. In general, the question is how to deal with "compute x=f(a,b), then set y=x only where condition(a,b) is satisfied". It seems to me that the two issues are related, but could be tackled independently. In the example above, e.g., the solution might be to init relvar=relvarmax (which is the value to be used when handling the FPE case).
@ambrad , i think maybe i picked one that was more complicated than necessary. Most of the cases are very simple where you have a zero if check in fortran and a != mask on the C++ side.
Here's a simpler one:
C++:
@@ -207,10 +211,13 @@ void Functions<S,D>::shoc_assumed_pdf(
Spack r_qwthl_1(0);
{
const Spack testvar = a*sqrtqw2_1*sqrtthl2_1 + (1 - a)*sqrtqw2_2*sqrtthl2_2;
- r_qwthl_1.set(testvar != 0,
- ekat::max(-1,
- ekat::min(1, (qwthlsec - a*(qw1_1 - qw_first)*(thl1_1 - thl_first)
- - (1 - a)*(qw1_2 - qw_first)*(thl1_2 - thl_first))/testvar)));
+ const auto testvar_ne_zero = testvar != 0;
+ if (testvar_ne_zero.any()) {
+ r_qwthl_1.set(testvar_ne_zero,
+ ekat::max(-1,
+ ekat::min(1, (qwthlsec - a*(qw1_1 - qw_first)*(thl1_1 - thl_first)
+ - (1 - a)*(qw1_2 - qw_first)*(thl1_2 - thl_first))/testvar)));
+ }
}
F90:
testvar=(a*sqrtqw2_1*sqrtthl2_1+(1._rtype-a)*sqrtqw2_2*sqrtthl2_2)
if (testvar .eq. 0._rtype) then
r_qwthl_1=0._rtype
else
r_qwthl_1=max(-1.0_rtype,min(1.0_rtype,(qwthlsec-a*(qw1_1-qw_first) &
*(thl1_1-thl_first)-(1._rtype-a)*(qw1_2-qw_first) &
*(thl1_2-thl_first))/testvar))
endif
This case seems to have the same issue as above. If testvar = 0, then r_qwthl_1 = 0. But if it deviates from 0 by a tiny bit, then r_qwthl_1 = 1 or -1. So again this divide-by-0 situation arises in a context where something bad is happening.
I was thinking a macro like this would be nice:
#define FPE_SAFE_SET(pack, mask, val) #if defined(SCREAM_FPE) if (mask.any()) { pack.set(mask, val); } #else pack.set(mask, val); #endif
Wait, doesn't this evaluate val anyways? That is, the FPE is still generated, even if mask.any()==false...
@bartgol I'm not convinced yet that code that properly implements the evaluation of a continuous function will necessarily require delayed evaluation to look cleaner.
Re: the question you just asked, the issue here is when pack size = 1 in an FPE build, so the macro above does accomplish what is intended.
I think the issue of the continuous fcn is separate. Meaning, even if you init relvar=relvarmax int he 1st example (hence, it's continuous w.r.t rcm2), you still have the FPE generated by evaluating rcm/rcm2 when calling pack.set(m,rcm/rcm2) (the f90 code does not have this issue, since the division is only performed for those rcm2 that are nonzero).
I agree that the fact that relvar is not continuous w.r.t. rcm2 might be an issue, but I think that has nothing to do with FPEs.
Re: the question you just asked, the issue here is when pack size = 1 in an FPE build, so the macro above does accomplish what is intended.
Ah, yes, you are right, for pack size 1 it would be fine.
Luca, I don't disagree and have no opinion re: Jim's macro, your suggestions, etc. Obviously please feel free to introduce those additional mechanisms. My point is only that in the course of implementing a continuous function with clipping, it's likely the .any(), etc., usage will end up leading to no divide-by-0 in the packsize=1 case relevant to the FPE builds, obviating any new pack mechanism.