MOM6 icon indicating copy to clipboard operation
MOM6 copied to clipboard

EOS Wright `intz_dpa` reproduciblity bug

Open marshallward opened this issue 4 years ago • 3 comments

The computation of intz_dpa in MOM_EOS_Wright.f90 in the block below was not reproducible across layouts in one of my tests.

https://github.com/NOAA-GFDL/MOM6/blob/45b336670988773c050e3d59e3bd0ebe7b948ecb/src/equation_of_state/MOM_EOS_Wright.F90#L512-L534

This was detected in a benchmark-like configuration on 2 MPI ranks by comparing LAYOUT=1,2 and LAYOUT=2,1. (LAYOUT=1,1 matched one of these, but I cannot recall which at the moment.)

CPU: AMD Ryzen 5 2600 Compiler: ifort (IFORT) 19.1.3.304 20200925 Flags: -g -traceback -O3 -mavx -fno-omit-frame-pointer -fprotect-parens

This issue was also observed with -O2 and -msse. The -O0 and -O1 flags gave identical, reproducible results.

Adding additional parentheses restores reproducibility. But additional work will be needed if we want to preserve the original answers, either with an "answers 2021" flag or a magical choice of parentheses.

marshallward avatar Feb 23 '21 17:02 marshallward

We have tried to ensure that there are parentheses enforcing the order of all sums of 3 or more quantities in most of the MOM6 code, but we have been less thorough with products of 3 or more quantities. In this case, is it be enough to add parentheses around the sums to recover the reproducibility, or do we also need to worry about the order of the products?

(If it is the products whose order needs to be specified, it is worth noting that this does not apply to factors that are integer powers of 2.)

Hallberg-NOAA avatar Feb 24 '21 13:02 Hallberg-NOAA

Parentheses were sufficient to produce identical answers in 1x2 and 2x1. I didn't need to reorder any elements. (Well, not explicitly; obviously the compiler has its own opinions on the matter!)

I didn't add parentheses to any power-of-two multiplication, so that is fine.

Here are the ambiguous lines

Triple add: (c1 + ... + ...) (I actually missed this one but apparently does not affect the answer)

lambda_2d(i,j) = (c0 +c4*S(i,j)) + T(i,j) * (c1 + T(i,j)*((c2 + c3*T(i,j))) + c5*S(i,j))

Another triple addition:

-    I_Lzz = 1.0 / (p0 + (lambda * I_al0) + p_ave)
+    I_Lzz = 1.0 / (((p0 + (lambda * I_al0))) + p_ave)

Triple multiply:

-    eps = 0.5*GxRho*dz*I_Lzz
+    eps = 0.5 * GxRho * (dz * I_Lzz)

A quadruple multiply:

-    rem = I_Rho * (lambda * I_al0**2) * eps2 * &
-          (C1_3 + eps2*(0.2 + eps2*(C1_7 + C1_9*eps2)))
+    rem = I_Rho * ((lambda * I_al0**2) * ( &
+        eps2 * (C1_3 + eps2 * (0.2 + eps2 * (C1_7 + C1_9 * eps2))) &
+    ))

Another triple multiply

-    dpa(i,j) = Pa_to_RL2_T2 * (g_Earth*rho_anom*dz - 2.0*eps*rem)
+    dpa(i,j) = Pa_to_RL2_T2 * (g_Earth * (rho_anom * dz) - 2.0 * eps * rem)

marshallward avatar Feb 24 '21 16:02 marshallward

hanks for that update. The fact that we have to worry about triple and quadruple multiplies probably means that dealing with this throughout the code is going to be a lot amount of work (even if the answers can only ever change at roundoff). As we deal with this, it may be worth noting (as you do) that most of the unit conversion factors (like Pe_to_RL2_T2 in your last example) integer powers of 2, so their order does not matter.

For addition or subtraction of three or more values of ambiguous sign, I think that we are unlikely to have a large number instances of an unspecified order remaining in the MOM6 code, because the small differences of large numbers can have such a strong dependence on the order of the sums.

Hallberg-NOAA avatar Feb 24 '21 17:02 Hallberg-NOAA

This issue was addressed by the addition of the new WRIGHT_FULL and WRIGHT_REDUCED equations of state, which are careful to add parentheses specifying the order of sums and products. The original WRIGHT equation of state was retained to preserve existing answers and hence could not be modified. The new code was introduced via https://github.com/NOAA-GFDL/MOM6/pull/331, which was merged onto the main branch of MOM6 on July 31, 2023. No further actions are needed to address this issue, which can now be closed.

Hallberg-NOAA avatar Mar 21 '24 08:03 Hallberg-NOAA

The original issue is that layout reproducibility was lost with aggressive optimization. Has this been tested or confirmed yet?

marshallward avatar Mar 21 '24 13:03 marshallward