Update rotation computation
I was working on some simulations using relatively stiff rods and was having stability issues. I tracked it down to how the rotation is computed: numerical error can cause the trace of the director product to be slightly larger than its maximum value of 3, which then yields a NaN when arccos is called.
The previous solution was to subtract 1e-10 to account for this occasional numerical error, but for a stiff rod this adjustment was not always enough. Clipping the trace to -1 and 3, which corresponds to the allowable domain of arccos, yielded large improvements in stability with minimal performance penalty.
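The fix can be sketched as follows (the function name and standalone form are illustrative, not the PR's exact code):

```python
import numpy as np

def rotation_angle_from_trace(trace):
    """Extract the rotation angle from the trace of a director product.

    Clipping the trace to [-1, 3], the domain on which
    arccos(0.5 * trace - 0.5) is defined, prevents round-off error
    from producing NaN when the trace lands slightly above 3.
    (Illustrative sketch, not the PR's exact implementation.)
    """
    clipped = np.clip(trace, -1.0, 3.0)
    return np.arccos(0.5 * clipped - 0.5)

# Round-off can push the trace of nearly identical directors above 3;
# without clipping, arccos would return NaN here:
angle = rotation_angle_from_trace(3.0 + 1e-12)  # 0.0 instead of NaN
```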
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Comparison is base (`93441ed`) 87.78% compared to head (`8ac9c63`) 87.78%. Report is 218 commits behind head on update-0.3.2.
:exclamation: Current head 8ac9c63 differs from pull request most recent head f6a761b. Consider uploading reports for the commit f6a761b to get more accurate results
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##           update-0.3.2     #319   +/-   ##
=============================================
  Coverage       87.78%    87.78%
=============================================
  Files              43        43
  Lines            2938      2940     +2
  Branches          341       341
=============================================
+ Hits             2579      2581     +2
  Misses            335       335
  Partials           24        24
```
What do you mean by instability? This is clipping the trace of the director product. If the directors are sufficiently orthonormal, then this is just a floating-point issue in the vector representation of small angles. This issue will only occur when neighboring directors are very close to each other (Tr(R) ≈ 3 means no rotation), so I don't see it as a stability issue per se.
I mean unstable as in when the positions and directors start to explode and become non-orthogonal and misaligned. Getting an error from this function was one of the early indicators, but now it will just pass with the clipped trace.
What we wanted is something like
```python
if trace < 3.0 + threshold:
    theta = arccos(0.5 * trace - 0.5)
else:
    raise ValueError("trace exceeds valid arccos domain")
```
which became the current form of the implementation, `theta = arccos(0.5 * trace - 0.5 - threshold)`, due to numba optimization. We want to allow a threshold-sized band to accommodate upstream numerical accuracy, but I'm not sure allowing all trace > 3 values to pass is a good idea.
How about either
- Use the min/max implementation as above but add an exception/warning message if the trace deviates more than the `threshold` amount. I think `numba` now does have some exception-handling capability with python.
- Keep the previous implementation but make `threshold` configurable by the user.
- [Little far-fetched] Add a re-orthogonalization step on the directors before stepping, so we can make sure we don't have numerical issues in this function.
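The re-orthogonalization option could look something like the following sketch, which uses the SVD-based polar decomposition to project a drifted frame back onto the nearest rotation matrix (the function name and its placement in the stepping loop are illustrative, not existing library code):

```python
import numpy as np

def reorthonormalize(director):
    """Project a 3x3 director frame onto the nearest rotation matrix.

    Sketch of the "re-orthogonalization before stepping" option above:
    the SVD-based polar decomposition gives the closest orthogonal
    matrix in the Frobenius norm.  Name and usage are illustrative.
    """
    u, _, vt = np.linalg.svd(director)
    rotation = u @ vt
    # Guard against a reflection (determinant -1) from the SVD.
    if np.linalg.det(rotation) < 0.0:
        u[:, -1] *= -1.0
        rotation = u @ vt
    return rotation
```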
A few thoughts.
- Do the directors become non-orthonormal during instability? I usually pay attention to the positions to determine stability/blowup.
- The simulation does not stop when this returns a NaN, it just continues computing on NaNs, so why does it matter if instability is "caught" here? If the simulation is going to blow up, it will do that regardless, while sometimes it might be able to recover (though again, I am doubtful that trace > 3.0 is related to anything other than round off accumulation). Non-orthonormal vectors do not necessarily have a trace > 3, so this doesn't really test for orthonormality.
- The current approach introduces error at every step. Trace = 3.0 means no rotation, so at rest, the rod will always be trying to rotate slightly proportional to the threshold amount.
- I have verified that the cases this was causing problems for me appear to be round-off accumulation. For example, two neighboring elements with the normals `[-3.69221646e-05, 9.99999999e-01, 0.00000000e+00]` and `[-3.93231655e-05, 9.99999999e-01, 0.00000000e+00]` will fail at the current threshold.
- When I reran the same simulation that gave me these directors using this PR, the trace deviation stayed below the current threshold (1e-10) in all cases (in fact, it stayed below 1e-11). My interpretation is that since rotations are no longer being biased by the threshold amount, the round-off error does not accumulate to the same degree, so clipping may in fact be more stable and accurate.
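The bias in the third point can be checked directly: with the threshold-subtraction approach, a trace of exactly 3 (no rotation) no longer maps to a zero angle. A quick check, assuming the 1e-10 threshold quoted above:

```python
import numpy as np

threshold = 1e-10  # threshold used by the previous implementation

# At rest, neighboring directors coincide and the trace is exactly 3,
# so the extracted angle should be 0.  Subtracting the threshold inside
# arccos instead produces a small spurious rotation at every step:
spurious_angle = np.arccos(0.5 * 3.0 - 0.5 - threshold)
# arccos(1 - x) ~ sqrt(2 x) for small x, so this is about 1.4e-5 rad

# Clipping instead of subtracting removes the bias entirely:
unbiased_angle = np.arccos(0.5 * np.clip(3.0, -1.0, 3.0) - 0.5)  # exactly 0
```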
I agree with most of your points. On 3), I checked some time ago that the threshold amount of error we are adding here is negligible compared to other numerical issues upstream (especially multiplying the moduli into the shear), but I think it is case dependent.
I think we are somewhat on the same page: in most cases, when the simulation is running properly, this corner case shouldn't be an issue. I just hold the perspective that the behavior of a code block, especially a free function, should depend only on its functionality. Basically, I think producing NaN is the more expected behavior, especially when the input directors are unstable. In this regard, I agree with your changes; we should just clearly state and document this behavior inside the function, and maybe add a warning message if the trace deviates significantly outside the (-1, 3) bound (I think this would help many other cases to find numerical issues). If you agree, I can make the change and commit. I can also write some tests for these cases, and address the threshold inside the sin function.
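A plain-Python sketch of the clip-plus-warning behavior proposed here (the name and signature are illustrative; a numba-compiled version would need numba's more limited exception support):

```python
import warnings

import numpy as np

def rotation_angle_checked(trace, threshold=1e-10):
    """Clip the trace to the valid arccos domain, but surface large
    deviations instead of silently absorbing them.

    Sketch of the documented clip-plus-warning behavior discussed
    above; not the library's actual function.
    """
    if trace > 3.0 + threshold or trace < -1.0 - threshold:
        warnings.warn(
            "director trace deviates significantly outside [-1, 3]; "
            "directors may be losing orthonormality"
        )
    return np.arccos(0.5 * np.clip(trace, -1.0, 3.0) - 0.5)
```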
@skim0119 and @armantekinalp what should be the follow up on this one?