jsbsim icon indicating copy to clipboard operation
jsbsim copied to clipboard

Fix an uninitialized variable in GeographicLib.

Open bcoconni opened this issue 1 year ago • 3 comments

Fixes the issue #475.

~~Unlike the option that has been chosen by the GeographicLib project (see https://github.com/geographiclib/geographiclib/issues/33#issuecomment-2408610762), the variable m12x is set to zero, instead of Math::NaN, to avoid raising floating point exceptions.~~

A new floating point exception flag FE_OVERFLOW has also been added to JSBSim.cpp because it is the exception that is triggered by this bug and JSBSim.cpp was not catching it on Linux. This setting helped debugging the issue while running the script c3101.xml on an Ubuntu distribution.

UPDATE: The variable m12x is initialized to Math::NaN as in the GeographicLib project. See the discussion below for details.

bcoconni avatar Oct 13 '24 10:10 bcoconni

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 24.96%. Comparing base (0f00b80) to head (f19f079). Report is 18 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1173      +/-   ##
==========================================
- Coverage   25.12%   24.96%   -0.17%     
==========================================
  Files         170      170              
  Lines       18337    19283     +946     
==========================================
+ Hits         4608     4814     +206     
- Misses      13729    14469     +740     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Oct 13 '24 10:10 codecov[bot]

@bcoconni I'm refreshing my memory regarding the original issue and trying to understand the issue with setting m12x = Math::NaN.

So first off in terms of Python versus JSBSim exe, it looks to me like in both cases for MSC which I'm using to test, that the set of enabled floating point exceptions are the same?

https://github.com/JSBSim-Team/jsbsim/blob/0f00b806a74027d71a3e758bbb1e177e4b6130f2/python/fpectl/fpectlmodule.cpp#L142-L148

https://github.com/JSBSim-Team/jsbsim/blob/0f00b806a74027d71a3e758bbb1e177e4b6130f2/src/JSBSim.cpp#L288-L294

Testing a number of scripts I hit a breakpoint here:

https://github.com/JSBSim-Team/jsbsim/blob/0f00b806a74027d71a3e758bbb1e177e4b6130f2/src/GeographicLib/Geodesic.cpp#L406

Originally, using a debug build and without m12x being initialized I was seeing a constant value for it, I'm assuming the compiler was setting it since I saw the same value for different uninitialized doubles in multiple functions, set to 9.2559631349317831e+61. Which didn't cause any floating point exceptions since it's not too big to cause an overflow etc. But big enough and somewhat unique to stand out so that developers would notice it when debugging.

I then initialized m12x to Math:NaN() and checked what happened when I hit the breakpoint. And no floating point exception, the result is just NaN after the multiplication with b.

If any input has a quiet NaN value, it will cause the result of nearly any floating-point operation that consumes it to be NaN without raising the invalid exception.

And checking what Math::NaN() returns:

https://github.com/JSBSim-Team/jsbsim/blob/0f00b806a74027d71a3e758bbb1e177e4b6130f2/src/GeographicLib/Math.cpp#L260-L270

So when you say:

variable m12x is set to zero, instead of Math::NaN, to avoid raising floating point exceptions.

Are you seeing examples of Math::NaN() returning numeric_limits<T>::max() on some of the platforms you're testing on and then causing an overflow exception? Or are you just being cautious given that some platforms may not have a numeric_limits<T>::quiet_NaN()?

I'm assuming when we were seeing this intermittent bug it was because m12x was randomly sometimes ending up with a value close to max and then causing an overflow exception.

But if we're confident that all modern (?) compilers/platforms have a quiet NaN then wouldn't it be better to go with Math::NaN() so that we don't have to worry about having to perform an update to the GeoGraphicLib each time we pull in an update of it?

seanmcleod avatar Oct 13 '24 19:10 seanmcleod

So first off in terms of Python versus JSBSim exe, it looks to me like in both cases for MSC which I'm using to test, that the set of enabled floating point exceptions are the same?

Yes, correct.

Originally, using a debug build and without m12x being initialized I was seeing a constant value for it, I'm assuming the compiler was setting it since I saw the same value for different uninitialized doubles in multiple functions, set to 9.2559631349317831e+61. Which didn't cause any floating point exceptions since it's not too big to cause an overflow etc. But big enough and somewhat unique to stand out so that developers would notice it when debugging.

In the case of the Ubuntu distribution (which is the one used by the GitHub runners), m12x can sometimes be set to a value which is high enough to trigger an overflow FPE. Of course this condition is met by accident and is dependent on the stack history since m12x is most likely allocated in the stack by C++ compilers. Hence the randomness of the bug.

By chance, this condition was systematically fulfilled at some point of the development of my logging4 branch. This was the moment when the bug became reproducible and when I have finally been able to track the bug down to the lack of initialization of m12x.

Are you seeing examples of Math::NaN() returning numeric_limits<T>::max() on some of the platforms you're testing on and then causing an overflow exception? Or are you just being cautious given that some platforms may not have a numeric_limits<T>::quiet_NaN()?

No, I have assumed that multiplying a NaN by a number would raise an invalid FPE. And you are entirely right: my assumption is wrong and setting m12x to Math::Nan() does not trigger any floating point exception. All the tests that I have run following your feedback have passed and proven your point. I now know what "quiet" means in quiet_NaN :smile:

I'm assuming when we were seeing this intermittent bug it was because m12x was randomly sometimes ending up with a value close to max and then causing an overflow exception.

Agreed, I guess that compilers are allocating m12x in the stack so its value depends on the previous content of the stack which is more or less random.

But if we're confident that all modern (?) compilers/platforms have a quiet NaN then wouldn't it be better to go with Math::NaN() so that we don't have to worry about having to perform an update to the GeoGraphicLib each time we pull in an update of it?

Yes, that's a good point. Added to the fact that I was incorrectly assuming that using quiet_Nan in m12x *= b would raise an FPE, I see no longer a reason to initialize m12x to zero.

bcoconni avatar Oct 19 '24 11:10 bcoconni

@seanmcleod have all of your concerns been addressed ?

bcoconni avatar Oct 26 '24 09:10 bcoconni

Yep.

seanmcleod avatar Oct 26 '24 09:10 seanmcleod

Right, the PR is now merged. Thanks for the feedback.

bcoconni avatar Oct 26 '24 18:10 bcoconni