jsbsim
jsbsim copied to clipboard
Floating Point Exception in Actions build process
I'm submitting a ...
- [x] bug report
- [ ] feature request
- [ ] support request => Please do not submit support request here
Describe the issue
The build for MSVC repeatedly fails while executing the test CheckScripts.py
and more specifically while running the script c172_runway_at_rest_cg_shift.xml
.
What is the current behavior? The test reports an FPE (Floating Point Exception) which seems to be located in the waypoint/geodetic computations code. Here is the stack trace as reported by the test failure:
#11 Source "D:\a\jsbsim\jsbsim\build\python\jsbsim.cxx", line 14008, in __pyx_pf_6jsbsim_9FGFDMExec_10run [00007FFF462BEB9D]
14005: */
14006: __Pyx_XDECREF(__pyx_r);
14007: try {
>14008: __pyx_t_1 = __pyx_v_self->thisptr->Run();
14009: } catch(...) {
14010: convertJSBSimToPyExc(); if (!PyErr_Occurred())PyErr_SetString(PyExc_RuntimeError, "Error converting c++ exception.");
14011: __PYX_ERR(0, 579, __pyx_L1_error)
#10 Source "D:\a\jsbsim\jsbsim\src\FGFDMExec.cpp", line 423, in JSBSim::FGFDMExec::Run [00007FFF46310F82]
420: // returns true if success, false if complete
421: if (Script && !IntegrationSuspended()) success = Script->RunScript();
422:
> 423: for (unsigned int i = 0; i < Models.size(); i++) {
424: LoadInputs(i);
425: Models[i]->Run(holding);
426: }
#9 Source "D:\a\jsbsim\jsbsim\src\models\FGFCS.cpp", line 171, in JSBSim::FGFCS::Run [00007FFF4637492E]
168: for (i=0; i<SystemChannels.size(); i++) {
169: if (debug_lvl & 4) cout << " Executing System Channel: " << SystemChannels[i]->GetName() << endl;
170: ChannelRate = SystemChannels[i]->GetRate();
> 171: SystemChannels[i]->Execute();
172: }
173: ChannelRate = 1;
#8 Source "D:\a\jsbsim\jsbsim\src\models\flight_control\FGWaypoint.cpp", line 185, in JSBSim::FGWaypoint::Run [00007FFF463DB519]
182: double heading_to_waypoint_rad = source.GetHeadingTo(target_longitude_rad,
183: target_latitude_rad);
184:
> 185: if (eUnit == eDeg) Output = heading_to_waypoint_rad * radtodeg;
186: else Output = heading_to_waypoint_rad;
187:
188: } else { // Calculate Distance
#7 Source "D:\a\jsbsim\jsbsim\src\math\FGLocation.cpp", line 402, in JSBSim::FGLocation::GetHeadingTo [00007FFF4636379A]
399: geod.Inverse(mGeodLat * radtodeg, mLon * radtodeg, target_latitude * radtodeg,
400: target_longitude * radtodeg, heading, azimuth2);
401:
> 402: return heading * degtorad;
403: }
404:
405: //%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
#6 Source "D:\a\jsbsim\jsbsim\src\GeographicLib\Geodesic.cpp", line 500, in GeographicLib::Geodesic::GenInverse [00007FFF463C1C22]
497: real& S12) const {
498: outmask &= OUT_MASK;
499: real salp1, calp1, salp2, calp2,
> 500: a12 = GenInverse(lat1, lon1, lat2, lon2,
501: outmask, s12, salp1, calp1, salp2, calp2,
502: m12, M12, M21, S12);
503: if (outmask & AZIMUTH) {
#5 Source "D:\a\jsbsim\jsbsim\src\GeographicLib\Geodesic.cpp", line 400, in GeographicLib::Geodesic::GenInverse [00007FFF463C1756]
397: cbet1, cbet2, lengthmask, s12x, m12x, dummy, M12, M21, Ca);
398: }
399: m12x *= _b;
> 400: s12x *= _b;
401: a12 = sig12 / Math::degree();
402: if (outmask & AREA) {
403: // omg12 = lam12 - domg12
#4 Object "", at 00007FFF7E50379E, in KiUserExceptionDispatcher
#3 Object "", at 00007FFF7E464CEF, in RtlWalkFrameChain
#2 Object "", at 00007FFF7E504A2F, in _chkstk
#1 Object "", at 00007FFF484FEBC0, in _C_specific_handler
Attempt to access invalid address.
#0 Object "", at 00007FF609A224CC, in _C_specific_handler
FAIL
ERROR
======================================================================
ERROR: testScripts (__main__.CheckScripts)
----------------------------------------------------------------------
Traceback (most recent call last):
File "D:\a\jsbsim\jsbsim\tests\JSBSim_utils.py", line 156, in tearDown
self.sandbox.erase()
File "D:\a\jsbsim\jsbsim\tests\JSBSim_utils.py", line 56, in erase
shutil.rmtree(self._tmpdir)
File "C:\hostedtoolcache\windows\Python\3.9.5\x64\lib\shutil.py", line 740, in rmtree
return _rmtree_unsafe(path, onerror)
File "C:\hostedtoolcache\windows\Python\3.9.5\x64\lib\shutil.py", line 618, in _rmtree_unsafe
onerror(os.unlink, fullname, sys.exc_info())
File "C:\hostedtoolcache\windows\Python\3.9.5\x64\lib\shutil.py", line 616, in _rmtree_unsafe
os.unlink(fullname)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'D:\\a\\jsbsim\\jsbsim\\build\\tests\\tmpbzkk6123\\JSBout172B.csv'
======================================================================
FAIL: testScripts (__main__.CheckScripts)
----------------------------------------------------------------------
Traceback (most recent call last):
File "D:\a\jsbsim\jsbsim\tests\CheckScripts.py", line 39, in testScripts
ExecuteUntil(fdm, 30.)
fpectl.FloatingPointError: Caught signal SIGFPE in JSBSim
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\a\jsbsim\jsbsim\tests\CheckScripts.py", line 43, in testScripts
self.fail("Script %s failed:\n%s" % (s, e.args[0]))
AssertionError: Script ..\..\..\scripts\c172_runway_at_rest_cg_shift.xml failed:
Caught signal SIGFPE in JSBSim
What is the expected behavior? All the tests to run successfully.
What is the motivation / use case for changing the behavior? N/A
Please tell us about your environment:
OS: GitHub Actions windows-latest
host runner
JSBSim: commit 11ea52ed340ea32a34e6e3dfc6df503f9ad16cf5 from the master
branch.
Python: 3.6
Other information None
When I run the scripts\c172_runway_at_rest_cg_shift.xml
locally here on Windows I don't see the issue. I think I saw you making a comment on one of the commits that you also couldn't reproduce it locally on Windows?
PS C:\source\jsbsim> .\Release\JSBSim.exe --script=scripts\c172_runway_at_rest_cg_shift.xml
...
...
------------------------------------------------------------------
State Report at sim time: 0.000000 seconds
Position
ECI: -1329521.725534581, -14711849.15995605, 14771801.97958151 (x,y,z, in ft)
ECEF: -1329521.725535 , -14711849.159956 , 14771801.979582 (x,y,z, in ft)
Local: 45.192423, -95.163839, 4.600026 (geodetic lat, lon, alt ASL in deg and ft)
Orientation
ECI: -180, -44.80757682585688, 84.836161 (phi, theta, psi in deg)
Local: -5.264532060613332e-16, -1.590277340731758e-15, 8.20496608763031e-15 (phi, theta, psi in deg)
Velocity
ECI: 1072.80495937053, -96.95025317596601, 0 (x,y,z in ft/s)
ECEF: 0, 0, 0 (x,y,z in ft/s)
Local: 0.000000 , 0.000000 , 0.000000 (n,e,d in ft/sec)
Body: 0.000000 , 0.000000 , 0.000000 (u,v,w in ft/sec)
Body Rates (relative to given frame, expressed in body frame)
ECI: 0.002944405985131796, -4.638594100204067e-19, -0.002964249795347291 (p,q,r in deg/s)
ECEF: 0, 0, 0 (p,q,r in deg/s)
---- JSBSim Execution beginning ... --------------------------------------------
Start: Sunday July 11 2021 21:44:18 (HH:MM:SS)
0: GEAR_CONTACT: 0 seconds: Nose Gear 1
1: GEAR_CONTACT: 0.091663 seconds: Left Main Gear 1
2: GEAR_CONTACT: 0.091663 seconds: Right Main Gear 1
Shift weight (Event 0) executed at time: 1.508273
End: Sunday July 11 2021 21:44:18 (HH:MM:SS)
PS C:\source\jsbsim>
I think I saw you making a comment on one of the commits that you also couldn't reproduce it locally on Windows?
Indeed, the test completes successfully for me on Linux and Windows. It only fails when the test is executed on GitHub host runners for MSVC. Most likely there is a combination of criteria that are met on GitHub and which do not exist on our PC. May be the version of Python ? It's 3.6 on GitHub while being 3.7 on my Windows PC and 3.9 on my Linux PC.
At the moment, there is no reason to discard this problem as being a mere coincidence. Especially since the failure is reproducing repeatedly at each build which IMHO indicates that it is a genuine bug.
At some point later this week, I intend to run scripts\c172_runway_at_rest_cg_shift.xml
with a debugger and check if the variables s12x
or m12x
are close to values that could trigger an FPE: either extremely small or extremely huge. A division by zero or an illegal value seem to be excluded at first since the FPE is raised during multiplications.
Hmm, I wasn't running it from python, I ran jsbsim.exe
directly. Just trying python CheckScripts.py
now it doesn't look like fpectl
is a standard installed module?
PS C:\source\jsbsim\tests> python .\CheckScripts.py
Traceback (most recent call last):
File ".\CheckScripts.py", line 23, in <module>
import fpectl
ModuleNotFoundError: No module named 'fpectl'
Also looking at CheckScipts.py
I'm a bit confused, I don't see c172_runway_at_rest_cg_shift.xml
referenced?
for s in self.script_list(['737_cruise_steady_turn_simplex.xml']):
Just trying
python CheckScripts.py
now it doesn't look likefpectl
is a standard installed module?
The command python CheckScripts.py
should be executed from the tests
folder where the Python modules jsbsim
and fpectl
have been built: it should be something like build/Release/tests
, build/RelWithDebInfo/tests
or build/Debug/test
.
Also looking at
CheckScipts.py
I'm a bit confused, I don't seec172_runway_at_rest_cg_shift.xml
referenced?
Yeah, the test API is poorly designed (who read the tests anyway ? :smile:): the list provided to self.script_list
is actually a blacklist i.e. the list of scripts to be excluded from the test.
https://github.com/JSBSim-Team/jsbsim/blob/11ea52ed340ea32a34e6e3dfc6df503f9ad16cf5/tests/JSBSim_utils.py#L159-L169
I guess the line of code would be clearer if it would be written like so:
- for s in self.script_list(['737_cruise_steady_turn_simplex.xml']):
+ for s in self.script_list(blacklist=['737_cruise_steady_turn_simplex.xml']):
Ah, I did run python CheckScripts.py
in the tests
folder, however I hadn't performed a cmake
build, just the Visual Studio build of the project which only builds jsbsim.exe
.
I guess the line of code would be clearer if
Yep, especially while taking a quick glance at the code while watching the Euros soccer final at the same time! :wink:
Of course, when running the test from build/.../tests
you need to give the relative path to python
: something like
> python ../../../tests/CheckScripts.py
In order to save myself the trouble of all these dots, I generally use the ctest
command from the build directory root (i.e. build/Debug
, build/Release
, etc.)
> ctest -R CheckScript --output-on-failure
The last parameter --output-on-failure
to make sure that the complete test log is only dumped if there is a failure in the test.
It finally happened to be a genuine bug which was located is our dependency GeographicLib. It has been fixed in the release v1.52 of GeographicLib:
- Geodesic routines: be more aggressive in preventing negative
s12
andm12
for short lines (all languages).
However, in its release v1.51, GeographicLib switched to C++11 math routines rather than using their own:
- C++11 compiler required for C++ library. As a consequence:
- The workaround implementations for C++11 routines (
Math::hypot
,Math::expm1
,Math::log1p
,Math::asinh
,Math::atanh
,Math::copysign
,Math::cbrt
,Math::remainder
,Math::remquo
,Math::round
,Math::lround
,Math::fma
,Math::isfinite
, andMath::isnan
) are now deprecated. Just use the versions in thestd:: namespace
instead. -
SphericalEngine
class, fix the namespace for using streamoff. - Some templated functions, e.g.,
Math::degree()
, now have default template parameters,T = Math::real.
- The workaround implementations for C++11 routines (
So it may be that Microsoft (i.e. stdlib) C++11 math functions are better handling the subtleties of floating point arithmetic than GeographicLib's !?!
So how come I didn't see the error when I ran .\Release\JSBSim.exe --script=scripts\c172_runway_at_rest_cg_shift.xml
?
Since I wasn't running it via the python test, rather running the JSBSim.exe
were the set of floating point exceptions that are enabled different?
Floating point exceptions are only raised when the Python module fpectl
is used (which function turnon_sigfpe
sets some CPU flags to send a signal SIGFPE whenever an FPE occurs). Otherwise, FPEs are plainly and simply ignored.
There has just been another occurrence of the bug on the Linux runner: same stack trace than above, script scripts/c1724.xml
.
It smells like there is an uninitialized variable somewhere in GeographicLib code.