mercury icon indicating copy to clipboard operation
mercury copied to clipboard

IEEE invalid operation when checking for central body collision

Open tobiscode opened this issue 5 years ago • 8 comments

Hi all,

I sometimes stumbled across the following warning message when running Mercury:

Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG

With not much more to go on, I added the gfortran compiler flag -ffpe-trap=invalid (and removed -O2 as otherwise the debugger didn't shoe the correct line) to find the problem: calling acos() for an undefined value. Here is the part (line 963 in mercury6_2.for):

u0 = sign (acos((1.d0 - r0/a )/e), rv0)

I then added some debug output to check the values of r0, a, e and rv0 right before the error:

parameter value
r0 8699.8554879043295
a 4349.9294907593367
e 0.99999919683544536
inside acos -1.0000000000218023
rv0 -8.0957151629424393E-006

I run large suites of simulations, keeping big.in the same but having randomized particles in small.in, so every run is slightly different, and it doesn't happen in most runs. I attached the initial conditions for a sample run where the bug does occur.

  • System: Ubuntu 18.04.3 LTS (GNU/Linux 4.15.0-76-generic x86_64)
  • gfortran: GNU Fortran (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 using the provided Makefile
  • Mercury: most up-to-date version of this repository, except for three parameters I changed in mercury.inc before compiling (CMAX = 1000 and NFILES = 200)

A possible solution is referenced in #18, but it's probably not the best way as it masks possible other bugs as was pointed out.

Cheers

tobiscode avatar Feb 08 '20 20:02 tobiscode

I reproduced this using the supplied initialization files. My system:

Description:	Ubuntu Focal Fossa (development branch)
Release:	20.04
Codename:	focal
----- OS kernel name/release/version -----
5.4.0-12-generic #15-Ubuntu SMP Tue Jan 21 15:12:29 UTC 2020
----- Machine hardware platform -----
x86_64
----- gfortran
9.2.1-28ubuntu1

mercury 6 stdout/stderr:

   Integrating massive bodies and particles up to the same epoch.
   Beginning the main integration.
 Time:       1368.925 years   dE/E:  4.69510E-08   dL/L:  3.48461E-14
 Time:       2737.851 years   dE/E: -7.21165E-07   dL/L:  4.09303E-14
 Time:       4106.776 years   dE/E: -6.46237E-07   dL/L:  5.47581E-14
 Time:       5475.702 years   dE/E: -1.01207E-06   dL/L:  5.27300E-14
 Time:       6844.627 years   dE/E: -6.24822E-07   dL/L:  2.06495E-14
 Time:       8213.552 years   dE/E: -7.97592E-07   dL/L:  8.11232E-15
 Time:       9582.478 years   dE/E: -5.56704E-07   dL/L:  1.56346E-13
   Integration complete.
Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG

texadactyl avatar Feb 08 '20 21:02 texadactyl

Added flag -ffpe-trap=invalid:

   Integrating massive bodies and particles up to the same epoch.
   Beginning the main integration.
 Time:       1368.925 years   dE/E:  4.69510E-08   dL/L:  3.48461E-14
 Time:       2737.851 years   dE/E: -7.21165E-07   dL/L:  4.09303E-14
 Time:       4106.776 years   dE/E: -6.46237E-07   dL/L:  5.47581E-14
 Time:       5475.702 years   dE/E: -1.01207E-06   dL/L:  5.27300E-14
 Time:       6844.627 years   dE/E: -6.24822E-07   dL/L:  2.06495E-14
 Time:       8213.552 years   dE/E: -7.97592E-07   dL/L:  8.11232E-15

Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0  0x7f6e489aaca1 in ???
#1  0x7f6e489a9e75 in ???
#2  0x7f6e4867646f in ???
#3  0x7f6e4884db90 in ???
#4  0x5617e1d4b79e in mce_cent_
	at ./mercury6_2.for:963
#5  0x5617e1d44608 in mal_hcon_
	at ./mercury6_2.for:697
#6  0x5617e1d5f3a8 in MAIN__
	at ./mercury6_2.for:189
#7  0x5617e1d5fb3e in main
	at ./mercury6_2.for:221

texadactyl avatar Feb 08 '20 21:02 texadactyl

Ran again with the custom parameter files supplied by @tobiscode. Note subroutine mce_cent

Using 2-body approximation. Collision with central body (q <= rcen). Time of impact relative to the end of the timestep (e < 0)

Note that algor = 10 (HYBRID) is specified in the custom param.in. If you change just the algor value back to BS like in the sample, no collision with central body occurs and, therefore, no program crash occurs.

Something is wrong with the collision with central body logic. It certainly needs protective code to avoid floating pt exceptions. This will probably need an astrophysicist to figure out.

One other thing although not related to this issue. The calculations specific to algor=11 look suspicious. Uninitialized arrays are used in calculations. Assumed to be zero? Bad practice.

My customized source files used for tracing are attached - ZIP of 2 files: mercury6_2.for and mercury.inc. I added a tracing flag which controls whether or not to trace. Once this bug is fixed, the customization isn't really needed.

xx.tar.gz

texadactyl avatar Feb 09 '20 18:02 texadactyl

Ran again with the custom parameter files supplied by @tobiscode. Changed algor=BS2 in param.in and got another central body collision: &MCE_CENT_E_LT_1 (output from line 979) P= 1.6726537648361467E-003, Q= 8.3632696282470525E-004, RCEN= 4.6547587699999997E-003, A= 4349.4098229833053 , E= 0.99999980771484021 , H= 58.974682251497846 , R0= 8698.8187078549345 , RV0= -2.4711172542430239E-004, /

but mercury6 continued to the end without issues.

Note that when algor=HYBRID, this display appears before the program crash: &MCE_CENT_E_LT_1 P= 6.9874155583106654E-003, Q= 3.4937091821670221E-003, RCEN= 4.6547587699999997E-003, A= 4349.9294907593367 , E= 0.99999919683544536 , H= 100.00000000000000 , R0= 8699.8554879043295 , RV0= -8.0957151629424393E-006, /

In summary,

  • Both BS2 and HYBRID algorithms produced a central body collision.
  • All of the algorithms ran without crashing for me except HYBRID.

It could be that @tobiscode tried a combination of parameters with algor=HYBRID that makes no sense (I am not qualified to judge). If that is true, then, optimally, the initialization code should be amended to diagnose such combinations. Needs an astrophysicist, I believe.

Clearly this early 1990s Fortran code was written to be used by scientists who would first check their parameters should a crash occur.

texadactyl avatar Feb 10 '20 16:02 texadactyl

@tobiscode, @texadactyl, Hi guys! Sorry for a great response delay, it turned out that I got notifications turned off all the time for this repo. Thanks for your bug report. I have briefly looked over the discussion to understand the problem and it seems that I faced this problem several times before. I will try to come up with some ideas how to fix it when I get free of my job stuff.

This will probably need an astrophysicist to figure out.

Celestial mechanics specialist, to be precise ;-)

idovgalyov-4xxi avatar Feb 18 '20 19:02 idovgalyov-4xxi

@idovgalyov-4xxi See? I didn't event know who the right fixer would be! (=:

texadactyl avatar Mar 16 '20 16:03 texadactyl

@tobiscode Is this still an issue? If so, can figure out where the conflict is? If not, please close this issue.

texadactyl avatar Feb 18 '23 13:02 texadactyl

Hi, I've moved on to different projects and am not working with mercury anymore. I ended up being fine with the "fix" I made in the PR, and then assumed that @idovgalyov-4xxi would eventually tackle the problem. If that hasn't happened, then I believe the problem is still there. I'm sorry I can't help anymore with this. Cheers

tobiscode avatar Feb 20 '23 23:02 tobiscode