
mode to halt on floating point overflow / underflow?

Open mppf opened this issue 6 years ago • 29 comments

As a Chapel user, I'd like to be able to compile a Chapel program in a mode that produces an error at run-time if there ever is a floating point operation that overflows or underflows. I'd like to be able to write my floating point code without worrying about NaN and Inf values and would consider these to be in error if they appeared.

This mode could combine with the compiler assumption about NaN/Inf propagation described in #11986.

Issue #11832 requests such a feature - something like the GFortran flag -ffpe-trap=zero.

mppf avatar Jan 03 '19 21:01 mppf

I think I'm missing something: While I understand the goals of "assume all my values are within range to avoid execution-time overhead that I don't think will be necessary in my code" as well as "let me know if my assumption in this regard is wrong", it seems like those two goals contradict one another since, in order to get the second behavior, we still have to check for NaNs and infs after key floating point operations, don't we? (and maybe we also still need to do the more expensive operations for some cases to see whether NaNs or infs would've been produced?)

bradcray avatar Jan 03 '19 22:01 bradcray

@bradcray - I don't necessarily understand all of the details but my expectation is that a signal is raised by the hardware and exceptional enough not to impact optimization. The optimization can assume that NaN etc do not arise, because if they do, the program will halt. I don't believe that it's necessary for the software to include the checks and halt. I think this works in C with SIGFPE and we could build upon that.

mppf avatar Jan 03 '19 22:01 mppf

Do GPUs provide hooks like this that would permit such exceptional conditions to be caught without overhead? (my guess would be that at least some don't?).

bradcray avatar Jan 03 '19 22:01 bradcray

They almost certainly don't (Edit: but I don't have any details)

mppf avatar Jan 03 '19 22:01 mppf

For example, here's a C program that does this with GCC:

```c
#pragma STDC FENV_ACCESS on

#define _GNU_SOURCE
#include <fenv.h>

int main()
{
#ifdef FE_NOMASK_ENV
  fesetenv(FE_NOMASK_ENV);
#endif

  double zero = 0.0;
  double one = 1.0;
  double x = zero;

  for (int i = 0; i < 10000; i++ ) {
    x += (one + i) / zero;
  }

  return (int)x;
}
```

Compile with `gcc -O3 trap.c -lm --save-temps` and you can see that the assembly includes this loop:

```asm
.L2:
        pxor    %xmm0, %xmm0
        cvtsi2sd        %eax, %xmm0
        addl    $1, %eax
        cmpl    $10000, %eax
        addsd   %xmm3, %xmm0
        divsd   %xmm2, %xmm0
        addsd   %xmm0, %xmm1
        jne     .L2
```

That loop contains only the normal control flow; it doesn't include any special checking of, e.g., floating point overflow flags. In this setting, the hardware raises the signal when the divide by zero occurs.

```
$ ./a.out
Floating point exception (core dumped)
```

mppf avatar Jan 03 '19 22:01 mppf

The comment immediately above is a good example of the issue. I agree with @mppf that GPUs probably can't trap, so we might need to think about what to do with them in this mode.

dmk42 avatar Jan 04 '19 00:01 dmk42

Not sure if I am coming at this from the same perspective as you, @mppf. There seem to be many balls in the air: trapping, Mandelbrots and GPUs.

Ignoring GPUs for the moment, you said "Such a compilation mode would allow the Chapel compiler to optimize out any handling of Inf and NaN".

What Inf and NaN are you talking about? Maybe I need to see the code. At which one of the multitude in ../shootout/mandelbrot/ferguson/... should I look please? Why does the compiler need to do any action about Inf and NaN? Doesn't the hardware do all the work?

If one of my team was using a GPU and they did not explicitly check for Inf or NaN at the end of some code segment (back on the CPU), they would get roasted. Chapel should not have to do that sort of thing if the hardware is not going to play nicely. Don't make work for yourself. My 2c.

By the way, you really need to have (zero-zero) in the divisor to be sure it will do what I think you want to achieve in the test. And if I was trying to prove overflow issues, I would prefer to test with stuff like

```chapel
var huge = 1.0e+300:real(64);
var tiny = 1.0e-300:real(64);
var whopper = huge / tiny;
```

but maybe I am being picky. I would not have thought that 1/0 is guaranteed to exercise FPUs the same way as my example.

damianmoz avatar Jan 04 '19 01:01 damianmoz

These recent comments make me wonder whether floating point modes and changes to them (if any) should perhaps be done via methods on locale types (rather than global flags or functions, say) such that if a locale type like a GPU doesn't support a particular mode / ability to catch errors, it could complain about it when the attempt to set it was made (or, in some cases, potentially even at compile-time). This would also correspond to the fact that in practice any given locale's processor would presumably have to be involved in any mode changes (?). As a convenience, of course, the method could be called in a promoted manner on the Locales array if the goal was to have all the locales switch to the same mode (where perhaps a hybrid CPU/GPU locale would "do the right thing" for the CPU and ignore the request for the GPU?).

bradcray avatar Jan 04 '19 06:01 bradcray

> What Inf and NaN are you talking about? Maybe I need to see the code. At which one of the multitude in ../shootout/mandelbrot/ferguson/... should I look please? Why does the compiler need to do any action about Inf and NaN? Doesn't the hardware do all the work?

Actually I'm just thinking about how complex multiply needs to be implemented. If the compiler needs to propagate Inf and NaN appropriately (at least if we follow C in this regard) then the complex multiply becomes branchy and hard to optimize/vectorize. But if it's just the basic algorithm, Inf and NaN won't propagate in a mathematically appealing way, but the multiply can run faster.

A version of mandelbrot that uses complex operations is test/studies/shootout/mandelbrot/jacobnelson/mandelbrot-complex.chpl and the inner loop consists of:

  • complex multiply
  • complex add
  • complex abs a.k.a. magnitude

However, it's important to understand that I don't yet know how much the extra branching for complex multiply supporting Inf/NaN impacts this particular benchmark.

But I know it matters for the synthetic benchmark from this comment (https://github.com/chapel-lang/chapel/issues/11936#issuecomment-451187713): there, the program is about 4x faster with GCC as the backend compiler with --no-ieee-float or with --ccflags -fcx-limited-range. Since -fcx-limited-range recovers the performance, we can be reasonably confident that the additional branching in the complex multiply is responsible for the 4x performance difference in that benchmark.
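
The trade-off described above can be sketched in C. The code below is an illustration, not the Chapel or GCC implementation: `cplx`, `mul_naive`, `mul_recovering`, and `box` are hypothetical names, and the recovery step is a simplified version of the idea in C's Annex G (it ignores signed zeros and the overflow case). It shows that the textbook formula is branch-free, while recovering an infinite result from `NaN + NaN*i` adds branches.

```c
#include <math.h>

typedef struct { double re, im; } cplx;   /* hypothetical stand-in for a complex type */

/* Textbook product: four multiplies, two adds, no branches. */
cplx mul_naive(cplx z, cplx w)
{
    cplx r = { z.re * w.re - z.im * w.im,
               z.re * w.im + z.im * w.re };
    return r;
}

/* Map an infinite part to +/-1 and anything else (including NaN) to 0. */
static double box(double v)
{
    if (isinf(v))
        return v > 0 ? 1.0 : -1.0;
    return 0.0;
}

/* Product with a simplified recovery step: when the textbook result is
   NaN + NaN*i but an operand had an infinite part, recompute the product
   from "boxed" operands so that the result is infinite instead. */
cplx mul_recovering(cplx z, cplx w)
{
    cplx r = mul_naive(z, w);
    if (isnan(r.re) && isnan(r.im)) {
        double a = z.re, b = z.im, c = w.re, d = w.im;
        int recalc = 0;
        if (isinf(a) || isinf(b)) {          /* z is infinite */
            a = box(a); b = box(b);
            if (isnan(c)) c = 0.0;
            if (isnan(d)) d = 0.0;
            recalc = 1;
        }
        if (isinf(c) || isinf(d)) {          /* w is infinite */
            c = box(c); d = box(d);
            if (isnan(a)) a = 0.0;
            if (isnan(b)) b = 0.0;
            recalc = 1;
        }
        if (recalc) {
            r.re = INFINITY * (a * c - b * d);
            r.im = INFINITY * (a * d + b * c);
        }
    }
    return r;
}
```

With `z = (inf, inf)` and `w = (1, 0)`, `mul_naive` yields NaN in both parts while `mul_recovering` yields an infinite result; for finite inputs the two agree.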

> I would not have thought that 1/0 is guaranteed to exercise FPUs the same way as my example.

It's not actually a conforming C program and @dmk42 already pointed this out to me :) Nonetheless I was just trying to get something simple to demonstrate how it works in C.

Separately, it occurs to me that this issue is really proposing two things:

  1. A compilation mode where the compiler can assume it does not need to propagate Inf or NaN when e.g. doing complex multiply.
  2. A runtime mode where a floating point operation that produces Inf or NaN triggers the program to halt.

Interestingly, we could have 1 and possibly combine it with #11969 - where instead of halting on overflow etc, the floating point status flag would be set, and the code doing the computation would have to eventually check the status flag and then ignore the computation that had overflow in it. @damianmoz I'm wondering if this matches your typical use case for fegetexceptflag and friends in C today.

mppf avatar Jan 04 '19 14:01 mppf

@bradcray -

> These recent comments make me wonder whether floating point modes and changes to them (if any) should perhaps be done via methods on locale types (rather than global flags or functions, say) such that if a locale type like a GPU doesn't support a particular mode / ability to catch errors, it could complain about it when the attempt to set it was made (or, in some cases, potentially even at compile-time).

I agree about needing to handle hardware that doesn't support the feature.

But I don't think the locale is the right place to put it, because I'd expect that the rounding mode or floating point status flags will be task-local / thread-local. After all, as far as I know, these end up being stored in a flags register somewhere on a real CPU (in particular MXCSR on an x86 CPU supporting SSE). Setting it in the locale would imply that it will affect other running tasks, which I don't think is what we want.

We could have a setting in the locale that controlled the settings for new tasks created on that locale, but I think that's more confusing than just copying the settings from the task creating that task.

Note that the status flags ("has overflow occurred?") are pretty different from the other floating point settings that C allows one to manipulate (e.g. "round to zero" or "halt the program on overflow"). In particular, we might have a different policy for the status flags than the rest (e.g. maybe status flags are always cleared when a task starts, and accumulated on task join, but the other settings simply copy the parent task setting when creating a new task).
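
One way to picture the policy floated above is a small model in C. Everything here is hypothetical: `task_fp_state`, `task_fp_spawn`, and `task_fp_join` are invented for illustration and do not correspond to any Chapel runtime interface. Settings such as the rounding mode are copied from parent to child, while status flags start clear in the child and are OR-ed into the parent at join.

```c
#include <fenv.h>

/* Hypothetical per-task floating point state; not an existing Chapel or C type. */
typedef struct {
    int rounding;   /* a rounding mode such as FE_TONEAREST or FE_TOWARDZERO */
    int flags;      /* accumulated FE_* status bits */
} task_fp_state;

/* Spawn: the child inherits the parent's settings but starts with clear flags. */
task_fp_state task_fp_spawn(const task_fp_state *parent)
{
    task_fp_state child = { parent->rounding, 0 };
    return child;
}

/* Join: the child's flags are folded into the parent; settings are untouched. */
void task_fp_join(task_fp_state *parent, const task_fp_state *child)
{
    parent->flags |= child->flags;
}
```

Under this model a parent only learns at join time that some child overflowed, and a child changing its own rounding mode never leaks back to the parent.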

mppf avatar Jan 04 '19 14:01 mppf

> Separately, it occurs to me that this issue is really proposing two things:
>
>   1. A compilation mode where the compiler can assume it does not need to propagate Inf or NaN when e.g. doing complex multiply.
>   2. A runtime mode where a floating point operation that produces Inf or NaN triggers the program to halt.

This is part of what I was trying to get at in my first comment above, though I was missing the fact that maybe SW didn't need to be involved in item 2. I think it's worth editing the original issue description to clarify this (or to break one of the items out into its own issue).

bradcray avatar Jan 04 '19 21:01 bradcray

The rounding mode is a very different beast to the exception flags. The rounding mode is where you tell the CPU/core what you want. Having the rounding mode change depending on the running CPU sounds like a recipe for a big mess. But then again, my areas of interest/application are by definition to me (+customers).

An exception flag is set by the FPU depending on the data you ask it to play with. All you should do, I humbly suggest, is to clear it. Well that's what I assumed is the best programming practice. I have never attempted to raise an exception explicitly using the fenv interface although I do portably force an exception using data. Certainly when I code an individual algorithm, I never go near the fenv interface, assuming the user of my module, even if it is me, will configure that appropriately in the calling code and recover/trap from a set exception.

damianmoz avatar Jan 04 '19 23:01 damianmoz

@mppf, I would like to reply to your post back-to-front and rephrase your list.

The 1st proposal is a compilation mode where the compiler can assume it does not need to propagate Inf or NaN in a mathematically appropriate way when e.g. doing complex multiply. Note that it still propagates the Inf or NaN.

The 2nd proposal is a runtime mode where a floating point operation that produces Inf or NaN triggers the program to halt.

The implicit 3rd proposal is to retain the status quo, in which things like complex multiplication propagate Inf or NaN in a mathematically appropriate way, which may allow you to backtrack without debugging. I am sure there is a need for this but I do not have such a need.

You also say ...

> Interestingly, we could have 1 and possibly combine it with #11969 - where instead of halting on overflow etc, the floating point status flag would be set, and the code doing the computation would have to eventually check the status flag and then ignore the computation that had overflow in it.

Yes, this pretty much matches my typical use case for fetestexceptflag and friends in C/C++ today. The only difference is that in some cases, my code realises the problem itself and then (say) rescales an input matrix or does some other intelligent thing and either reruns the previously failed algorithm or an alternative. Or I can just note the failure, tell myself how stupid I was, and rework my algorithm.

Handling C's Annex G definition of complex multiplication is, for my usage, over the top. But my usage of complex numbers is limited. The first proposal will suit me most of the time because Inf and NaN still propagate, just not in any mathematically nice way. You need more opinions on this than mine.

You also say ...

> However it's important to understand that I don't yet know how much the extra branching for complex multiply supporting Inf/Nan impacts this particular benchmark.

When I looked at all those copysign's there, I would bet a case of beer that the answer is HEAPS!

I was not trying to be picky with my comment about 1/0. I just wanted to ensure that we were both talking about overflow as I understood it.

damianmoz avatar Jan 05 '19 03:01 damianmoz

@bradcray @damianmoz - I've moved the compile-time assumption part to #11986.

@damianmoz - I think we're in general agreement about this but there are some finer points that I tried to identify as questions in the statement of #11986.

mppf avatar Jan 07 '19 15:01 mppf

I'm surprised this is not an option already, I'm having a hard time tracking a bug that turns all my fields to NaN and this is usually the easiest way to find these problems. Even if it's only supported for certain types of locales it's invaluable as a verification tool.

RedHatTurtle avatar Aug 18 '22 01:08 RedHatTurtle

I wonder how this will work if I just reply by email directly.

On Wed, 17 Aug 2022, Fábio Malacco Moreira wrote:

> I'm surprised this is not an option already, I'm having a hard time tracking a bug that turns all my fields to NaN and this is usually the easiest way to find these problems. Even if it's only supported for certain types of locales it's invaluable as a verification tool.

You want floating point traps. You would have to provide your own C interface.

Floating point traps are largely on the way out under IEEE 754.

Can you send me your code? On what machine are you running?

You can test whether `x` is a NaN using `x != x`, which evaluates as true if `x` is a NaN.
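
In C, that self-comparison can be wrapped in a helper and used to scan an array for the first NaN, which is a cheap way to narrow down where a field goes bad. This is a sketch with hypothetical names (`is_nan`, `first_nan`); note that options such as `-ffast-math` let the compiler assume NaNs never occur, in which case the comparison may be optimized away.

```c
/* Returns 1 when x is a NaN: NaN is the only value that compares unequal to itself. */
int is_nan(double x)
{
    return x != x;
}

/* Index of the first NaN in a[0..n-1], or -1 when there is none. */
long first_nan(const double *a, long n)
{
    for (long i = 0; i < n; i++)
        if (is_nan(a[i]))
            return i;
    return -1;
}
```

Calling `first_nan` on a field after each major step of a computation gives a rough bisection of where the first NaN appears, without needing traps.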

Stay safe - Damian


damianmoz avatar Aug 18 '22 01:08 damianmoz

I've added a "user issue" label to this given @redhatturtle's comment above.

It's been years since I've looked at or thought about this issue, but reviewing Michael's C / gcc example above I'm wondering whether it would be low-hanging fruit to have the compiler emit similar code into the Chapel generated code (on each locale, I suppose) when a particular compiler flag is used. @mppf, is that the approach you'd imagined?

bradcray avatar Aug 22 '22 18:08 bradcray

> is that the approach you'd imagined?

Not sure exactly what approach we need, but if I were to work on it, the first place I would look is how clang handles it (or if it does at all). If clang handles it, we can look at what corresponding LLVM IR we would need to generate. If not, I think it'd be a matter of trying to understand what facility LLVM IR has for this feature.

mppf avatar Aug 22 '22 18:08 mppf

I suggest the following document for some background discussion on the fundamental issue:

https://www.agner.org/optimize/nan_propagation.pdf

See also

https://stackoverflow.com/questions/49011370/nan-propagation-and-ieee-754-standard

Here is where Julia talks about the same issue:

https://github.com/JuliaLang/julia/issues/27705

Not that I think Julia has all (or even any of) the answers on a given topic.

damianmoz avatar Aug 22 '22 19:08 damianmoz

> Not sure exactly what approach we need, but if I were to work on it, the first place I would look is how clang handles it (or if it does at all).

Given that @redhatturtle indicated a need for this now and a willingness for it only to be supported in certain cases, I was wondering whether it would be possible to wire up a solution quickly using the C back-end, gcc, and an approach based on the C code you sketched above. E.g., drop the guts into an external C routine:

```c
#ifdef FE_NOMASK_ENV
  fesetenv(FE_NOMASK_ENV);
#endif
```

and then do a `coforall loc in Locales do on loc` that calls the routine on each locale (?).

bradcray avatar Aug 23 '22 01:08 bradcray

FYI, LLVM still has some difficulties preventing incorrect optimizations in the presence of floating-point environment access. For example:

https://github.com/llvm/llvm-project/issues/52891

(I just picked the first one I found because I knew there were issues like this out there. There are various other similar but slightly different issues.)

Consequently, targeting gcc for the feature as @bradcray suggests might be a more reliable stop-gap until LLVM resolves more FP environment difficulties.

dmk42 avatar Aug 23 '22 05:08 dmk42

Regarding the previous two comments - yes, I agree that would be a good short-term solution.

mppf avatar Aug 23 '22 10:08 mppf

I'm personally fine with having to use the GCC back-end for this feature, especially if there are external issues that need to be addressed in LLVM before it can be made reliably available in that back-end version.

RedHatTurtle avatar Aug 23 '22 14:08 RedHatTurtle

@redhatturtle: I had hoped to find time to prototype this this week, but failed to do so (even though it seemed more fun than everything else I had to do :D ). However, on my way out the door for the weekend, I was writing up notes to you describing how I would attempt it (in case you wanted to try), and ended up creating a prototype, but not quite getting it working on my system. Maybe you or someone else can see what I did wrong.

  1. I set CHPL_TARGET_COMPILER to gnu and re-built Chapel

  2. I took Michael's C code from above and turned it into a routine designed to be called from Chapel, in a file called floatexcpt.h, say:

```c
#pragma STDC FENV_ACCESS on

#define _GNU_SOURCE
#include <fenv.h>

static void setCFloatExceptions() {
  printf("In C code\n");
#ifdef FE_NOMASK_ENV
  printf("Calling fesetenv(FE_NOMASK_ENV)\n");
  fesetenv(FE_NOMASK_ENV);
#endif
}
```

  3. I created a wrapper for it in Chapel that calls it on all locales when a config is set to true, along with a transliteration of the code from Michael's C code above (call the file `floatexcpt.chpl`):

```chapel
config const exceptionsOn = true;

proc setFloatExceptions() {
  require "floatexcpt.h";

  extern proc setCFloatExceptions();
  coforall loc in Locales do
    on loc do
      setCFloatExceptions();
}

if exceptionsOn then
  setFloatExceptions();

var zero = 0.0,
    one = 1.0,
    x = zero;

for i in 0..<1000 do
  x += (one + i) / zero;

writeln(x:int);
```

  4. Compiled using:

```
$ chpl floatexcpt.chpl
```

What I'm seeing is:

```
In C code
-9223372036854775808
```

which suggests that I'm not getting into the #ifdef FE_NOMASK_ENV in C (so the lack of an exception + core dump like Michael got is not surprising).

Strangely, though, if I compile Michael's example on the same system, I do get into the #ifdef and see the exception... So clearly I'm doing something dumb on a Friday afternoon... Feels like I'm close, though...

Best guess offhand is that Chapel #includes some system header files prior to this one which somehow disables the macro... I'm timing out before sorting out what that could be, though. If that's the case, I think we could work around this by moving the routine into a .c file and creating a prototype for it in a .h file, such that rather than having the code be inlined into the generated Chapel code, it would be in a separate C file that wouldn't include any additional headers. I'd give that a try if it weren't dinnertime.

bradcray avatar Aug 27 '22 01:08 bradcray

OK, I was too curious about this not to try it again after dinner, and got it working. Separating the routine into its own C file did do the trick (in retrospect, perhaps because the #pragma and #define need to be set before #including specific headers). So the recipe is:

  1. Set CHPL_TARGET_COMPILER to gnu and build Chapel

  2. Put this in floatexcpt.h (can also decorate it with standard #ifndef...#define...#endif patterns):

```c
void setCFloatExceptions(void);
```

  3. Put this in floatexcpt.c (will want to remove the printf()s once it's working, probably):

```c
/* Defined before any #include; glibc only exposes FE_NOMASK_ENV when this is set first. */
#define _GNU_SOURCE

#include <stdio.h>
#include "floatexcpt.h"

#include <fenv.h>

void setCFloatExceptions(void) {
  printf("In C code\n");
#ifdef FE_NOMASK_ENV
  printf("Calling fesetenv(FE_NOMASK_ENV)\n");
  fesetenv(FE_NOMASK_ENV);  /* unmask floating point exceptions so that they trap */
#endif
}
```

  4. Test is basically the same, in floatexcpt.chpl:

```chapel
config const exceptionsOn = true;

proc setFloatExceptions() {
  require "floatexcpt.h";

  extern proc setCFloatExceptions();
  coforall loc in Locales do
    on loc do
      setCFloatExceptions();
}

if exceptionsOn then
  setFloatExceptions();

var zero = 0.0,
    one = 1.0,
    x = zero;

for i in 0..<1000 do
  x += (one + i) / zero;

writeln(x:int);
```

  5. Compile:

```
$ chpl floatexcpt.c floatexcpt.chpl
```

  6. Run:

```
$ ./floatexcpt
In C code
Calling fesetenv(FE_NOMASK_ENV)
Floating point exception
$ ./floatexcpt --exceptionsOn=false
-9223372036854775808
```

And then, once working, feel free to rename files, refactor code, etc. to your liking.

bradcray avatar Aug 27 '22 01:08 bradcray

@redhatturtle : Closing tabs today, I wanted to check in and see whether you were able to put this technique to use and whether it helped you find your error. If so, I think we could make a package module that provided this capability when using appropriate compilers. If not, that'd be good to know.

bradcray avatar Sep 01 '22 17:09 bradcray

FYI:

  1. The C standard requires "on" to be capitalized:

```
#pragma STDC FENV_ACCESS on-off-switch
on-off-switch: one of
    ON OFF DEFAULT
```

  2. GCC ignores #pragma STDC FENV_ACCESS:

```
warning: ignoring '#pragma STDC FENV_ACCESS'
```

pmor13 avatar Sep 20 '22 20:09 pmor13

@pmor13: Thanks for the notes. I wasn't getting a complaint w.r.t. the second until I added -Wall using gcc 8.3.0, and have verified that removing that line has no effect. I've updated my previous comment to remove it. Thanks again.

bradcray avatar Sep 20 '22 21:09 bradcray

The IEEE 754 trend is away from traps. Not least because they have big implications with SIMD. How long it is before they disappear from C is another question.

damianmoz avatar Mar 19 '25 05:03 damianmoz