cpp_weekly When can a compiler generate fused multiply-add, and when can it not?

Open OskarSigvardsson opened this issue 2 months ago • 0 comments

Channel

C++ Weekly

Topics

Godbolt link showing the issue: https://godbolt.org/z/d1Yz74T6r

At my company, we had tests that failed when building for ARM but not for x86. What we discovered was some (in my opinion) very counter-intuitive behavior regarding when compilers are allowed to insert FMA (fused multiply-add) instructions. I was under the impression that, unless you had something like -ffast-math enabled, writing a*b + c with floating-point values would emit exactly that sequence of instructions for the CPU: a floating-point multiplication followed by a floating-point addition.

Turns out, that's not the case. A close reading of the standard (8.1, [expr.pre]) and of GCC's and Clang's documentation shows that, within an expression, a compiler is allowed to substitute other floating-point operations as long as the precision is as good or better, which it is with FMA (the intermediate product is not rounded, so there's only a single rounding step instead of two).

This is what was causing our tests to fail: on x86, it was "mul + add", on ARM it was "fmadd". You can get a fused multiply-add on x86 with -mfma, but it's not universally supported on x86, which is why it's not the default.
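A minimal sketch of how that single rounding step can change a result (this is not the code from the failing tests). The helper names, the chosen inputs, and the volatile trick used to keep the two operations separate are all assumptions made for illustration, and the arithmetic assumes IEEE 754 binary32 floats in the default round-to-nearest mode:

```cpp
#include <cmath>

// Hypothetical helper: multiply, round the product to float, then add.
// The volatile store is an illustrative trick (an assumption, not a
// recommendation) to keep the compiler from contracting the two operations.
float mul_then_add(float a, float b, float c) {
    volatile float product = a * b;  // first rounding: product stored as a float
    return product + c;              // second rounding: the sum
}

// Hypothetical helper: explicitly fused, with one rounding at the very end.
float fused(float a, float b, float c) {
    return std::fma(a, b, c);
}

// Inputs chosen so that the exact product a*a = 1 + 2^-11 + 2^-24 lies
// exactly halfway between two adjacent floats.
inline float example_a() { return 1.0f + std::ldexp(1.0f, -12); }     // 1 + 2^-12
inline float example_c() { return -(1.0f + std::ldexp(1.0f, -11)); }  // -(1 + 2^-11)
```

With a = b = example_a() and c = example_c(), mul_then_add returns 0, because the product is first rounded to 1 + 2^-11, while fused returns 2^-24 (about 5.96e-8). A test that compares against either value exactly will pass on one platform and fail on the other.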

It's curious, though: from my reading of the standard, compilers are not allowed to do this if you split the computation over two expressions. That is, a compiler can turn this into a fused multiply-add:

float f = a*b + c;

but not this:

float f1 = a*b;
float f2 = f1 + c;

Clang agrees with me on this, but GCC does not: https://godbolt.org/z/d1Yz74T6r
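To compare the two forms side by side, they can be wrapped in functions roughly like the ones below. This is a sketch only: the function names are made up, and it is not necessarily the code behind the link above.

```cpp
// Hypothetical function names, for illustration only.

// One expression: the multiply and the add sit in the same expression,
// so they are candidates for contraction into a fused multiply-add.
float single_expression(float a, float b, float c) {
    float f = a * b + c;
    return f;
}

// Two statements: the product is stored in f1 before the add is performed.
float split_statements(float a, float b, float c) {
    float f1 = a * b;
    float f2 = f1 + c;
    return f2;
}
```

Building this with optimizations for ARM (or for x86 with -mfma) and checking each function for an fmadd or vfmadd-family instruction shows which forms a given compiler contracts. With -ffp-contract=off, both functions should come out as a separate multiply and add.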

I find this behavior extremely counter-intuitive, and frankly bad. I want my floating-point operations to work exactly the same on ARM and x86, and I want them to give me the same result in the default configuration (this is with -ffast-math off, as a reminder). You can disable this with -ffp-contract=off, which is how we fixed the issue, and it is going to be a default flag for me from now on.

Anyway, this was quite surprising to me, and I think it's worth discussing. I'm very curious about Jason's opinion on whether the standard has made the correct decision. I can see the reasoning for this, but for me the loss of portability and predictability is not worth the benefits in performance/precision. There's a reason std::fma exists: so that I can be explicit when I want this behaviour. Floating point is hard enough without compilers messing with you behind your back.

Length

Hard to say. A more in-depth discussion would maybe be 20ish minutes, but I think you could do a short 10-minute episode on this.

Oct 06 '25 12:10 OskarSigvardsson
