num-complex

Proposal: Add fma_{mul,div} for FMA-based complex operations

Open zhongyi51 opened this issue 5 months ago • 1 comments

Proposal

I propose adding fma_mul and fma_div methods to the Complex type. These methods would leverage fused multiply-add (FMA) operations for the calculation.

Motivation

Using FMA can offer significant performance benefits on hardware with native support, but it comes with important trade-offs:

  • Performance Variance: On CPUs with native FMA instructions (e.g., AArch64, or x86-64 when the FMA target feature is enabled), these methods can be faster. However, when the compilation target lacks native support, the compiler may fall back to a slow software library call (fma for f64, fmaf for f32).

  • Numerical Differences: FMA computes a * b + c with a single rounding operation, whereas a separate multiply followed by an add rounds twice. This means the results from an FMA-based method are not guaranteed to be bit-for-bit identical to the standard methods.
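To make the trade-off concrete, here is a minimal sketch of what fma_mul could look like. It uses a small stand-in struct rather than the real num_complex::Complex so that it needs no external crate; the struct, the method bodies, and the choice of which product to fuse are all assumptions for illustration, not the crate's actual implementation. Only f64::mul_add is a real standard-library call.

```rust
/// Stand-in for `num_complex::Complex<f64>` (hypothetical, for illustration only).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Complex {
    pub re: f64,
    pub im: f64,
}

impl Complex {
    /// Standard product: both products are rounded, then the sum or difference is rounded.
    pub fn mul(self, rhs: Complex) -> Complex {
        Complex {
            re: self.re * rhs.re - self.im * rhs.im,
            im: self.re * rhs.im + self.im * rhs.re,
        }
    }

    /// Hypothetical `fma_mul`: one product is rounded as usual, the other is
    /// fused with the add via `f64::mul_add` and so is not rounded separately.
    pub fn fma_mul(self, rhs: Complex) -> Complex {
        Complex {
            // re = self.re * rhs.re - self.im * rhs.im
            re: self.re.mul_add(rhs.re, -(self.im * rhs.im)),
            // im = self.re * rhs.im + self.im * rhs.re
            im: self.re.mul_add(rhs.im, self.im * rhs.re),
        }
    }
}
```

For inputs whose products are exactly representable, such as (1 + 2i)(3 + 4i), both methods return -5 + 10i. They can diverge when a product is inexact: with x = 1 + 2^-30 and z = x + xi, the real part of z * z is x*x - x*x, which mul returns as exactly 0, while this fma_mul returns 2^-60, the rounding error of x*x that the fused step preserves.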

Implementation

This Compiler Explorer link clearly illustrates the performance dichotomy between architectures and compiler settings: https://godbolt.org/z/joW4eqvT9

If this approach is ok, I would be happy to implement it.

zhongyi51, Jul 26 '25 03:07