Quadmath.jl icon indicating copy to clipboard operation
Quadmath.jl copied to clipboard

Float128 fma() subtly corrupts state on Windows

Open RalphAS opened this issue 6 years ago • 14 comments
trafficstars

On Windows (but not Linux or OSX) calls to fma() somehow break the overflow handling.

julia> h, x = floatmax(Float128), Float128(1)
julia> h + h
inf
julia> fma(x,x,x);
julia> h + h
1.18973149535723176508575932662800701e+4932

I don't see anything wrong with our wrappers; it may be a libquadmath and/or libgcc issue.

RalphAS avatar Apr 20 '19 21:04 RalphAS

Is it correct for other values? Not sure if this is related: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89459

simonbyrne avatar Apr 21 '19 05:04 simonbyrne

After calling fma on Float128, overflow seems generally broken for +,-,*,/ with Float128 or Float64 arguments (tested with mixtures of huge and moderate values). I surmise that some floating point control structure in the mingw runtime is not being reset properly.

Since things work on Linux, I don't think this one is an upstream GCC issue. (I believe I'm using a fairly old GCC on Linux without trouble here.)

RalphAS avatar Apr 21 '19 18:04 RalphAS

ah, if it's breaking Float64 rounding, then I suspect that is related to https://github.com/JuliaLang/julia/issues/9847#issuecomment-71174579 / https://sourceware.org/bugzilla/show_bug.cgi?id=17907

In that case, maybe we should just disable fma on Windows? Or call the MPFR one?

simonbyrne avatar Apr 22 '19 02:04 simonbyrne

I'm really here for the trouble it gave me in #30 and the challenge of a level-1 puzzle to pin that down, so either of your suggestions LGTM.

RalphAS avatar Apr 22 '19 04:04 RalphAS

Maybe we just remove fma on Windows for the time being?

simonbyrne avatar Apr 22 '19 17:04 simonbyrne

I have disabled fma on Windows for now. Once we have a better solution we can add it back.

simonbyrne avatar Apr 24 '19 04:04 simonbyrne

Opinion?

@static if Sys.iswindows()
    const SPLITTER = 1 + Float128(2)^57
    const THRESHOLD = Float128(2)^16326

    function two_sum(x::Float128, y::Float128)
        hi = x + y
        a  = hi - y
        b  = hi - a
        lo = (x - a) + (y - b)
        return hi, lo
    end
    
    function sum_three(x::Float128, y::Float128, z::Float128)
        s, t  = two_sum(y, z)
        u, v  = two_sum(x, s)
        hi    = t + v + u
        return hi
    end
    
    function splitter(x::Float128)
        x < THRESHOLD || throw(DomainError("Threshold $THRESHOLD exceeded"))
        a = x * SPLITTER
        b = x - a
        x1 = a + b
        x2 = x - x1
        return x1, x2
    end
    
    function two_prod(x::Float128, y::Float128)
        hi = x * y
        x1, x2 = splitter(x)
        y1, y2 = splitter(y)
        lo = x2 * y2 - (((hi - x1*y1) - x1*y2) - x2*y1)
        return hi, lo
    end 
 
    function fma(x::Float128, y::Float128, z::Float128)
        xyhi, xylo = two_prod(x, y)
        result = sum_three(z, xyhi, xylo)
        return result
    end
    
 end

JeffreySarnoff avatar May 03 '19 04:05 JeffreySarnoff

fmas are subtly difficult to get correct (especially dealing with underflow and subnormals). I think the BigFloat option is probably the best for now until we can figure out what is going on.

simonbyrne avatar May 03 '19 17:05 simonbyrne

It's fine by me -- the more important aspect of this is let the Windows version work for clients who use fma. Why not implement the BigFloat fma stopgap until the cause is dissolved or resolved?

JeffreySarnoff avatar May 08 '19 01:05 JeffreySarnoff

Thought it worth the revisit

if Sys.iswindows()
    function Base.fma(x::Float128, y::Float128, z::Float128)
        oldprec = precision(BigFloat)
        setprecision(BigFloat, 512)   # at least 448 == Int(128 * 3.5) 
        result = Float128(x * y + z)
        setprecision(BigFloat, oldprec)
        return result
    end
else
   # ccall
end

JeffreySarnoff avatar Jun 14 '20 11:06 JeffreySarnoff

@JeffreySarnoff would you mind opening a pull request?

Isn't fma defined for BigFloat?

simonbyrne avatar Jun 15 '20 15:06 simonbyrne

fma is defined for BigFloat and the logic will process fma(x,y,z) at the precision set for BigFloat. That precision may be well in excess of the precision demanded for Float128s and take longer than it might, or it may be much less than that and deliver an imprecise result. Either way, we need to manipulate setprecision(BigFloat) when fma is called (for Win). I'll PR it.

JeffreySarnoff avatar Jun 15 '20 17:06 JeffreySarnoff

Has this "On Windows (but not Linux or OSX) calls to fma() somehow break the overflow handling." fixed itself with the current libquadmath that we use? If so, we should remove the code that blocks fma on Windows.

JeffreySarnoff avatar Apr 16 '22 15:04 JeffreySarnoff

It appears to be unchanged for current and nightly Julia.

The example at the top is included as a broken test for Windows in the commit shown here: https://github.com/RalphAS/Quadmath.jl/actions/runs/2178209027

RalphAS avatar Apr 16 '22 23:04 RalphAS