
result of A\b is different with StaticArrays

MariusDrulea opened this issue · 14 comments

While running my program I found that x = A\b sometimes gives a different result when using StaticArrays. A is a 3×3 matrix and b is a 3-element vector.

A = [3.1916757882237305e11  7.90810107400456e10   798958.7959618106;
     7.90810107400456e10    1.959448361045036e10  197960.17301454322;
     798958.7959618106      197960.17301454322    2.0]

b = [-390.11659959072784; -96.66024072975743; -0.0009765625]

Solving A\b with ordinary Matrices I get:

x = [1.2650904540083241e-20; -3.9907885656377556e-31; -0.0004882812500050538]
err = norm(A*x - b) gives 0

Solving A\b with StaticArrays I get:

x = [-1.222243483259749e-9; -3.79355666251685e-9; -0.0004100774669296476]
err = norm(A*x - b) gives 646.5957086789485

I suspect the version with ordinary Matrices gives the correct result, and that something is missing in the StaticArrays implementation of A\b.
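
For completeness, a self-contained reproduction (a minimal sketch; the names As and bs are mine):

using LinearAlgebra, StaticArrays

A = [3.1916757882237305e11 7.90810107400456e10 798958.7959618106;
     7.90810107400456e10 1.959448361045036e10 197960.17301454322;
     798958.7959618106 197960.17301454322 2.0]
b = [-390.11659959072784, -96.66024072975743, -0.0009765625]
As = SMatrix{3,3}(A)   # same data, static types
bs = SVector{3}(b)

norm(A * (A \ b) - b)       # 0 with LinearAlgebra
norm(As * (As \ bs) - bs)   # ≈ 646.6 with StaticArrays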

MariusDrulea avatar Sep 07 '21 16:09 MariusDrulea

Yes, LinearAlgebra is correct. StaticArrays often works quite poorly with ill-conditioned matrices like yours. You'd have to replace the current solver in StaticArrays (there is a custom one for exactly this case) with a more numerically stable one.

mateuszbaran avatar Sep 07 '21 18:09 mateuszbaran

@mateuszbaran indeed, the matrix is nearly singular. If I exclude such matrices (which I need to do anyway), I get essentially the same results in my program.
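
A quick way to check this (not part of the original report) is the condition number; rows 1 and 2 of A above are nearly proportional, so cond(A) is enormous:

using LinearAlgebra
cond(A)  # very large, i.e. A is nearly singular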

MariusDrulea avatar Sep 07 '21 21:09 MariusDrulea

I had a look over the implementation of A\b; it is the closed-form solution via Cramer's rule (cofactors divided by the determinant), and it is clearly good enough for most purposes.

# 3×3 case: closed-form solution via Cramer's rule (adjugate times b, divided by det)
@inline function _solve(::Size{(3,3)}, ::Size{(3,)}, a::StaticMatrix{<:Any, <:Any, Ta}, b::StaticVector{<:Any, Tb}) where {Ta, Tb}
    d = det(a)
    T = typeof((one(Ta)*zero(Tb) + one(Ta)*zero(Tb))/d)
    @inbounds return similar_type(b, T)(
        ((a[2,2]*a[3,3] - a[2,3]*a[3,2])*b[1] +
            (a[1,3]*a[3,2] - a[1,2]*a[3,3])*b[2] +
            (a[1,2]*a[2,3] - a[1,3]*a[2,2])*b[3]) / d,
        ((a[2,3]*a[3,1] - a[2,1]*a[3,3])*b[1] +
            (a[1,1]*a[3,3] - a[1,3]*a[3,1])*b[2] +
            (a[1,3]*a[2,1] - a[1,1]*a[2,3])*b[3]) / d,
        ((a[2,1]*a[3,2] - a[2,2]*a[3,1])*b[1] +
            (a[1,2]*a[3,1] - a[1,1]*a[3,2])*b[2] +
            (a[1,1]*a[2,2] - a[1,2]*a[2,1])*b[3]) / d )
end

MariusDrulea avatar Sep 07 '21 21:09 MariusDrulea

Shouldn't this issue have remained open? It's not a terribly urgent thing, but we might as well track that `\(::StaticMatrix{3,3}, ::StaticVector{3})` is unstable for ill-conditioned matrices and could potentially be improved.

thchr avatar Oct 06 '21 18:10 thchr

Can this issue be reopened?

Another example:

julia> A = SMatrix{3,3}([2.36923    2.31888   -0.472987
         2.31888    2.26959   -0.462935
        -0.472987  -0.462935   0.094426])
julia> B = inv(A)
3×3 SMatrix{3, 3, Float64, 9} with indices SOneTo(3)×SOneTo(3):
 2.25519e5       1.44486e5  1.838e6
 1.44486e5  -92980.3        2.67896e5
 1.838e6         2.67896e5  1.05201e7
julia> B * A
3×3 SMatrix{3, 3, Float64, 9} with indices SOneTo(3)×SOneTo(3):
  1.0           2.21114e-5  -4.51009e-6
  1.52382e-5    0.999992    -3.04212e-6
 -0.000234218  -0.00022924   1.00002

The matrix A is badly conditioned. But with normal matrices the inverse is much better behaved:

julia> A = [2.36923    2.31888   -0.472987
         2.31888    2.26959   -0.462935
        -0.472987  -0.462935   0.094426]
julia> B = inv(A)
3×3 Matrix{Float64}:
 2.25524e5       1.4449e5   1.83804e6
 1.4449e5   -92982.4        2.67902e5
 1.83804e6       2.67902e5  1.05203e7
julia> B * A
3×3 Matrix{Float64}:
  1.0          1.16415e-10  0.0
 -7.27596e-11  1.0          7.27596e-12
  0.0          0.0          1.0

3f6a avatar Feb 14 '24 07:02 3f6a

Yes, let's re-open it; maybe there is a better algorithm for this case.

mateuszbaran avatar Feb 14 '24 07:02 mateuszbaran

The algorithm above is how we tell our linear algebra students NOT to solve a linear system. You need to use a pivoted LU factorization here (or other factorization if the matrix has some structure). Is there a reason you can’t call LAPACK (at least for the supported floating-point types)?
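
For reference, the textbook approach with stock LinearAlgebra (which calls LAPACK for dense Float64) is simply:

using LinearAlgebra

F = lu(A)  # LU factorization with partial (row) pivoting by default
x = F \ b  # two triangular solves with the pivoted factors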

dpo avatar Feb 15 '24 18:02 dpo

The reason for not calling into LAPACK is speed. The whole point of the package is to exploit the knowledge about the size of the matrices (and their storage on the stack) to speed up basic operations for small matrix sizes.

I.e. whatever replaces this should probably still be hand-rolled and not call generic matrix libraries.

thchr avatar Feb 15 '24 18:02 thchr

As it stands, the current implementation produces wrong results; it doesn't matter how fast it is.

Not knowing this, I lost a couple of days tracking down numerical issues until I realized StaticArrays was the culprit.

I'd suggest just using LAPACK for now, even though it leads to allocations and is slow. At least it will be correct.

> I.e. whatever replaces this should probably still be hand-rolled and not call generic matrix libraries.

This looks difficult, and I guess it will take some time to get right. In the meantime, a correct (though slow) implementation is preferable.

3f6a avatar Feb 15 '24 23:02 3f6a

Some people complain when StaticArrays sacrifices accuracy for speed; other people complain when it sacrifices speed for accuracy. StaticArrays can't meet everyone's needs. We are aware that StaticArrays doesn't always use the most numerically stable algorithms, and when in doubt you should always check against the LinearAlgebra implementations. I'm not strictly against changing to LAPACK, but I'd like to see wider support for such a change.

> In the meantime, a correct (though slow) implementation is preferable.

You can always define:

linsolve(A::SMatrix{N,N}, b::SVector{N}) where {N} = SVector{N}(Matrix(A) \ Vector(b))
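
This round-trips through LinearAlgebra's pivoted solver. For example (hypothetical usage on the matrices from this issue):

As = SMatrix{3,3}(A); bs = SVector{3}(b)
linsolve(As, bs)  # matches Matrix(As) \ Vector(bs) by construction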

mateuszbaran avatar Feb 16 '24 08:02 mateuszbaran

I don't think calling out to LAPACK is a good idea; it'd be a huge performance gotcha for the vast majority of cases.

Potentially, a hand-rolled Gauss-Jordan elimination or similar could be used, which doesn't have quite the same performance drop as LAPACK (~10x vs. ~100x). For context, here's a direct port of OpenEXR's gjInverse (it can probably be optimized a fair bit; e.g., the loops could potentially be unrolled completely):

function gauss_inv(A::StaticMatrix{N,N,T}) where {N,T}
    s = Size(A)
    B = (isbitstype(T) ?
         StaticArrays.mutable_similar_type(T, s, Val{2}) :
         StaticArrays.sizedarray_similar_type(T, s, Val{2}))(I)
    A′ = copyto!(similar(A), A)

    # gauss-jordan, forward elimination
    for i in SOneTo(N)
        # partial pivoting: pick the row with the largest entry in column i
        pivot = i
        pivotsize = abs(A′[i,i])
        for j = i+1:N
            tmp = abs(A′[j,i])
            if tmp > pivotsize
                pivot = j
                pivotsize = tmp
            end
        end
        iszero(pivotsize) && error("matrix is singular; an inverse does not exist")
        if pivot ≠ i
            # swap rows i and pivot in both A′ and B
            for j in SOneTo(N)
                tmp = A′[i,j]
                A′[i,j] = A′[pivot,j]
                A′[pivot,j] = tmp
                
                tmp = B[i,j]
                B[i,j] = B[pivot,j]
                B[pivot,j] = tmp
            end
        end
        for j in i+1:N
            f = A′[j,i]/A′[i,i]
            for k in SOneTo(N)
                A′[j,k] -= f*A′[i,k]
                B[j,k] -= f*B[i,k]
            end
        end
    end
    
    # backward pass: normalize each pivot row and eliminate entries above the diagonal
    for i in N:-1:1
        f = A′[i,i]
        iszero(f) && error("matrix is singular; an inverse does not exist")
        f⁻¹ = inv(f)
        for j in SOneTo(N)
            A′[i,j] *= f⁻¹
            B[i,j] *= f⁻¹
        end
        for j in 1:i-1
            f = A′[j,i]
            for k in SOneTo(N)
                A′[j,k] -= f*A′[i,k]
                B[j,k] -= f*B[i,k]
            end
        end
    end

    return typeof(A)(B)
end

For comparison:

A = SMatrix{3,3}([2.36923    2.31888   -0.472987
                2.31888    2.26959   -0.462935
               -0.472987  -0.462935   0.094426])
lapack_inv(A) = typeof(A)(inv(Matrix(A)))

@btime lapack_inv($A); # 419.095 ns (5 allocations: 1.98 KiB)
@btime gauss_inv($A);  # 53.198 ns (1 allocation: 80 bytes)
@btime inv($A);        # 4.900 ns (0 allocations: 0 bytes)

norm(lapack_inv(A)*A - I) # 5.30685213034491e-10
norm(gauss_inv(A)*A - I)  # 6.4296072425034e-10
norm(inv(A)*A - I)        # 0.00032985247474438296

But still, a 10x slowdown is a lot. Maybe there's something better? In any case, I don't think LAPACK would be a good solution.

thchr avatar Feb 16 '24 10:02 thchr

I agree that LAPACK isn't ideal for this, as it's usually optimised for large matrix sizes. However, we should still try to prioritise accuracy over performance, and it sounds reasonable to me to have our own optimised factorisation implementations.

jishnub avatar Feb 16 '24 11:02 jishnub

@thchr I've added some @inbounds to your code and it removed the allocation, roughly halving the runtime. Also note that for \ we could have a more optimized Gaussian elimination than for the inverse.
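
For concreteness, a minimal sketch of what such a \-specialized solver could look like (my code, not part of StaticArrays; it assumes floating-point elements and reuses the partial-pivoting idea from gauss_inv above, but updates a single right-hand side instead of a full B matrix):

using StaticArrays

function gauss_solve(A::StaticMatrix{N,N}, b::StaticVector{N}) where {N}
    A′ = MMatrix{N,N}(float.(A))  # mutable stack-allocated scratch copies
    x = MVector{N}(float.(b))
    @inbounds for i in 1:N
        # partial pivoting: pick the row with the largest entry in column i
        pivot = i
        for j in i+1:N
            abs(A′[j,i]) > abs(A′[pivot,i]) && (pivot = j)
        end
        iszero(A′[pivot,i]) && error("matrix is singular")
        if pivot ≠ i
            for k in 1:N
                A′[i,k], A′[pivot,k] = A′[pivot,k], A′[i,k]
            end
            x[i], x[pivot] = x[pivot], x[i]
        end
        # eliminate below the pivot, updating only the right-hand side
        for j in i+1:N
            f = A′[j,i] / A′[i,i]
            for k in i:N
                A′[j,k] -= f * A′[i,k]
            end
            x[j] -= f * x[i]
        end
    end
    @inbounds for i in N:-1:1  # back substitution
        s = x[i]
        for k in i+1:N
            s -= A′[i,k] * x[k]
        end
        x[i] = s / A′[i,i]
    end
    return SVector{N}(x)
end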

mateuszbaran avatar Feb 16 '24 12:02 mateuszbaran

Ah, the fallback for larger StaticMatrix in fact already goes via an LU approach, using StaticArrays-specific LU factorization (since https://github.com/JuliaArrays/StaticArrays.jl/pull/424) - so the following is pretty much as fast as gauss_inv (but gauss_inv can be made ~30% faster by adding @inbounds):

@inline function lu_inv(A::StaticMatrix)
    s = Size(A)
    invoke(StaticArrays._inv, Tuple{Size{S} where S, typeof(A)}, s, A)
end

@btime lu_inv($A)     # 49.089 ns (0 allocations: 0 bytes)
norm(lu_inv(A)*A - I) # 5.864757423319422e-10

thchr avatar Feb 16 '24 12:02 thchr