StaticArrays.jl
result of A\b is different with StaticArrays
While running my program I found that x = A\b sometimes gives a different result when using StaticArrays. A is a 3×3 Matrix and b is a 3-element Vector.
A = [3.1916757882237305e11 7.90810107400456e10 798958.7959618106; 7.90810107400456e10 1.959448361045036e10 197960.17301454322; 798958.7959618106 197960.17301454322 2.0]
b = [-390.11659959072784; -96.66024072975743; -0.0009765625]
Solving A\b with default Matrices I get:
x = [1.2650904540083241e-20; -3.9907885656377556e-31; -0.0004882812500050538]
err = norm(A*x - b) gives 0
Solving A\b with StaticArrays I get:
x = [-1.222243483259749e-9; -3.79355666251685e-9; -0.0004100774669296476]
err = norm(A*x - b) gives 646.5957086789485
I suspect the version with default Matrices gives the correct result, and that something is wrong in A\b with StaticArrays.
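For reference, a minimal script that reproduces the comparison (the residuals in the comments are the ones reported above):

using LinearAlgebra, StaticArrays

A = [3.1916757882237305e11 7.90810107400456e10 798958.7959618106;
     7.90810107400456e10 1.959448361045036e10 197960.17301454322;
     798958.7959618106 197960.17301454322 2.0]
b = [-390.11659959072784, -96.66024072975743, -0.0009765625]

x_dense = A \ b                              # LinearAlgebra (pivoted LU)
x_static = SMatrix{3,3}(A) \ SVector{3}(b)   # StaticArrays' 3×3 fast path

norm(A*x_dense - b)    # 0, as reported above
norm(A*x_static - b)   # ≈ 646.6, as reported above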
Yes, LinearAlgebra is correct. StaticArrays often works quite poorly with ill-conditioned matrices like yours. You'd have to replace the current solver in StaticArrays (there is a custom one for exactly this case) with a more numerically stable one.
@mateuszbaran indeed, the matrix is nearly singular. If I exclude such matrices (which I need to do anyway), I get essentially the same results in my program.
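For the record, the conditioning is easy to check (assuming using LinearAlgebra and the A above):

cond(A)   # enormous: A is numerically close to singular, so even a stable
          # solver loses some digits here, and an unstable one loses nearly all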
I had a look at the implementation of A\b; it is clearly good enough for most purposes:
@inline function _solve(::Size{(3,3)}, ::Size{(3,)}, a::StaticMatrix{<:Any, <:Any, Ta}, b::StaticVector{<:Any, Tb}) where {Ta, Tb}
    d = det(a)
    T = typeof((one(Ta)*zero(Tb) + one(Ta)*zero(Tb))/d)
    # Cramer's rule: each component of x is a cofactor expansion divided by det(a)
    @inbounds return similar_type(b, T)(
        ((a[2,2]*a[3,3] - a[2,3]*a[3,2])*b[1] +
         (a[1,3]*a[3,2] - a[1,2]*a[3,3])*b[2] +
         (a[1,2]*a[2,3] - a[1,3]*a[2,2])*b[3]) / d,
        ((a[2,3]*a[3,1] - a[2,1]*a[3,3])*b[1] +
         (a[1,1]*a[3,3] - a[1,3]*a[3,1])*b[2] +
         (a[1,3]*a[2,1] - a[1,1]*a[2,3])*b[3]) / d,
        ((a[2,1]*a[3,2] - a[2,2]*a[3,1])*b[1] +
         (a[1,2]*a[3,1] - a[1,1]*a[3,2])*b[2] +
         (a[1,1]*a[2,2] - a[1,2]*a[2,1])*b[3]) / d )
end
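For the nearly singular A above, this is exactly where the accuracy goes: d and the parenthesized cofactor terms are differences of huge, nearly equal products, so most significant digits cancel. A quick way to see the scale of the problem (assuming using LinearAlgebra and the A from the top of the thread):

using LinearAlgebra
det(A)                # if this is not much larger than the eps below, the
                      # quotients above cannot have many correct digits
eps(A[1,1]*A[2,2])    # ≈ 1e6: the rounding error of just one of the products
                      # being subtracted inside the parentheses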
Shouldn't this issue have remained open? It's not a terribly urgent thing, but we might as well track that `(\)(::StaticArray{3,3}, ::StaticVector{3})` is unstable for ill-conditioned matrices and could potentially be improved.
Can this issue be reopened?
Another example:
julia> A = SMatrix{3,3}([2.36923 2.31888 -0.472987
2.31888 2.26959 -0.462935
-0.472987 -0.462935 0.094426])
julia> B = inv(A)
3×3 SMatrix{3, 3, Float64, 9} with indices SOneTo(3)×SOneTo(3):
2.25519e5 1.44486e5 1.838e6
1.44486e5 -92980.3 2.67896e5
1.838e6 2.67896e5 1.05201e7
julia> B * A
3×3 SMatrix{3, 3, Float64, 9} with indices SOneTo(3)×SOneTo(3):
1.0 2.21114e-5 -4.51009e-6
1.52382e-5 0.999992 -3.04212e-6
-0.000234218 -0.00022924 1.00002
The matrix A is badly conditioned. But with normal matrices the inverse is much better behaved:
julia> A = [2.36923 2.31888 -0.472987
2.31888 2.26959 -0.462935
-0.472987 -0.462935 0.094426]
julia> B = inv(A)
3×3 Matrix{Float64}:
2.25524e5 1.4449e5 1.83804e6
1.4449e5 -92982.4 2.67902e5
1.83804e6 2.67902e5 1.05203e7
julia> B * A
3×3 Matrix{Float64}:
1.0 1.16415e-10 0.0
-7.27596e-11 1.0 7.27596e-12
0.0 0.0 1.0
Yes, let's re-open it; maybe there is a better algorithm for this case.
The algorithm above is how we tell our linear algebra students NOT to solve a linear system. You need to use a pivoted LU factorization here (or other factorization if the matrix has some structure). Is there a reason you can’t call LAPACK (at least for the supported floating-point types)?
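For dense Float64 matrices, that is exactly what the LinearAlgebra backslash already does; a minimal sketch:

using LinearAlgebra
F = lu(A)   # pivoted LU factorization (LAPACK's getrf for Float64)
x = F \ b   # permuted forward and back substitution with the factors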
The reason for not calling into LAPACK is speed. The whole point of the package is to exploit the knowledge about the size of the matrices (and their storage on the stack) to speed up basic operations for small matrix sizes.
I.e. whatever replaces this should probably still be hand-rolled and not call generic matrix libraries.
As it stands, the current implementation produces wrong results; it doesn't matter how fast it is.
Since I did not know this, I lost a couple of days tracking down numerical issues until I realized StaticArrays was the culprit.
I'd suggest just using LAPACK for now, even though it leads to allocations and is slow. It will be correct, at least.
I.e. whatever replaces this should probably still be hand-rolled and not call generic matrix libraries.
This looks difficult, and I guess it will take some time to get right. In the meantime, a correct (though slow) implementation is preferable.
Some people complain when StaticArrays sacrifices accuracy for speed; other people complain when it sacrifices speed for accuracy. StaticArrays can't meet everyone's needs. We are aware StaticArrays doesn't always use the most numerically stable algorithms, and when in doubt you should always check against the LinearAlgebra implementations. I'm not strictly against changing to LAPACK, but I'd like to see wider support for such a change.
In the meantime, a correct (though slow) implementation is preferable.
You can always define linsolve(A::SMatrix{N,N}, b::SVector{N}) where {N} = SVector{N}(Matrix(A) \ Vector(b))
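Usage with the matrices from the top of the thread (linsolve is of course just a workaround name, not part of the package):

SA = SMatrix{3,3}(A)
Sb = SVector{3}(b)
x = linsolve(SA, Sb)   # allocates temporaries, but goes through LinearAlgebra's
                       # pivoted LU, so the result matches the dense solve
norm(A*x - b)          # ≈ 0 instead of ≈ 646.6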
I don't think calling out to LAPACK is a good idea; it'd be a huge performance gotcha for the vast majority of cases.
Potentially, a hand-rolled Gauss-Jordan elimination or similar could be used, which doesn't have quite the same performance drop as LAPACK (~10x vs. ~100x). For context, here's a direct port of OpenEXR's gjInverse (it can probably be optimized a bunch; e.g., the loops could potentially be unrolled completely):
using LinearAlgebra, StaticArrays   # for I, StaticMatrix, Size, SOneTo

function gauss_inv(A::StaticMatrix{N,N,T}) where {N,T}
    s = Size(A)
    # B starts as the identity and accumulates the inverse (mutable scratch)
    B = (isbitstype(T) ?
         StaticArrays.mutable_similar_type(T, s, Val{2}) :
         StaticArrays.sizedarray_similar_type(T, s, Val{2}))(I)
    A′ = copyto!(similar(A), A)
    # gauss-jordan, forward elimination with partial pivoting
    for i in SOneTo(N)
        # pick the row with the largest |entry| in column i at or below row i
        pivot = i
        pivotsize = abs(A′[i,i])
        for j in i+1:N
            tmp = abs(A′[j,i])
            if tmp > pivotsize
                pivot = j
                pivotsize = tmp
            end
        end
        # only singular if no row below has a nonzero entry in this column
        iszero(pivotsize) && error("matrix is singular; an inverse does not exist")
        if pivot ≠ i
            # swap rows i and pivot in both A′ and B
            for j in SOneTo(N)
                A′[i,j], A′[pivot,j] = A′[pivot,j], A′[i,j]
                B[i,j], B[pivot,j] = B[pivot,j], B[i,j]
            end
        end
        # eliminate column i below the pivot
        for j in i+1:N
            f = A′[j,i]/A′[i,i]
            for k in SOneTo(N)
                A′[j,k] -= f*A′[i,k]
                B[j,k] -= f*B[i,k]
            end
        end
    end
    # backward substitution: normalize each pivot row, then clear above it
    for i in N:-1:1
        f = A′[i,i]
        iszero(f) && error("matrix is singular; an inverse does not exist")
        f⁻¹ = inv(f)
        for j in SOneTo(N)
            A′[i,j] *= f⁻¹
            B[i,j] *= f⁻¹
        end
        for j in 1:i-1
            f = A′[j,i]
            for k in SOneTo(N)
                A′[j,k] -= f*A′[i,k]
                B[j,k] -= f*B[i,k]
            end
        end
    end
    return typeof(A)(B)
end
For comparison:
using BenchmarkTools, LinearAlgebra, StaticArrays

A = SMatrix{3,3}([2.36923 2.31888 -0.472987
                  2.31888 2.26959 -0.462935
                  -0.472987 -0.462935 0.094426])
lapack_inv(A) = typeof(A)(inv(Matrix(A)))
@btime lapack_inv($A); # 419.095 ns (5 allocations: 1.98 KiB)
@btime gauss_inv($A); # 53.198 ns (1 allocation: 80 bytes)
@btime inv($A); # 4.900 ns (0 allocations: 0 bytes)
norm(lapack_inv(A)*A - I) # 5.30685213034491e-10
norm(gauss_inv(A)*A - I) # 6.4296072425034e-10
norm(inv(A)*A - I) # 0.00032985247474438296
But... still, 10x slowdown is a lot. Maybe there's something better? In any case, I don't think LAPACK would be a good solution.
I agree that LAPACK isn't ideal for this, as it's usually optimised for large matrix sizes. However, we should still try to prioritise accuracy over performance, and it sounds reasonable to me to have our own optimised factorisation implementations.
@thchr I've added some @inbounds to your code and it removed the allocation, roughly halving the runtime. Also note that for \ we could have a more optimized Gaussian elimination than for the inverse.
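To illustrate the point, here's a minimal sketch (not an implementation proposal; gauss_solve is a made-up name) of Gaussian elimination with partial pivoting specialized for a single right-hand side, carrying b through the elimination instead of N identity columns:

using StaticArrays

# Sketch only; assumes a floating-point element type.
function gauss_solve(A::StaticMatrix{N,N}, b::StaticVector{N}) where {N}
    A′ = MMatrix{N,N}(A)   # mutable scratch copy
    x = MVector{N}(b)      # overwritten with the solution
    @inbounds for i in 1:N
        # partial pivoting: largest |entry| in column i at or below row i
        p = i
        for j in i+1:N
            abs(A′[j,i]) > abs(A′[p,i]) && (p = j)
        end
        iszero(A′[p,i]) && error("matrix is singular")
        if p != i
            # earlier columns of rows ≥ i are already zero, so swap from i on
            for k in i:N
                A′[i,k], A′[p,k] = A′[p,k], A′[i,k]
            end
            x[i], x[p] = x[p], x[i]
        end
        # eliminate below the pivot, updating the right-hand side alongside
        for j in i+1:N
            f = A′[j,i] / A′[i,i]
            for k in i:N
                A′[j,k] -= f * A′[i,k]
            end
            x[j] -= f * x[i]
        end
    end
    # back substitution on the resulting upper-triangular system
    @inbounds for i in N:-1:1
        s = x[i]
        for k in i+1:N
            s -= A′[i,k] * x[k]
        end
        x[i] = s / A′[i,i]
    end
    return SVector{N}(x)
end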
Ah, the fallback for larger StaticMatrix in fact already goes via an LU approach, using a StaticArrays-specific LU factorization (since https://github.com/JuliaArrays/StaticArrays.jl/pull/424), so the following is pretty much as fast as gauss_inv (but gauss_inv can be made ~30% faster by adding @inbounds):
@inline function lu_inv(A::StaticMatrix)
    s = Size(A)
    # dispatch straight to the generic LU-based fallback, bypassing the
    # small-size special cases
    invoke(StaticArrays._inv, Tuple{Size{S} where S, typeof(A)}, s, A)
end
@btime lu_inv($A) # 49.089 ns (0 allocations: 0 bytes)
norm(lu_inv(A)*A - I) # 5.864757423319422e-10