Krylov.jl
Add `ktypeof` support for view arrays
Hello,
I'm trying to calculate the eigenvalues of a sparse matrix with the arnoldi method implemented in this package, using the bicgstab function for a shift-and-invert algorithm to find the lowest eigenvalues: the eigenvalues of A closest to the shift μ become the largest-magnitude eigenvalues of (A - μI)⁻¹, which an Arnoldi iteration recovers quickly.
I'm already able to do this with a modified version of the KrylovKit package, whose eigsolve function supports any abstract linear map. With that implementation I can speed up the calculation by a factor of 10 or more compared to the Arpack package, which is very useful for large sparse matrices and also works on GPUs.
With this package, however, I ran into an error when computing the Hessenberg matrix through the arnoldi function.
using LinearAlgebra
using SparseArrays
using IncompleteLU
using Krylov
using LinearOperators

N = 500
A = sprand(ComplexF64, N, N, 3 / N)
A *= A'
A = triu(A, 1) + sprand(ComplexF64, N, N, 0.5 / N)
μ = 1
A = A - μ * I # SHIFT
P = ilu(A, τ=0.001) # incomplete LU preconditioner
b = rand(ComplexF64, size(A, 1))
solver = BicgstabSolver(N, N, typeof(b)) # preallocated workspace, reused at each solve
# INVERSE MAP: y = (A - μI)⁻¹ x, computed with preconditioned BiCGSTAB
Map = LinearOperator(eltype(b), N, N, false, false, (y, x) -> copyto!(y, bicgstab!(solver, A, x, M = P, ldiv = true).x))
arnoldi(Map, b, 20)
ERROR: MethodError: no method matching ktypeof(::Matrix{ComplexF64})
Closest candidates are:
ktypeof(::S) where S<:(DenseVector) at ~/.julia/packages/Krylov/0uvPR/src/krylov_utils.jl:203
ktypeof(::S) where S<:SparseVector at ~/.julia/packages/Krylov/0uvPR/src/krylov_utils.jl:207
ktypeof(::S) where S<:(AbstractSparseVector) at ~/.julia/packages/Krylov/0uvPR/src/krylov_utils.jl:212
...
Stacktrace:
[1] ktypeof(v::SubArray{ComplexF64, 1, Matrix{ComplexF64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true})
@ Krylov ~/.julia/packages/Krylov/0uvPR/src/krylov_utils.jl:222
[2] bicgstab!(solver::BicgstabSolver{Float64, ComplexF64, Vector{ComplexF64}}, A::SparseMatrixCSC{ComplexF64, Int64}, b::SubArray{ComplexF64, 1, Matrix{ComplexF64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}; c::SubArray{ComplexF64, 1, Matrix{ComplexF64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, M::IncompleteLU.ILUFactorization{ComplexF64, Int64}, N::UniformScaling{Bool}, atol::Float64, rtol::Float64, itmax::Int64, verbose::Int64, history::Bool, ldiv::Bool, callback::Krylov.var"#232#234")
@ Krylov ~/.julia/packages/Krylov/0uvPR/src/bicgstab.jl:118
[3] (::var"#13#14")(y::SubArray{ComplexF64, 1, Matrix{ComplexF64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, x::SubArray{ComplexF64, 1, Matrix{ComplexF64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true})
@ Main ./Untitled-1:195
[4] prod3!(res::SubArray{ComplexF64, 1, Matrix{ComplexF64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, prod!::var"#13#14", v::SubArray{ComplexF64, 1, Matrix{ComplexF64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, α::ComplexF64, β::ComplexF64, Mv5::Vector{ComplexF64})
@ LinearOperators ~/.julia/packages/LinearOperators/QYye7/src/operations.jl:12
[5] mul!(res::SubArray{ComplexF64, 1, Matrix{ComplexF64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, op::LinearOperator{ComplexF64, Int64, var"#13#14", Nothing, Nothing, Vector{ComplexF64}}, v::SubArray{ComplexF64, 1, Matrix{ComplexF64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, α::ComplexF64, β::ComplexF64)
@ LinearOperators ~/.julia/packages/LinearOperators/QYye7/src/operations.jl:31
[6] mul!(res::SubArray{ComplexF64, 1, Matrix{ComplexF64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, op::LinearOperator{ComplexF64, Int64, var"#13#14", Nothing, Nothing, Vector{ComplexF64}}, v::SubArray{ComplexF64, 1, Matrix{ComplexF64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true})
@ LinearOperators ~/.julia/packages/LinearOperators/QYye7/src/operations.jl:36
[7] arnoldi(A::LinearOperator{ComplexF64, Int64, var"#13#14", Nothing, Nothing, Vector{ComplexF64}}, b::Vector{ComplexF64}, k::Int64)
@ Krylov ~/.julia/packages/Krylov/0uvPR/src/krylov_processes.jl:210
[8] top-level scope
@ Untitled-1:197
I think the problem is related to the use of the view function:
https://github.com/JuliaSmoothOptimizers/Krylov.jl/blob/c88e3e2a341cc7de3d2a19bbaa5d8dc12ea72652/src/krylov_processes.jl#L203-L210
Indeed, the multiplication mul!(q, A, vᵢ) ends up calling the bicgstab! solver:
https://github.com/JuliaSmoothOptimizers/Krylov.jl/blob/a42db8e18a251af99df08ca1c1a748909c9132cc/src/krylov_solvers.jl#L876-L880
but here b is a SubArray whose parent is a Matrix, and that is what triggers the error.
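For what it's worth, the same MethodError can be triggered directly, assuming (as the stack trace suggests) that ktypeof dispatches on the parent type of the SubArray:

using Krylov
V = rand(ComplexF64, 4, 4)
v = view(V, :, 1) # SubArray whose parent is a Matrix
Krylov.ktypeof(v) # MethodError: no method matching ktypeof(::Matrix{ComplexF64})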
Hi @albertomercurio!
Good catch, I need to update my ktypeof function for a SubArray of a matrix (https://github.com/JuliaSmoothOptimizers/Krylov.jl/blob/main/src/krylov_utils.jl#L221-L223).
I will fix it tomorrow, but you can use this temporary fix for today:
import Krylov.ktypeof
function ktypeof(v::S) where S <: SubArray
    return Vector{eltype(v)} # or CuVector{eltype(v)} for your case
end
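With this override defined, your reproduction above should go through, e.g.:

V, H = arnoldi(Map, b, 20) # no longer hits the ktypeof MethodError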
For information: if you use arnoldi on GPUs, the Krylov basis V is a CuMatrix, but the Hessenberg matrix H is always stored as a SparseMatrixCSC.
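As a rough sketch of what that implies (A_gpu and b_gpu are hypothetical GPU-side stand-ins for your operator and starting vector):

using CUDA
V, H = arnoldi(A_gpu, b_gpu, 20) # V :: CuMatrix, H :: SparseMatrixCSC
vals = eigvals(Array(H[1:end-1, :])) # densify the small Hessenberg on the CPU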
Thank you, it works now.
I would now like to obtain the eigenvalues using the implicitly restarted Arnoldi iteration. At the moment I can find them by diagonalizing a Hessenberg matrix with the same dimension as the matrix itself, which is not useful for large matrices, but it is a good starting point to test the framework.
n = 25
vals = Array(1:n)
vecs = rand(n, n)
A = vecs \ diagm(vals) * vecs # A has the eigenvalues 1, 2, ..., 25
b = rand(n)
krylovdim = 25
V, H = arnoldi(A, b, krylovdim)
vals2 = eigvals(Array(H[1:end-1,:]))
sum(abs.(vals2 - eigvals(A)).^2)
6.88969212857728e-12
However, for large matrices it is useful to work with lower Krylov dimensions. In that case, with krylovdim = 10 for example, a single Arnoldi iteration is not sufficient to obtain the largest 10 eigenvalues accurately.
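For example, repeating the test above with a smaller basis (a sketch reusing the 25×25 matrix A and vector b from the previous snippet):

krylovdim = 10
V, H = arnoldi(A, b, krylovdim)
vals10 = eigvals(Array(H[1:end-1, :])) # only 10 Ritz values from a single pass
# these only roughly approximate the largest eigenvalues 16, ..., 25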
Do you know how to implement the implicitly restarted Arnoldi iteration with the Krylov.jl framework?
I was also wondering why we need to store the H matrix as a sparse CSC matrix; I figured out that a dense matrix should be better for the eigenvalue problem. Below is my implementation of the implicitly restarted Arnoldi method. It is approximately 6 times slower than KrylovKit's eigsolve function, but I think it can be significantly improved.
6 is exactly the number of threads on my laptop, and I suspect the KrylovKit version is faster only because it uses multithreading.
using LinearAlgebra
using SparseArrays
using Krylov
using Krylov: @knrm2, @kaxpy!, @kdot, FloatOrComplex, ktypeof, vector_to_matrix
mutable struct Arnoldi
    A
    k::Int
    V::AbstractArray
    H::AbstractArray
    resid::Vector
end
function Arnoldi(A, b::AbstractVector{FC}, k::Int) where FC <: FloatOrComplex
    m, n = size(A)
    S = ktypeof(convert.(ComplexF64, b))
    M = vector_to_matrix(S)
    V = M(undef, n, k+1)          # Krylov basis
    H = zeros(eltype(M), k+1, k)  # dense Hessenberg matrix
    resid = ones(Float64, k)      # residual estimates of the Ritz pairs
    for i = 1:k
        vᵢ = view(V, :, i)
        vᵢ₊₁ = q = view(V, :, i+1)
        if i == 1
            β = @knrm2(n, b)
            vᵢ .= b ./ β
        end
        mul!(q, A, vᵢ)
        for j = 1:i  # modified Gram-Schmidt orthogonalization
            vⱼ = view(V, :, j)
            H[j,i] = @kdot(n, vⱼ, q)
            @kaxpy!(n, -H[j,i], vⱼ, q)
        end
        H[i+1,i] = @knrm2(n, q)
        vᵢ₊₁ .= q ./ H[i+1,i]
    end
    return Arnoldi(A, k, V, H, resid)
end
function expand_arnoldi!(arn::Arnoldi, p::Int, k::Int)
    k < p && throw(DomainError(k-p, "k must be bigger than p."))
    p <= 1 && throw(DomainError(p, "p must be bigger than 1."))
    A = arn.A
    V = arn.V
    H = arn.H
    m, n = size(A)
    for i = p:k  # continue the factorization from column p up to k
        vᵢ = view(V, :, i)
        vᵢ₊₁ = q = view(V, :, i+1)
        mul!(q, A, vᵢ)
        for j = 1:i
            vⱼ = view(V, :, j)
            H[j,i] = @kdot(n, vⱼ, q)
            @kaxpy!(n, -H[j,i], vⱼ, q)
        end
        H[i+1,i] = @knrm2(n, q)
        vᵢ₊₁ .= q ./ H[i+1,i]
    end
    return arn
end
function eigsolve(A, v₀, p::Int = min(4, size(A, 1)), m::Int = min(10, size(A, 1)); tol::Real = 1e-10, maxiter::Int = 200)
    S = ktypeof(convert.(ComplexF64, v₀))
    M = vector_to_matrix(S)
    my_arnoldi = Arnoldi(A, v₀, m)
    V, H, resid = my_arnoldi.V, my_arnoldi.H, my_arnoldi.resid
    iter = 1
    while iter < maxiter && sum(my_arnoldi.resid[end-p+1:end] .< tol) < p
        Vₘ = view(V, :, 1:m)
        Hₘ = view(H, 1:m, :)
        qₘ = view(V, :, m+1)
        βeₘ = view(H, m+1, :)'
        # A * Vₘ ≈ Vₘ * M(Hₘ) + qₘ * M(βeₘ) == true
        μₖ, y = eigen(Hₘ)
        idxs = sortperm(abs.(μₖ))
        μₖ = μₖ[idxs][1:m-p]  # the m-p unwanted Ritz values are used as exact shifts
        y = y[:, idxs]
        copyto!(resid, map(i -> abs(βeₘ * y[:, i]), 1:m))
        for μᵢ in μₖ  # shifted QR sweeps
            Q, R = qr(Hₘ - μᵢ * I)
            Hₘ = Q' * Hₘ * Q
            Vₘ *= M(Q)
            βeₘ *= Q
        end
        Vₚ = view(Vₘ, :, 1:p)
        Hₚ = view(Hₘ, 1:p, 1:p)
        βeₚ = reshape(view(βeₘ, 1, 1:p), 1, p)
        # A * Vₚ ≈ Vₚ * M(Hₚ) + qₘ * M(βeₚ) == true
        copyto!(V, 0 * V)  # reset V and H before rebuilding the length-p factorization
        copyto!(H, 0 * H)
        copyto!(view(V, :, 1:p), Vₚ)
        copyto!(view(V, :, p+1), qₘ)
        copyto!(view(H, 1:p, 1:p), Hₚ)
        copyto!(view(H, p+1, 1:p), βeₚ)
        expand_arnoldi!(my_arnoldi, p, m)
        iter += 1
    end
    n_converged = sum(my_arnoldi.resid[end-p+1:end] .< tol)
    Vₘ = view(V, :, 1:m)
    Hₘ = view(H, 1:m, :)
    μ, y = eigen(Hₘ)
    idxs = sortperm(abs.(μₖ))
    μ = μ[idxs][m-n_converged+1:m]
    y = M(y)[:, idxs]
    x = hcat(map(i -> Vₘ * y[:, i], m-n_converged+1:m)...)
    return μ, x, (iter, my_arnoldi.resid)
end
n = 50
vals = Array(1:n)
vecs = rand(n, n)
A = vecs \ Array(diagm(vals)) * vecs
b = rand(n)
E, U, info = eigsolve(A, b, 4, 10)
It converged after 23 iterations, and it should still support CUDA arrays.
You need to replace idxs = sortperm(abs.(μₖ)) with idxs = sortperm(abs.(μ)) at the end of your function eigsolve: after the while loop, μₖ still holds the shifts from the last restart, not the Ritz values you just computed.
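For reference, the tail of eigsolve with only that substitution applied:

n_converged = sum(my_arnoldi.resid[end-p+1:end] .< tol)
Vₘ = view(V, :, 1:m)
Hₘ = view(H, 1:m, :)
μ, y = eigen(Hₘ)
idxs = sortperm(abs.(μ))  # sort by the Ritz values just computed, not the old shifts
μ = μ[idxs][m-n_converged+1:m]
y = M(y)[:, idxs]
x = hcat(map(i -> Vₘ * y[:, i], m-n_converged+1:m)...)
return μ, x, (iter, my_arnoldi.resid)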
For the implicitly restarted Arnoldi iteration, what reference(s) do you recommend?
Is the article by R. B. Lehoucq and D. C. Sorensen a good one?
I followed this book chapter, which is very clear.
This article is also good: they present a slightly different version which should be faster.
Here they describe an adaptive implicitly restarted Arnoldi iteration, and they report a speedup of 6 times.
Thanks @albertomercurio!
I think we can close this issue, since it is solved.