Krylov.jl Add `ktypeof` support for view arrays

Hello,

I'm trying to calculate the eigenvalues of a sparse matrix with the arnoldi method written in this package, taking advantage of the bicgstab function to do the shift-inverse algorithm, and finding the lowest eigenvalues. I'm already able to do it with a modified version of the KrylovKit package, which has the eigsolve function, which supports any abstract linear map. And with this implementation I'm able to speed up the calculation by a factor 10 or more against the Arpack package, which is very useful for calculating large sparse matrices, also for GPUs.

With this package, I went through an error when calculating the Hessenberg matrix through the arnoldi function.

using LinearAlgebra
using SparseArrays
using IncompleteLU
using Krylov
using LinearOperators

N = 500
A = sprand(ComplexF64, N, N, 3 / N)
A *= A'
A = triu(A, 1) + sprand(ComplexF64, N, N, 0.5 / N)

μ = 1
A = A - μ * I # SHIFT
P = ilu(A, τ=0.001)

b = rand(ComplexF64, size(A, 1))

solver = BicgstabSolver(N, N, typeof(b))

# INVERSE MAP
Map = LinearOperator(eltype(b), N, N, false, false, (y, x) -> copyto!(y, bicgstab!(solver, A, x, M = P, ldiv = true).x))

arnoldi(Map, b, 20)

ERROR: MethodError: no method matching ktypeof(::Matrix{ComplexF64})
Closest candidates are:
  ktypeof(::S) where S<:(DenseVector) at ~/.julia/packages/Krylov/0uvPR/src/krylov_utils.jl:203
  ktypeof(::S) where S<:SparseVector at ~/.julia/packages/Krylov/0uvPR/src/krylov_utils.jl:207
  ktypeof(::S) where S<:(AbstractSparseVector) at ~/.julia/packages/Krylov/0uvPR/src/krylov_utils.jl:212
  ...
Stacktrace:
 [1] ktypeof(v::SubArray{ComplexF64, 1, Matrix{ComplexF64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true})
   @ Krylov ~/.julia/packages/Krylov/0uvPR/src/krylov_utils.jl:222
 [2] bicgstab!(solver::BicgstabSolver{Float64, ComplexF64, Vector{ComplexF64}}, A::SparseMatrixCSC{ComplexF64, Int64}, b::SubArray{ComplexF64, 1, Matrix{ComplexF64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}; c::SubArray{ComplexF64, 1, Matrix{ComplexF64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, M::IncompleteLU.ILUFactorization{ComplexF64, Int64}, N::UniformScaling{Bool}, atol::Float64, rtol::Float64, itmax::Int64, verbose::Int64, history::Bool, ldiv::Bool, callback::Krylov.var"#232#234")
   @ Krylov ~/.julia/packages/Krylov/0uvPR/src/bicgstab.jl:118
 [3] (::var"#13#14")(y::SubArray{ComplexF64, 1, Matrix{ComplexF64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, x::SubArray{ComplexF64, 1, Matrix{ComplexF64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true})
   @ Main ./Untitled-1:195
 [4] prod3!(res::SubArray{ComplexF64, 1, Matrix{ComplexF64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, prod!::var"#13#14", v::SubArray{ComplexF64, 1, Matrix{ComplexF64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, α::ComplexF64, β::ComplexF64, Mv5::Vector{ComplexF64})
   @ LinearOperators ~/.julia/packages/LinearOperators/QYye7/src/operations.jl:12
 [5] mul!(res::SubArray{ComplexF64, 1, Matrix{ComplexF64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, op::LinearOperator{ComplexF64, Int64, var"#13#14", Nothing, Nothing, Vector{ComplexF64}}, v::SubArray{ComplexF64, 1, Matrix{ComplexF64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, α::ComplexF64, β::ComplexF64)
   @ LinearOperators ~/.julia/packages/LinearOperators/QYye7/src/operations.jl:31
 [6] mul!(res::SubArray{ComplexF64, 1, Matrix{ComplexF64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, op::LinearOperator{ComplexF64, Int64, var"#13#14", Nothing, Nothing, Vector{ComplexF64}}, v::SubArray{ComplexF64, 1, Matrix{ComplexF64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true})
   @ LinearOperators ~/.julia/packages/LinearOperators/QYye7/src/operations.jl:36
 [7] arnoldi(A::LinearOperator{ComplexF64, Int64, var"#13#14", Nothing, Nothing, Vector{ComplexF64}}, b::Vector{ComplexF64}, k::Int64)
   @ Krylov ~/.julia/packages/Krylov/0uvPR/src/krylov_processes.jl:210
 [8] top-level scope
   @ Untitled-1:197

I think that the problem is related to the use of the view function

https://github.com/JuliaSmoothOptimizers/Krylov.jl/blob/c88e3e2a341cc7de3d2a19bbaa5d8dc12ea72652/src/krylov_processes.jl#L203-L210

Indeed, the multiplication mul!(q, A, vᵢ) calls the bicgstab solver function

https://github.com/JuliaSmoothOptimizers/Krylov.jl/blob/a42db8e18a251af99df08ca1c1a748909c9132cc/src/krylov_solvers.jl#L876-L880

but here b is a SubArray whose parent is a Matrix. And that is the error.

Oct 16 '22 12:10 albertomercurio

Hi @albertomercurio! Good catch, I need to update my function ktypeof for a SubArray of a matrix. (https://github.com/JuliaSmoothOptimizers/Krylov.jl/blob/main/src/krylov_utils.jl#L221-L223).

I will fix it tomorrow but you could use this temporary fix for today:

import Krylov.ktypeof
function ktypeof(v::S) where S <: SubArray
  return Vector{eltype(v)}   # or CuVector{eltype(v)} for your case
end

Oct 16 '22 12:10 amontoison

If you use arnoldi on GPUs, the Krylov basis V is a CuMatrix but the Hessenberg matrix is always stored as a SparseMatrixCSC for information.

Oct 16 '22 12:10 amontoison

Thank you, it works now.

I was wondering to obtain the eigenvalues using the implicit restarted Arnoldi iteration. I'm currently able to find them by diagonalizing the Hessenberg matrix with the same dimension of the matrix itself, which is not useful for large matrices, but it is a good starting point to test the framework.

n = 25
vals = Array(1:n)
vecs = rand(n, n)

A = vecs \ diagm(vals) * vecs    # A has the eigenvalues 1, 2, ..., 25

b = rand(n)

krylovdim = 25
V, H = arnoldi(A, b, krylovdim)

vals2 = eigvals(Array(H[1:end-1,:]))

sum(abs.(vals2 - eigvals(A)).^2)

6.88969212857728e-12

However, for large matrices it is useful to calculate with lower krylov dimensions. In this case, if I calculate with krylovdim = 10 for example, the single Arnoldi iteration is not sufficient to obtain accurately the largest 10 eigenvalues.

Do you know how to implement the implicit restarted Arnoldi iterations with the Krylov.jl framework?

Oct 16 '22 16:10 albertomercurio

I was asking why whe need to set the H matrix as a sparse CSC matrix. I figure out that it should be better to have a dense matrix for the eigenvalues problem. Here you can find the implicit restarted Arnoldi method, however it is approximately 6 times slower than KrylovKit eigsolve function, but I think it can be significantly improved.

6 is exactly the number of threads of my laptop, and I think that the KrylovKit version is faster only because it uses the multithreading.

using LinearAlgebra
using SparseArrays
using Krylov
using Krylov: @knrm2, @kaxpy!, @kdot, FloatOrComplex, ktypeof, vector_to_matrix

mutable struct Arnoldi
    A
    k::Int
    V::AbstractArray
    H::AbstractArray
    resid::Vector
end

function Arnoldi(A, b::AbstractVector{FC}, k::Int) where FC <: FloatOrComplex
    m, n = size(A)
    S = ktypeof(convert.(ComplexF64, b))
    M = vector_to_matrix(S)

    V = M(undef, n, k+1)
    H = zeros(eltype(M), k+1, k)
    resid = ones(Float64, k)
    
    for i = 1:k
        vᵢ = view(V,:,i)
        vᵢ₊₁ = q = view(V,:,i+1)
        if i == 1
            β = @knrm2(n, b)
            vᵢ .= b ./ β
        end
        mul!(q, A, vᵢ)
        for j = 1:i
            vⱼ = view(V,:,j)
            H[j,i] = @kdot(n, vⱼ, q)
            @kaxpy!(n, -H[j,i], vⱼ, q)
        end
        H[i+1,i] = @knrm2(n, q)
        vᵢ₊₁ .= q ./ H[i+1,i]
    end
    return Arnoldi(A, k, V, H, resid)
end

function expand_arnoldi!(arn::Arnoldi, p::Int, k::Int)
    k < p && throw(DomainError(k-p, "k must to be bigger than p."))
    p <= 1 && throw(DomainError(p, "p must to be bigger than 1."))

    A = arn.A
    V = arn.V
    H = arn.H

    m, n = size(A)

    for i = p:k
        vᵢ = view(V,:,i)
        vᵢ₊₁ = q = view(V,:,i+1)
        mul!(q, A, vᵢ)
        for j = 1:i
            vⱼ = view(V,:,j)
            H[j,i] = @kdot(n, vⱼ, q)
            @kaxpy!(n, -H[j,i], vⱼ, q)
        end
        H[i+1,i] = @knrm2(n, q)
        vᵢ₊₁ .= q ./ H[i+1,i]
    end
    return arn
end

function eigsolve(A, v₀, p::Int = min(4, size(A, 1)), m::Int = min(10, size(A, 1)); tol::Real = 1e-10, maxiter::Int = 200)
    S = ktypeof(convert.(ComplexF64, v₀))
    M = vector_to_matrix(S)

    my_arnoldi = Arnoldi(A, v₀, m)
    V, H, resid = my_arnoldi.V, my_arnoldi.H, my_arnoldi.resid

    iter = 1
    while iter < maxiter && sum(my_arnoldi.resid[end-p+1:end] .< tol) < p
        Vₘ = view(V, :, 1:m)
        Hₘ = view(H, 1:m, :)
        qₘ = view(V, :, m+1)
        βeₘ = view(H, m+1, :)'
    
        # A * Vₘ ≈ Vₘ * M(Hₘ) + qₘ * M(βeₘ) == true
    
        μₖ, y = eigen(Hₘ)
        idxs = sortperm(abs.(μₖ))
        μₖ = μₖ[idxs][1:m-p]
        y = y[:, idxs]
        copyto!(resid, map(i->abs(βeₘ * y[:, i]), 1:m))
        for μᵢ in μₖ
            Q, R = qr(Hₘ - μᵢ * I)
            Hₘ = Q' * Hₘ * Q
            Vₘ *= M(Q)
            βeₘ *= Q
        end
    
        Vₚ = view(Vₘ, :, 1:p)
        Hₚ = view(Hₘ, 1:p, 1:p)
        βeₚ = reshape(view(βeₘ, 1, 1:p), 1, p)
    
        # A * Vₚ ≈ Vₚ * M(Hₚ) + qₘ * M(βeₚ) == true
    
        copyto!(V, 0 * V)
        copyto!(H, 0 * H)
        copyto!(view(V, :, 1:p), Vₚ)
        copyto!(view(V, :, p+1), qₘ)
        copyto!(view(H, 1:p, 1:p), Hₚ)
        copyto!(view(H, p+1, 1:p), βeₚ)
    
        expand_arnoldi!(my_arnoldi, p, m)
        iter+=1
    end

    n_converged = sum(my_arnoldi.resid[end-p+1:end] .< tol)
    Vₘ = view(V, :, 1:m)
    Hₘ = view(H, 1:m, :)
    μ, y = eigen(Hₘ)
    idxs = sortperm(abs.(μₖ))
    μ = μ[idxs][m-n_converged+1:m]
    y = M(y)[:, idxs]
    x = hcat(map(i->Vₘ * y[:, i], m-n_converged+1:m)...)

    return μ, x, (iter, my_arnoldi.resid)
end

n = 50
vals = Array(1:n)
vecs = rand(n, n)
A = vecs \ Array(diagm(vals)) * vecs

b = rand(n)

E, U, info = eigsolve(A, b, 4, 10)

Converged after 23 iterations.

It should still support CUDA arrays.

Oct 17 '22 03:10 albertomercurio

You need to replace

idxs = sortperm(abs.(μₖ))

by

idxs = sortperm(abs.(μ))

at the end of your function eigsolve.

For the implicit restarted Arnoldi iterations, what reference(s) do you recommend? Is the article of R. B. Lehoucq and D. C. Sorensen a good one?

Oct 17 '22 04:10 amontoison

I followed this chapter book, which is very clear.

This article is also good, they make a sligthly different version which should be faster.

Oct 17 '22 07:10 albertomercurio

Here they make an adaptive implicit restarted Arnoldi iteration. And they declare a speedup of 6 times.

Oct 17 '22 13:10 albertomercurio

Thanks @albertomercurio!

Oct 17 '22 23:10 amontoison

I think we can close this issue, since it is solved.

Oct 18 '22 12:10 albertomercurio

Krylov.jl Krylov.jl copied to clipboard

Add `ktypeof` support for view arrays

Krylov.jl
Krylov.jl copied to clipboard