Commented out the multithreaded function to support CUDA arrays
Hello,
I'm searching for a fast GPU implementation of the exponentiate function. If I run this code, it does not work:
using LinearAlgebra
using SparseArrays
using KrylovKit
using CUDA
using CUDA.CUSPARSE
CUDA.allowscalar(false)
N = 400
a = sprand(ComplexF64, N, N, 1 / N)
A = CuSparseMatrixCSR(a)
ω = rand(ComplexF64, N)
Ω = CuArray(ω)
exponentiate(A, 0.1, Ω)  # approximate exp(0.1 * A) * Ω
As can be seen from the commit, I removed the multithreaded function to support GPU arrays.
I think the best option would be to uncomment that part again and insert an if condition, but I don't know what kind of condition would cover both CUDA sparse and dense matrices.
A possible solution could be to start using the API provided by ArrayInterface.jl.
How would you do that?
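For instance, I imagine something along these lines (the helper name can_use_threaded_kernel is only illustrative, and the fast_scalar_indexing trait would need to be double-checked against the ArrayInterface.jl documentation):

using ArrayInterface

# Illustrative helper: take the scalar-indexing, multithreaded code path only when
# the array supports fast scalar indexing (true for Array; it should be false for
# GPU arrays, provided the corresponding glue package defines the trait) and more
# than one thread is available (KrylovKit uses its own get_num_threads() for this).
can_use_threaded_kernel(y) =
    ArrayInterface.fast_scalar_indexing(typeof(y)) && Threads.nthreads() > 1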
Another possible solution would be to use the AbstractGPUArray type from the GPUArrays.jl package. Indeed, we could write an if condition, for example:
# take the multithreaded, scalar-indexing path only for CPU arrays with linear indexing
if !(y isa AbstractGPUArray) && y isa AbstractArray && IndexStyle(y) isa IndexLinear && get_num_threads() > 1
    return unproject_linear_multithreaded!(y, b, x, α, β, r)
end
However, it will only work for dense CuArrays, because at the moment the CuSparseMatrix types are not subtypes of AbstractGPUArray. I will open an issue for that.
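To illustrate the limitation, a quick subtype check (assuming the CUDA.jl and GPUArrays.jl versions current at the time of writing):

using CUDA, CUDA.CUSPARSE
using GPUArrays: AbstractGPUArray

CuArray <: AbstractGPUArray            # true: dense CuArrays pass the check above
CuSparseMatrixCSR <: AbstractGPUArray  # false: the sparse GPU matrix types do not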
Ok, I should have fixed it by introducing the AbstractGPUArray type from GPUArrays.jl.
However, the first time I run the solver on the GPU, it returns an out-of-memory error pointing to these lines:
https://github.com/Jutho/KrylovKit.jl/blob/dee29fe34c724e4c9532ea22dfb6e0464263b8c1/src/orthonormal.jl#L313-L323
But then I am able to run it every time afterwards, so I think it is related to the multithreading.
It only happens the first time; after that it runs perfectly.
I have not had any time for coding / package maintenance the last few weeks. I will try to find some time over the next couple of days.
Hi @albertomercurio, my apologies for the late response. I think your solution using AbstractGPUArray is currently the best one. However, I removed Manifest.toml from the PR. (I actually did not realise that I could immediately modify your fork of KrylovKit; I thought that would show up as a PR to your fork.)
Codecov Report
Merging #61 (a8274d2) into master (dee29fe) will increase coverage by 0.12%. The diff coverage is 100.00%.
@@ Coverage Diff @@
## master #61 +/- ##
==========================================
+ Coverage 82.45% 82.58% +0.12%
==========================================
Files 27 27
Lines 2753 2750 -3
==========================================
+ Hits 2270 2271 +1
+ Misses 483 479 -4
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/KrylovKit.jl | 71.92% <ø> (-0.49%) | :arrow_down: |
| src/orthonormal.jl | 87.08% <100.00%> (+1.10%) | :arrow_up: |
| src/dense/givens.jl | 47.36% <0.00%> (-2.64%) | :arrow_down: |
| src/adrules/linsolve.jl | 52.85% <0.00%> (+0.74%) | :arrow_up: |
Do you have a minimal working example for this? Are you working with a Krylov dimension such that you are almost maxing out the GPU memory? The problem with this method is that it indeed temporarily duplicates the number of Krylov vectors that you need to store.
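For a rough sense of scale, here is a back-of-the-envelope estimate (my own numbers, based on the problem size in the example further below, not taken from the thread):

# Back-of-the-envelope estimate, assuming Krylov vectors of length N = 1000
# with ComplexF64 entries and a Krylov dimension of 40:
N, krylovdim = 1000, 40
bytes_per_vector = N * sizeof(ComplexF64)    # 16 bytes per entry -> 16 kB per vector
basis_bytes = krylovdim * bytes_per_vector   # ≈ 640 kB for the whole Krylov basis
peak_bytes = 2 * basis_bytes                 # ≈ 1.3 MB if the basis is temporarily duplicated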
> Do you have a minimal working example for this?
It seems that I no longer have this memory saturation issue. However, I used a Krylov dimension of 40, which should be nowhere near the memory limit. Here is the example I ran:
using LinearAlgebra
using SparseArrays
using IncompleteLU
using LinearMaps
using Krylov # The master branch from GitHub
using KrylovKit
using CUDA
using CUDA.CUSPARSE
CUDA.allowscalar(false)
# Simple wrapper around the ILU(0) factorization computed by ilu02
struct ILU0gpu{T<:AbstractCuSparseMatrix}
    ILU0::T
end
import LinearAlgebra

# Apply the ILU(0) preconditioner by two triangular solves with the L and U factors,
# both stored in the single matrix returned by ilu02 (which factor carries the unit
# diagonal depends on the storage format).
function LinearAlgebra.ldiv!(y::CuArray{T}, P::ILU0gpu{CuSparseMatrixCSR{T, M}}, x::CuArray{T}) where {T,M}
    A = P.ILU0
    copyto!(y, x)
    sv2!('N', 'L', 'U', 1.0, A, y, 'O')
    sv2!('N', 'U', 'N', 1.0, A, y, 'O')
    return y
end

function LinearAlgebra.ldiv!(y::CuArray{T}, P::ILU0gpu{CuSparseMatrixCSC{T, M}}, x::CuArray{T}) where {T,M}
    A = P.ILU0
    copyto!(y, x)
    sv2!('N', 'L', 'N', 1.0, A, y, 'O')
    sv2!('N', 'U', 'U', 1.0, A, y, 'O')
    return y
end
N = 1000
elty = ComplexF64
vals0 = Array(1:N)                 # prescribed spectrum: eigenvalues 1, 2, ..., N
vecs0 = rand(elty, N, N)
A = vecs0 \ diagm(vals0) * vecs0   # dense test matrix with eigenvalues vals0
A = sparse(A)
# droptol!(A, 1000)
v0 = rand(elty, N)
μ = 0 # SHIFT-INVERSE
A_s = A - μ * I
A_s2 = CuSparseMatrixCSR(A_s)
v02 = CuArray(v0)
P2 = ILU0gpu(ilu02(A_s2, 'O'))                         # ILU(0) preconditioner on the GPU
solver = GmresSolver(size(A_s2)..., 20, typeof(v02))   # GMRES workspace from Krylov.jl
# Linear map applying (A - μI)^{-1} x via preconditioned, restarted GMRES
Lmap = LinearMap{eltype(v02)}((y, x) -> copyto!(y, gmres!(solver, A_s2, x, M = P2, ldiv = true, restart=true).x), size(A_s2, 1))
vals, vecs, info = eigsolve(Lmap, v02, 6, krylovdim=20)
show(info)
vals = (1 .+ μ * vals) ./ vals   # map eigenvalues of (A - μI)^{-1} back to eigenvalues of A
abs.(vals)
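For what it's worth (my own sanity check, not part of the original post): the test matrix is built with spectrum 1:N, so with μ = 0 the back-transformed values should be close to the six smallest eigenvalues of A, i.e. roughly 1 through 6:

# Expected result: approximately 1, 2, ..., 6, since the test matrix was built with
# eigenvalues 1:N and μ = 0, so the shift-invert map targets the smallest magnitudes.
println(round.(abs.(vals[1:6]), digits=3))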