AlgebraicMultigrid.jl
Support for CuSparse
using CUDA, SparseArrays, LinearAlgebra, AlgebraicMultigrid
CUDA.allowscalar(false)
W = CUDA.CUSPARSE.CuSparseMatrixCSR(sprand(100,100,0.1))
ruge_stuben(W)
#=
MethodError: no method matching ruge_stuben(::CUDA.CUSPARSE.CuSparseMatrixCSR{Float64, Int32})
Closest candidates are:
ruge_stuben(!Matched::Union{Hermitian{Ti, TA}, Symmetric{Ti, TA}, TA}) where {Ti, Tv, TA<:SparseMatrixCSC{Ti, Tv}} at C:\Users\accou\.julia\packages\AlgebraicMultigrid\ASpK7\src\classical.jl:10
ruge_stuben(!Matched::Union{Hermitian{Ti, TA}, Symmetric{Ti, TA}, TA}, !Matched::Type{Val{bs}}; strength, CF, presmoother, postsmoother, max_levels, max_coarse, coarse_solver, kwargs...) where {Ti, Tv, bs, TA<:SparseMatrixCSC{Ti, Tv}} at C:\Users\accou\.julia\packages\AlgebraicMultigrid\ASpK7\src\classical.jl:10
top-level scope at test.jl:122
eval at boot.jl:373 [inlined]
=#
W = cu(sprand(100,100,0.1))
ruge_stuben(W)
#=
MethodError: no method matching ruge_stuben(::CUDA.CUSPARSE.CuSparseMatrixCSC{Float32, Int32})
Closest candidates are:
ruge_stuben(!Matched::Union{Hermitian{Ti, TA}, Symmetric{Ti, TA}, TA}) where {Ti, Tv, TA<:SparseMatrixCSC{Ti, Tv}} at C:\Users\accou\.julia\packages\AlgebraicMultigrid\ASpK7\src\classical.jl:10
ruge_stuben(!Matched::Union{Hermitian{Ti, TA}, Symmetric{Ti, TA}, TA}, !Matched::Type{Val{bs}}; strength, CF, presmoother, postsmoother, max_levels, max_coarse, coarse_solver, kwargs...) where {Ti, Tv, bs, TA<:SparseMatrixCSC{Ti, Tv}} at C:\Users\accou\.julia\packages\AlgebraicMultigrid\ASpK7\src\classical.jl:10
top-level scope at test.jl:122
eval at boot.jl:373 [inlined]
=#
The AMG solve phase (just a few SpMVs per level) is much easier to port to the GPU than the AMG setup phase (which contains coarse-node selection algorithms). AMGCL adopts this strategy: the setup runs on the CPU and only the solve runs on the GPU.
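To make the setup/solve split concrete, here is a minimal CPU-only sketch of a two-grid cycle. It is not AlgebraicMultigrid.jl's implementation: the 1D Poisson matrix, the hand-built interpolation `P`, the weighted-Jacobi smoother, and the name `two_grid_cycle!` are all illustrative assumptions. The point is only that, once `A`, `P`, `R`, and the coarse operator exist, the cycle itself needs nothing beyond SpMVs, vector updates, and one small coarse solve.

```julia
using SparseArrays, LinearAlgebra

# ---- Setup phase (CPU) ----
# Stand-in for a real AMG setup: 1D Poisson matrix plus a hand-built
# linear interpolation. A real setup would pick coarse nodes algorithmically.
n  = 63
A  = spdiagm(-1 => fill(-1.0, n - 1), 0 => fill(2.0, n), 1 => fill(-1.0, n - 1))
nc = (n - 1) ÷ 2                      # number of coarse nodes

rows, cols, vals = Int[], Int[], Float64[]
for j in 1:nc
    i = 2j                            # coarse node j sits at fine index 2j
    append!(rows, (i - 1, i, i + 1))
    append!(cols, (j, j, j))
    append!(vals, (0.5, 1.0, 0.5))
end
P = sparse(rows, cols, vals, n, nc)   # interpolation (coarse -> fine)
R = sparse(P')                        # restriction  (fine -> coarse)
Ac = R * A * P                        # Galerkin coarse operator
Ac_fact = lu(Matrix(Ac))              # small dense factorization for the coarse solve
Dinv = 1.0 ./ diag(A)                 # inverse diagonal for weighted Jacobi

# ---- Solve phase ----
# Only SpMVs (A * x, R * r, P * e), elementwise vector updates, and one
# coarse solve. This is the part that would have to run on the GPU.
function two_grid_cycle!(x, b, A, P, R, Ac_fact, Dinv; ω = 2 / 3, sweeps = 2)
    for _ in 1:sweeps                 # pre-smoothing (weighted Jacobi)
        x .+= ω .* Dinv .* (b .- A * x)
    end
    r = b - A * x                     # fine-grid residual
    x .+= P * (Ac_fact \ (R * r))     # coarse-grid correction
    for _ in 1:sweeps                 # post-smoothing
        x .+= ω .* Dinv .* (b .- A * x)
    end
    return x
end

b = ones(n)
x = zeros(n)
for _ in 1:20
    two_grid_cycle!(x, b, A, P, R, Ac_fact, Dinv)
end
println("relative residual: ", norm(b - A * x) / norm(b))
```

Under this split, porting would presumably mean converting `A`, `P`, and `R` once after setup (for example with `CUDA.CUSPARSE.CuSparseMatrixCSR`, as in the snippet above) and moving `b`, `x`, and `Dinv` to the device. I have not tested that on a GPU, and the coarse solve would need a device-side replacement.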
I might try to add GPU functionality to AlgebraicMultigrid.jl. @learning-chip, could you elaborate a bit more on what you are suggesting? If I understand correctly, you are saying that we only need to make sure the solve methods work on CUDA arrays, and that the setup phase can be done on the CPU and its results converted to the GPU relatively easily?
Just chiming in to see if anyone has made any progress on this one. Would greatly appreciate it, thanks!