
Added more sparse functions: kron, tril, triu, reshape, adjoint, transpose, sparse-sparse multiplication

Open albertomercurio opened this issue 3 years ago • 3 comments

I added more functions to support the sparse CUDA arrays.

What I added:

  • kron, which returns the Kronecker product of two sparse matrices.
  • tril and triu, which return the lower-triangular and upper-triangular parts of a sparse matrix. It is also possible to pass an offset k, as usual.
  • sparse-sparse multiplication, which was missing before and is, I think, the most important feature added here.
  • adjoint and transpose.
  • reshape of sparse matrices.
  • opnorm, which now also supports the p=1 case.
  • norm with all cases (p=0, 1, 2, ..., Inf, -Inf).
  • SparseArrays.droptol!, which drops all elements smaller than a given tolerance.
  • exp for the matrix exponential, often very useful in quantum mechanics calculations. It does not use the Padé approximation, for several reasons, but this method seems to be very stable and fast.
  • ishermitian and issymmetric support. Before, they always returned false for a generic CuSparseMatrix.

For dense matrices, I also added support for LinearAlgebra's eigen function.
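For readers skimming the thread, here is a quick usage sketch of the operations above. Function names are as listed; the exact signatures are assumed to follow the usual SparseArrays/LinearAlgebra conventions, so treat this as illustrative rather than the PR's actual test suite:

```julia
using CUDA, CUDA.CUSPARSE, LinearAlgebra, SparseArrays

a = sprand(ComplexF32, 100, 100, 0.1)
b = sprand(ComplexF32, 100, 100, 0.1)
A = CuSparseMatrixCSR(a)
B = CuSparseMatrixCSR(b)

kron(A, B)               # Kronecker product of two sparse GPU matrices
tril(A, -1); triu(A, 1)  # triangular parts, with the usual offset k
A * B                    # sparse-sparse multiplication
A'; transpose(A)         # adjoint and transpose
reshape(A, (50, 200))    # reshape of a sparse matrix
opnorm(A, 1)             # operator norm, now including p = 1
norm(A, Inf)             # elementwise norms for p = 0, 1, 2, ..., Inf, -Inf
droptol!(A, 1e-3)        # drop entries with magnitude below the tolerance
exp(A)                   # matrix exponential
ishermitian(A); issymmetric(A)
```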

albertomercurio avatar Sep 24 '22 00:09 albertomercurio

I also resolved a problem related to the copyto! function. Before, the following code was failing:

using CUDA
using CUDA.CUSPARSE
using LinearAlgebra
using SparseArrays

a = sprand(ComplexF32, 100, 100, 0.1)
A = CuSparseMatrixCSR(a)

dst = A
src = A + 0.5 * I
copyto!(dst, src)

The reason was that copyto! copies the rowPtr CuArrays, which can have different lengths in the dst and src arrays. So, before copying them, I now resize! all the fields of the dst array.
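In other words, the fix amounts to something like the following sketch (field names follow the CSR storage layout; this is an illustration of the idea, not the exact PR code):

```julia
function Base.copyto!(dst::CuSparseMatrixCSR, src::CuSparseMatrixCSR)
    # dst and src may have a different number of stored entries, so the
    # backing CuArrays must be resized before the element-wise copies.
    resize!(dst.rowPtr, length(src.rowPtr))
    resize!(dst.colVal, length(src.colVal))
    resize!(dst.nzVal,  length(src.nzVal))
    copyto!(dst.rowPtr, src.rowPtr)
    copyto!(dst.colVal, src.colVal)
    copyto!(dst.nzVal,  src.nzVal)
    dst.nnz = src.nnz  # assumes nnz is tracked in a mutable field
    return dst
end
```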

albertomercurio avatar Sep 24 '22 22:09 albertomercurio

Nice, looking good! I left some comments, and I'll try to do a better review later.

maleadt avatar Sep 27 '22 15:09 maleadt

Maybe @amontoison could review this too, since he's working on the CUSPARSE code as well.

maleadt avatar Oct 04 '22 07:10 maleadt

CI seems happy 🙂 Thanks for the thorough review @amontoison, and @albertomercurio for keeping up with it! I'll defer for the final approval to @amontoison.

maleadt avatar Oct 19 '22 12:10 maleadt

Reposting a reply from @albertomercurio (from a resolved comment):

CUSPARSE has a routine for droptol! (https://docs.nvidia.com/cuda/cusparse/index.html#pruneCsr2csr)

Cool! How would one implement it? I saw that there is no Julia function.

You'd need to add a wrapper to e.g. conversions.jl.

maleadt avatar Oct 20 '22 08:10 maleadt

I just stumbled upon this PR: it looks great! It implements a feature I was talking about in issue #1572, which is support for LinearAlgebra's eigen function. Just a nitpick though: when calling eigen, the result is a structure called Eigen, and you can access the vectors through the vectors field and the values through the values field. Would it be possible to have a similar structure (for example using a NamedTuple)? For example, eigen for Symmetric real matrices would become:

function LinearAlgebra.eigen(A::Symmetric{T,<:CuMatrix}) where {T<:BlasReal}
    A2 = copy(A)
    vals, vects = syevd!('V', 'U', A2)
    (vectors = vects, values = vals)
end

This would give us a nice API, which would be common for GPU and CPU. We could directly call

F = eigen(A)
F.vectors
F.values

whether A is a CPU array or a GPU array.

GVigne avatar Oct 20 '22 11:10 GVigne

Why not re-use the Eigen struct? It's parametric: https://github.com/JuliaLang/julia/blob/87fd6d9d0c804e39fbc221d50dad77ec927fa3db/stdlib/LinearAlgebra/src/eigen.jl#L50-L57 (in 1.8, after https://github.com/JuliaLang/julia/pull/42594)
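Concretely, GVigne's sketch above would then wrap the results in the stdlib struct instead of a NamedTuple (a hedged sketch; syevd! is used exactly as in the snippet above):

```julia
function LinearAlgebra.eigen(A::Symmetric{T,<:CuMatrix}) where {T<:BlasReal}
    A2 = copy(A)
    vals, vects = syevd!('V', 'U', A2)
    # Eigen is parametric on the array types (since Julia 1.8),
    # so it can hold CuArrays directly; F.values and F.vectors
    # then work identically for CPU and GPU arrays.
    return Eigen(vals, vects)
end
```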

maleadt avatar Oct 20 '22 12:10 maleadt

You're right, that would be even better!

GVigne avatar Oct 20 '22 12:10 GVigne

Hello @GVigne, thanks for this good idea. I have now implemented it.

albertomercurio avatar Oct 20 '22 21:10 albertomercurio

This should be my last wave of comments!

amontoison avatar Oct 21 '22 17:10 amontoison

CUSPARSE has a routine for droptol! (https://docs.nvidia.com/cuda/cusparse/index.html#pruneCsr2csr)

Cool! How would one implement it? I saw that there is no Julia function.

You'd need to add a wrapper to e.g. conversions.jl.

Hi @maleadt, I saw that only cusparseSpruneCsr2csr and cusparseDpruneCsr2csr are defined. These two functions should correspond only to real matrices, correct? I saw in the cuSPARSE documentation that there is also a cusparseHpruneCsr2csr function; is that the one for complex matrices? As far as I know, the C and Z letters correspond to complex types.

Am I wrong?

albertomercurio avatar Oct 22 '22 11:10 albertomercurio

cusparseHpruneCsr2csr is for half precision, but I don't know why it's not in libcusparse.jl.

amontoison avatar Oct 22 '22 17:10 amontoison

cusparseHpruneCsr2csr is for half precision, but I don't know why it's not in libcusparse.jl.

The headers probably haven't been updated very recently. That also means that this call isn't going to be available for all versions of the CUDA toolkit, of course.

I can update the headers for 11.8; I'll have a look on Monday.

maleadt avatar Oct 22 '22 19:10 maleadt

Hi @amontoison ,

I'm trying to resolve the conflicts, possibly due to the merge of PR #1648. I noticed that you declared only types <:BlasFloat, not BlasComplex. Why not just put where {T}?

Are complex types covered by this change?

https://github.com/JuliaGPU/CUDA.jl/blob/4f9a80696d12ff50622a1ed060e7577baa1f5c82/lib/cusparse/interfaces.jl#L113-L123

albertomercurio avatar Oct 25 '22 16:10 albertomercurio

BlasFloat is Union{BlasReal, BlasComplex}. :) I don't use where {T} because other types are not supported by the geam routine.

amontoison avatar Oct 25 '22 16:10 amontoison

BlasFloat is Union{BlasReal, BlasComplex}. :) I don't use where {T} because other types are not supported by the geam routine.

Ok great!

Regarding the native droptol! function, do you think it would be better to wait for complex-type support as well? For the moment cusparseXpruneCsr2csr only supports real matrices, and I don't think it's good to have different methods for different types.

What do you think?

albertomercurio avatar Oct 25 '22 16:10 albertomercurio

Also, for future reference, it's better to rebase on top of master instead of merging it.

maleadt avatar Oct 25 '22 16:10 maleadt

BlasFloat is Union{BlasReal, BlasComplex}. :) I don't use where {T} because other types are not supported by the geam routine.

Ok great!

Regarding the native droptol! function, do you think it would be better to wait for complex-type support as well? For the moment cusparseXpruneCsr2csr only supports real matrices, and I don't think it's good to have different methods for different types.

What do you think?

Let's use your generic droptol! for now. We will change the dispatch if they support complex types one day.
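For reference, the generic droptol! being discussed can be sketched as a filter over the COO representation. This is hedged: the CuSparseMatrixCOO constructor signature and field names here are assumptions for illustration, not the exact PR code:

```julia
function SparseArrays.droptol!(A::CuSparseMatrixCSR, tol::Real)
    # Go through COO, keep only entries with magnitude above tol,
    # and copy the pruned matrix back into A.
    B = CuSparseMatrixCOO(A)
    keep = findall(x -> abs(x) > tol, B.nzVal)  # runs on the GPU
    C = CuSparseMatrixCOO(B.rowInd[keep], B.colInd[keep], B.nzVal[keep], size(B))
    copyto!(A, CuSparseMatrixCSR(C))
    return A
end
```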

amontoison avatar Oct 25 '22 16:10 amontoison

Also, for future reference, it's better to rebase on top of master instead of merging it.

Ok, do you mean I should commit the changes myself instead of modifying them directly on GitHub? Sorry, this is the first time I'm resolving conflicts.

Anyway, I noticed that after this merge there are some problems with the CuSparseMatrixCSC types. For example:

m = 25
n = 35
x = sprand(m,n,0.2)
d_x = CuSparseMatrixCSC(x)
CUDA.@allowscalar Array(d_x[:]) == x[:] # false

works only for CSR types and fails with CSC or COO matrices. Several other functions, like triu or exp, also work only for CSR matrices.

I found that type conversions don't work:

a = sprand(ComplexF32, 100, 100, 0.02)
A1 = CuSparseMatrixCSC(a)
A2 = CuSparseMatrixCSR(a)
Array(CuSparseMatrixCOO(A1)) ≈ Array(CuSparseMatrixCOO(A2))

Do you think the problem is on my side?

EDIT

I tried the official master branch: the first example works there, but the last one still does not.

albertomercurio avatar Oct 25 '22 17:10 albertomercurio

Also, for future reference, it's better to rebase on top of master instead of merging it.

Ok, do you mean I should commit the changes myself instead of modifying them directly on GitHub? Sorry, this is the first time I'm resolving conflicts.

I meant that instead of git merge master you do git rebase master. It doesn't matter here though, we'll squash this PR anyway.

maleadt avatar Oct 25 '22 19:10 maleadt

m = 25
n = 35
x = sprand(m,n,0.2)
d_x = CuSparseMatrixCSC(x)
CUDA.@allowscalar Array(d_x[:]) == x[:] # false

I went through the problem by starting from the master branch and adding my lines one by one. I noticed that the failure in this code comes from my reshape definition:

Base.reshape(A::$SparseMatrixType{T,M}, dims::NTuple{N,Int}) where {T,N,M} =
    $SparseMatrixType(reshape(CuSparseMatrixCOO(A), dims))

And I think it is related to the fact that the conversion to COO matrices also fails on the master branch.

However, I don't know why Array(d_x[:]) should call my reshape function.

albertomercurio avatar Oct 25 '22 19:10 albertomercurio

cusparseHpruneCsr2csr will be added by https://github.com/JuliaGPU/CUDA.jl/pull/1649

maleadt avatar Oct 26 '22 07:10 maleadt

Ok, but I think the problem is the conversion from/to COO matrices.

My functions often call this conversion, which is why they fail.

Do you see also this problem in your version @amontoison and @maleadt ?

albertomercurio avatar Oct 26 '22 09:10 albertomercurio

Sorry, my comment was unrelated to the issue you're having; it was just in response to what you said earlier.

maleadt avatar Oct 26 '22 09:10 maleadt

Ok, but I think the problem is the conversion from/to COO matrices.

My functions often call this conversion, which is why they fail.

Do you see also this problem in your version @amontoison and @maleadt ?

Yes @albertomercurio, I opened a PR to fix the issue (#1655). The conversions between sparse matrices were never tested :grimacing:

amontoison avatar Oct 27 '22 03:10 amontoison

Ok, I should be done now.

albertomercurio avatar Nov 08 '22 15:11 albertomercurio

CI failure is related.

maleadt avatar Nov 08 '22 15:11 maleadt

Still a couple of relevant issues. Are you not seeing those locally?

maleadt avatar Nov 09 '22 06:11 maleadt

I was convinced that the changes were good, so I had not tested them on my local setup.

Now it should be ok.

albertomercurio avatar Nov 09 '22 10:11 albertomercurio

LGTM! Let's finally merge this 🙂 Thanks for the PR!

For future PRs, maybe keep the scope a little more confined so that it's easier to review & merge (this one introduces quite a few features).

maleadt avatar Nov 10 '22 07:11 maleadt