
[Feature Request] Determinant Function

Open MasonProtter opened this issue 6 years ago • 10 comments

We currently don't have a way to take determinants of matrices on the GPU as far as I know. It'd be really nice if we could do that.

MasonProtter avatar Nov 14 '19 02:11 MasonProtter

Came up here again: https://discourse.julialang.org/t/determinant-of-cuda-matrix/77245/3

I mentioned

julia> M = rand(10,10);

julia> Mgpu = cu(M);

julia> det(M) # cpu
-0.10243723110926993

julia> prod(diag(LinearAlgebra.LAPACK.getrf!(M)[1])) # cpu
-0.10243723110926993

julia> prod(diag(CUSOLVER.getrf!(Mgpu)[1])) # gpu
-0.10243723f0

But I think the sign is uncertain here!
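
For context, getrf! computes P*A = L*U, so det(A) = det(P) * prod(diag(U)), where det(P) = (-1)^(number of row swaps recorded in ipiv). A quick CPU check of that bookkeeping (variable names here are just for illustration):

julia> using LinearAlgebra

julia> M = rand(10,10);

julia> F, ipiv, _ = LinearAlgebra.LAPACK.getrf!(copy(M));

julia> swaps = count(i -> ipiv[i] != i, eachindex(ipiv));

julia> det(M) ≈ (-1)^swaps * prod(diag(F))
true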

carstenbauer avatar Mar 01 '22 15:03 carstenbauer

Indeed, there is a sign issue. I get a negative determinant for a PSD matrix:

LinearAlgebra.det(m::CuMatrix) = prod(diag(CUSOLVER.getrf!(m)[1])) #Shameless type piracy
A = rand(10,10)
S = A*A'
S_d = cu(S)
det(S) ≈ -det(S_d) # true (up to Float32 precision): the sign is flipped

Since I only need it for PSD matrices, I can work around it by taking the abs. I can't help more though; I don't have a strong background in LA.

HenriDeh avatar Mar 01 '22 17:03 HenriDeh

But I think the sign is uncertain here!

You have to multiply by the sign of the permutation. Something like:

function det!(A::CuMatOrAdj)
    A, ipiv = CUSOLVER.getrf!(A)
    return prod(diag(A)) * (isodd(count(((k,i),) -> k != i, enumerate(ipiv))) ? -1 : 1)
end
LinearAlgebra.det(A::CuMatOrAdj) = det!(copy(A))
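
For reference, a quick usage check once that method is defined (a sketch; exact equality won't hold across Float64/Float32):

julia> using CUDA, LinearAlgebra

julia> M = rand(10,10);

julia> det(cu(M)) ≈ det(M)  # should hold up to Float32 tolerance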

PS. This is not "type piracy" because the CuMatOrAdj type is specific to this package.

stevengj avatar Mar 01 '22 17:03 stevengj

Note that you'll also want to provide logdet and logabsdet, which can be computed with much the same code (you just take the sum of the logs of the |diagonals| instead of the product, and keep track of the signs).
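
A minimal sketch along those lines for logabsdet (untested; assumes the same CuMatOrAdj alias and CUSOLVER.getrf! as above, and copies ipiv to the host for the sign count):

function logabsdet!(A::CuMatOrAdj)
    A, ipiv = CUSOLVER.getrf!(A)
    d = diag(A)
    logabs = sum(log ∘ abs, d)  # sum of log|diagonal|, reduced on the device
    sgn = prod(sign, d)         # ±1 (or a complex phase) from the diagonal entries
    swaps = count(((k, i),) -> k != i, enumerate(Array(ipiv)))
    return logabs, isodd(swaps) ? -sgn : sgn
end
LinearAlgebra.logabsdet(A::CuMatOrAdj) = logabsdet!(copy(A))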

stevengj avatar Mar 01 '22 17:03 stevengj

Was going to suggest the same as Steven, i.e. multiplying by the sign of the permutation:

My draft implementation was

function det!(m::CuMatOrAdj)
   X_d, ipiv_d = CUSOLVER.getrf!(m)
   diags = Array(diag(X_d))
   ipiv = Array(ipiv_d)
   det = one(eltype(diags))
   @inbounds for i in 1:length(ipiv)
       det *= diags[i]
       if i != ipiv[i]
           det = -det
       end
   end
   return det
end
LinearAlgebra.det(A::CuMatOrAdj) = det!(copy(A))

I guess the only question is how to most efficiently implement this.
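
One way to avoid the host transfer entirely might be to count the swaps on the device (an untested sketch; det_device! is just an illustrative name):

function det_device!(m::CuMatOrAdj)
    X_d, ipiv_d = CUSOLVER.getrf!(m)
    # count pivots that differ from their row index, reduced on the GPU
    swaps = sum(CuArray(1:length(ipiv_d)) .!= ipiv_d)
    return prod(diag(X_d)) * (isodd(swaps) ? -1 : 1)
end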

BTW, do we need a CUDA.@sync to be safe?

carstenbauer avatar Mar 01 '22 17:03 carstenbauer

PS. This is not "type piracy" because the CuMatOrAdj type is specific to this package.

I guess he meant that this is type piracy in his code (i.e. outside of CUDA.jl).

carstenbauer avatar Mar 01 '22 18:03 carstenbauer

PS. This is not "type piracy" because the CuMatOrAdj type is specific to this package.

I guess he meant that this is type piracy in his code (i.e. outside of CUDA.jl).

Yes, it's a copy-paste from my code, where it is piracy.

HenriDeh avatar Mar 01 '22 19:03 HenriDeh

I fear all of the implementations proposed here are much slower than a CPU equivalent. I made this benchmark where I compute the logdet without needing the LU decomposition (I assume it has already been done):

using CUDA,LinearAlgebra,BenchmarkTools

L = LowerTriangular(rand(32,32)) 
L_d = L |> cu

#Given L or U, compute logdet of L*U
function mylogdetLorU(LorU::Union{LowerTriangular{<:Number, <:CuArray}, UpperTriangular{<:Number, <:CuArray}})
    return 2*mapreduce(log, +, diag(LorU), init = 0f0) #fastest I could find
end

@btime mylogdetLorU($L_d) #85.6 μs
@btime 2*logdet($L) #186.18 ns

I find identical performance with the above implementations of det! (from which I stripped the getrf! call, since in my case the decomposition is already done).

I think they allocate too much (the CPU implementation does not allocate at all).
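
To quantify that, one could compare host-side allocations (a sketch; note that Base.@allocated only measures CPU allocations, so device buffers from diag aren't counted):

julia> @allocated mylogdetLorU(L_d)

julia> @allocated 2*logdet(L)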

HenriDeh avatar Mar 29 '22 09:03 HenriDeh

I have a logdet implementation inspired by @carstenbauer's code above, and it's faster than the CPU implementation for the case I care about:

function logdet!(m)
   X_d, ipiv_d = CUSOLVER.getrf!(m)
   ipiv = Array(ipiv_d)

   ldet = sum(log, diag(X_d))
   ldet += im * π * get_count(ipiv)
   
   return real(ldet) + im*(imag(ldet) % 2π)
end

function get_count(ipiv)
    # number of pivots that differ from their row index, i.e. the row swaps
    return @inbounds count(i -> ipiv[i] != i, 1:length(ipiv))
end

my_logdet(A) = logdet!(copy(A))

Timing it:

julia> A = randn(ComplexF64, 1024, 1024);

julia> A_d = CuArray(A);

julia> @belapsed logdet($A)
0.03181127

julia> @belapsed my_logdet($A_d)
0.007738641

julia> isapprox(logdet(A), my_logdet(A_d); rtol = 1e-8)
true

get_count is non-allocating, but we could do it with a mapreduce or similar. You should be able to do the same thing for det and determine the sign similarly. My only concern is that we still need to pull the ipiv array to the CPU; is there a nice built-in way to do it on the GPU?
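
One device-side option might be a broadcast-plus-reduction over the pivot vector, replacing the Array(ipiv_d) copy (a sketch, untested; same idea as the fused variant sketched earlier in the thread):

ldet += im * π * sum(CuArray(1:length(ipiv_d)) .!= ipiv_d)  # row swaps counted on the GPU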

SBuercklin avatar May 17 '22 21:05 SBuercklin

I guess the overhead of diag allocating on the GPU is insignificant when one gets the GPU speed-up for the decomposition. My use case is probably too specific to deserve an investigation here.

HenriDeh avatar May 18 '22 09:05 HenriDeh