
Need for MvNormal with sparse precision matrix

Open · jrfaulkner opened this issue Jun 24, 2020 · 18 comments

After much digging and testing, it appears you currently do not support sparse matrix constructions for AD. Is this correct? In particular, I need an MvNormal distribution that is specified with a sparse precision matrix and works with the autodiff in Turing. This is common for large spatial problems using Gaussian Markov random fields. I came to Turing for this because Stan does not support sparse matrix operations. This would be a really valuable addition to your package if you can make it work.

jrfaulkner avatar Jun 24 '20 19:06 jrfaulkner

No, MvNormalCanon (which you are referring to) is not supported yet. Would be a valuable addition.

devmotion avatar Jun 26 '20 10:06 devmotion

One that allows a sparse precision matrix is the key.

jrfaulkner avatar Jun 26 '20 14:06 jrfaulkner

Sure, I understand, I was just referring to the fact that MvNormalCanon is the distribution type in Distributions for multivariate normal distributions that are parameterized by precision matrices.

devmotion avatar Jun 26 '20 14:06 devmotion
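
For context, the canonical parameterization already works with a dense precision matrix; a minimal sketch:

```julia
using Distributions, LinearAlgebra

μ = [1.0, -1.0]
J = [2.0 -0.5; -0.5 2.0]   # precision matrix (inverse covariance), dense here
h = J * μ                  # potential vector
d = MvNormalCanon(h, J)    # same distribution as MvNormal(μ, inv(J))
mean(d) ≈ μ                # true
```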

Yes, thanks for pointing that out. Is there any reason that sparse operations would cause problems for autodiff?

jrfaulkner avatar Jun 26 '20 15:06 jrfaulkner

This is also something I would find really valuable. From trying out a toy model, am I correct that Distributions is using PDMats which, when it constructs a PDSparseMat, calls a cholesky routine from CHOLMOD, which chokes on the Dual numbers?

using Turing
using SparseArrays
using PDMats

@model function MvNormalCanonTest(x)
    n = length(x)
    ρ ~ Uniform(0, 0.49)
    Q = spdiagm(-1 => ρ * ones(n-1), 0 => ones(n), 1 => ρ * ones(n-1))
    x ~ MvNormalCanon(PDSparseMat(Q))
end

x = randn(10)
m = MvNormalCanonTest(x)
sample(m, NUTS(), 100)

TypeError: in Sparse, in Tv, expected Tv<:Union{Complex{Float64}, Float64}, got Type{ForwardDiff.Dual{ForwardDiff.Tag{Turing.Core.var"#f#7"{DynamicPPL.VarInfo{NamedTuple{(:ρ,),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:ρ,Tuple{}},Int64},Array{Uniform{Float64},1},Array{DynamicPPL.VarName{:ρ,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64},DynamicPPL.Model{var"###evaluator#299",(:x,),Tuple{Array{Float64,1}},(),DynamicPPL.ModelGen{var"###generator#300",(:x,),(),Tuple{}}},DynamicPPL.Sampler{NUTS{Turing.Core.ForwardDiffAD{40},(),AdvancedHMC.DiagEuclideanMetric},Turing.Inference.SamplerState{DynamicPPL.VarInfo{NamedTuple{(:ρ,),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:ρ,Tuple{}},Int64},Array{Uniform{Float64},1},Array{DynamicPPL.VarName{:ρ,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64}}}},Float64},Float64,1}}
SuiteSparse.CHOLMOD.Sparse(::SparseMatrixCSC{ForwardDiff.Dual{ForwardDiff.Tag{Turing.Core.var"#f#7"{DynamicPPL.VarInfo{NamedTuple{(:ρ,),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:ρ,Tuple{}},Int64},Array{Uniform{Float64},1},Array{DynamicPPL.VarName{:ρ,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64},DynamicPPL.Model{var"###evaluator#299",(:x,),Tuple{Array{Float64,1}},(),DynamicPPL.ModelGen{var"###generator#300",(:x,),(),Tuple{}}},DynamicPPL.Sampler{NUTS{Turing.Core.ForwardDiffAD{40},(),AdvancedHMC.DiagEuclideanMetric},Turing.Inference.SamplerState{DynamicPPL.VarInfo{NamedTuple{(:ρ,),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:ρ,Tuple{}},Int64},Array{Uniform{Float64},1},Array{DynamicPPL.VarName{:ρ,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64}}}},Float64},Float64,1},Int64}, ::Int64) at cholmod.jl:933
SuiteSparse.CHOLMOD.Sparse(::SparseMatrixCSC{ForwardDiff.Dual{ForwardDiff.Tag{Turing.Core.var"#f#7"{DynamicPPL.VarInfo{NamedTuple{(:ρ,),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:ρ,Tuple{}},Int64},Array{Uniform{Float64},1},Array{DynamicPPL.VarName{:ρ,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64},DynamicPPL.Model{var"###evaluator#299",(:x,),Tuple{Array{Float64,1}},(),DynamicPPL.ModelGen{var"###generator#300",(:x,),(),Tuple{}}},DynamicPPL.Sampler{NUTS{Turing.Core.ForwardDiffAD{40},(),AdvancedHMC.DiagEuclideanMetric},Turing.Inference.SamplerState{DynamicPPL.VarInfo{NamedTuple{(:ρ,),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:ρ,Tuple{}},Int64},Array{Uniform{Float64},1},Array{DynamicPPL.VarName{:ρ,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64}}}},Float64},Float64,1},Int64}) at cholmod.jl:939
cholesky(::SparseMatrixCSC{ForwardDiff.Dual{ForwardDiff.Tag{Turing.Core.var"#f#7"{DynamicPPL.VarInfo{NamedTuple{(:ρ,),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:ρ,Tuple{}},Int64},Array{Uniform{Float64},1},Array{DynamicPPL.VarName{:ρ,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64},DynamicPPL.Model{var"###evaluator#299",(:x,),Tuple{Array{Float64,1}},(),DynamicPPL.ModelGen{var"###generator#300",(:x,),(),Tuple{}}},DynamicPPL.Sampler{NUTS{Turing.Core.ForwardDiffAD{40},(),AdvancedHMC.DiagEuclideanMetric},Turing.Inference.SamplerState{DynamicPPL.VarInfo{NamedTuple{(:ρ,),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:ρ,Tuple{}},Int64},Array{Uniform{Float64},1},Array{DynamicPPL.VarName{:ρ,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64}}}},Float64},Float64,1},Int64}; kws::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at cholmod.jl:1458
cholesky at cholmod.jl:1458 [inlined]
PDSparseMat(::SparseMatrixCSC{ForwardDiff.Dual{ForwardDiff.Tag{Turing.Core.var"#f#7"{DynamicPPL.VarInfo{NamedTuple{(:ρ,),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:ρ,Tuple{}},Int64},Array{Uniform{Float64},1},Array{DynamicPPL.VarName{:ρ,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64},DynamicPPL.Model{var"###evaluator#299",(:x,),Tuple{Array{Float64,1}},(),DynamicPPL.ModelGen{var"###generator#300",(:x,),(),Tuple{}}},DynamicPPL.Sampler{NUTS{Turing.Core.ForwardDiffAD{40},(),AdvancedHMC.DiagEuclideanMetric},Turing.Inference.SamplerState{DynamicPPL.VarInfo{NamedTuple{(:ρ,),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:ρ,Tuple{}},Int64},Array{Uniform{Float64},1},Array{DynamicPPL.VarName{:ρ,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64}}}},Float64},Float64,1},Int64}) at pdsparsemat.jl:22
macro expansion at untitled-d3c70e9b18556a75b02f77b5bb73e87c:245 [inlined]
##evaluator#299(::Random._GLOBAL_RNG, ::DynamicPPL.Model{var"###evaluator#299",(:x,),Tuple{Array{Float64,1}},(),DynamicPPL.ModelGen{var"###generator#300",(:x,),(),Tuple{}}}, ::DynamicPPL.ThreadSafeVarInfo{DynamicPPL.VarInfo{NamedTuple{(:ρ,),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:ρ,Tuple{}},Int64},Array{Uniform{Float64},1},Array{DynamicPPL.VarName{:ρ,Tuple{}},1},Array{ForwardDiff.Dual{ForwardDiff.Tag{Turing.Core.var"#f...

ElOceanografo avatar Aug 22 '20 00:08 ElOceanografo
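
The element-type restriction can be seen outside Turing with a minimal sketch along these lines (the Dual is constructed by hand here to stand in for a parameter traced by the sampler):

```julia
using ForwardDiff, LinearAlgebra, SparseArrays

ρ = ForwardDiff.Dual(0.3, 1.0)   # stand-in for a parameter traced by ForwardDiff
n = 10
Q = spdiagm(-1 => ρ * ones(n - 1), 0 => fill(one(ρ), n), 1 => ρ * ones(n - 1))

# Dispatches to the CHOLMOD wrapper, which only accepts Float64/ComplexF64
# elements, hence the TypeError in the stacktrace above.
cholesky(Q)
```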

This PR is probably relevant: JuliaDiff/ChainRules.jl#246. If I understand the AD machinery being used here (I don't), once sparse array support is added, rules for sparse Cholesky factorizations can be added that will enable differentiating through them. Unfortunately, CHOLMOD doesn't know what Dual numbers are, so ForwardDiff AD fails.

jkbest2 avatar Sep 16 '20 22:09 jkbest2

Reading the stacktrace for my error above more carefully, I see the error is actually coming from the Sparse constructor, not the Cholesky decomposition, so hopefully https://github.com/JuliaDiff/ChainRules.jl/pull/246 will solve the problem.

ElOceanografo avatar Oct 13 '20 20:10 ElOceanografo

I've done more investigating and think that it might be easier to implement a custom TuringMvNormalCanon type in this package, at least as long as Distributions uses PDMats. The issue is that even with a custom adjoint defined for the Cholesky decomposition, the rrule still has to calculate the factorization--which, as long as it is done via SuiteSparse.CHOLMOD, means it won't be able to handle dual numbers.

However, I recently found LDLFactorizations.jl, a pure-Julia port of the LDL decomposition from SuiteSparse--which can be used to calculate the Cholesky factors, even for non-float matrices. Using this approach, I've hacked together a custom MvNormalCanon implementation that works in my example Turing model above. @devmotion, does this sound like a worthwhile PR?

ElOceanografo avatar Nov 13 '20 21:11 ElOceanografo
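
A rough sketch of that idea, computing the zero-mean canonical log-density through an LDLᵀ factorization instead of CHOLMOD. The `ldl` call, its behaviour with generic element types, and the `F.D` property are assumptions here; check LDLFactorizations.jl's documentation for the exact API:

```julia
using LDLFactorizations, SparseArrays, LinearAlgebra, ForwardDiff

# Zero-mean MvNormalCanon log-density via an LDLᵀ factorization.
# F.D is assumed to expose the diagonal factor; adjust if the package's API differs.
function canon_logpdf(ρ, x)
    n = length(x)
    Q = spdiagm(-1 => ρ * ones(n - 1), 0 => fill(one(ρ), n), 1 => ρ * ones(n - 1))
    F = ldl(Q)                      # generic in the element type, so Dual numbers pass through
    logdetQ = sum(log, diag(F.D))   # log|Q| = Σᵢ log dᵢ for a symmetric positive definite Q
    return (logdetQ - n * log(2π) - dot(x, Q, x)) / 2
end

x = randn(10)
ForwardDiff.derivative(ρ -> canon_logpdf(ρ, x), 0.3)
```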

> The issue is that even with a custom adjoint defined for the Cholesky decomposition, the rrule still has to calculate the factorization--which, as long as it is done via SuiteSparse.CHOLMOD, means it won't be able to handle dual numbers.

I don't understand why there is a problem with the custom adjoint here - what is missing or broken in the definition of the adjoint in ChainRules (https://github.com/JuliaDiff/ChainRules.jl/blob/f63f385e7a8477c5112f0b2cf24790ca1abf6320/src/rulesets/LinearAlgebra/factorization.jl#L73)?

> However, I recently found LDLFactorizations.jl, a pure-Julia port of the LDL decomposition from SuiteSparse--which can be used to calculate the Cholesky factors, even for non-float matrices.

Interesting, I didn't know this package. I wonder if it is problematic that it uses the LGPL license.

devmotion avatar Nov 13 '20 22:11 devmotion

In my example above, the precision matrix Q is defined in terms of the unknown parameter ρ, so when applying any autodiff routine to the model, Q ends up being a SparseMatrixCSC with elements of some dual number type. The error is coming from the PDSparseMat constructor trying to precalculate cholesky(Q) using CHOLMOD, not the actual calculation of the likelihood or its derivative.

The rrule seems to be defined correctly, but it still calls CHOLMOD if the matrix X is sparse (line 74 in the file you linked). My understanding of how rrules work is pretty superficial, so I may be wrong, but I think it would run into the same problem there, right?

Regarding the LGPL...I'm not a lawyer, but I believe it's okay to use LGPL libraries in non-GPL code, as long as the non-GPL code isn't a derivative work. SuiteSparse itself is under the LGPL, so if using LDLFactorizations causes license issues, they're at least not new ones...

ElOceanografo avatar Nov 13 '20 23:11 ElOceanografo
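
A quick REPL check of where the sparse cholesky call lands (nothing AD-specific, just dispatch):

```julia
using LinearAlgebra, SparseArrays

Q = spdiagm(0 => ones(10))
@which cholesky(Q)   # points into SuiteSparse's cholmod.jl, i.e. the forward pass is CHOLMOD
```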

Ah OK, I think I understand now. I'm a bit surprised that ForwardDiff doesn't support cholesky of sparse matrices but I've never checked.

Did you try to use Zygote instead of ForwardDiff? It uses the adjoints in ChainRules and does not operate with dual numbers.

devmotion avatar Nov 14 '20 07:11 devmotion
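
For reference, switching backends for the model above would look roughly like this with the Turing versions current at the time (setadbackend is assumed here; newer releases configure AD differently):

```julia
using Turing, Zygote

Turing.setadbackend(:zygote)   # replace the default ForwardDiff backend with Zygote
# then re-run the model defined earlier in the thread:
# m = MvNormalCanonTest(randn(10))
# sample(m, NUTS(), 100)
```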

Zygote doesn't work either; it fails with a "Mutating arrays is not supported" error. Some of the linear algebra functions build up their results in place.

Traceback:

```julia
Mutating arrays is not supported
error(::String) at error.jl:33
(::Zygote.var"#364#365")(::Nothing) at array.jl:58
(::Zygote.var"#2245#back#366"{Zygote.var"#364#365"})(::Nothing) at adjoint.jl:49
diag at cholmod.jl:1792 [inlined]
(::typeof(∂(diag)))(::Array{Float64,1}) at interface2.jl:0
logdet at cholmod.jl:1801 [inlined]
(::typeof(∂(logdet)))(::Float64) at interface2.jl:0
logdet at pdsparsemat.jl:55 [inlined]
(::typeof(∂(logdet)))(::Float64) at interface2.jl:0
logdetcov at mvnormalcanon.jl:165 [inlined]
(::typeof(∂(logdetcov)))(::Float64) at interface2.jl:0
mvnormal_c0 at mvnormal.jl:99 [inlined]
(::typeof(∂(mvnormal_c0)))(::Int64) at interface2.jl:0
_logpdf at mvnormal.jl:127 [inlined]
(::typeof(∂(_logpdf)))(::Int64) at interface2.jl:0
#119 at multivariates.jl:262 [inlined]
(::typeof(∂(λ)))(::Int64) at interface2.jl:0
#1071 at broadcast.jl:140 [inlined]
#3 at generator.jl:36 [inlined]
iterate at generator.jl:47 [inlined]
collect(::Base.Generator{Base.Iterators.Zip{Tuple{Array{typeof(∂(λ)),1},FillArrays.Fill{Int64,1,Tuple{Base.OneTo{Int64}}}}},Base.var"#3#4"{Zygote.var"#1071#1078"}}) at array.jl:686
map at abstractarray.jl:2248 [inlined]
(::Zygote.var"#1070#1077"{Tuple{UnitRange{Int64}},Val{2},Array{typeof(∂(λ)),1}})(::FillArrays.Fill{Int64,1,Tuple{Base.OneTo{Int64}}}) at broadcast.jl:140
#3862#back at adjoint.jl:49 [inlined]
(::Zygote.var"#150#151"{Zygote.var"#3862#back#1081"{Zygote.var"#1070#1077"{Tuple{UnitRange{Int64}},Val{2},Array{typeof(∂(λ)),1}}},Tuple{Tuple{Nothing,Nothing,Nothing},Tuple{}}})(::FillArrays.Fill{Int64,1,Tuple{Base.OneTo{Int64}}}) at lib.jl:191
#1693#back at adjoint.jl:49 [inlined]
broadcasted at broadcast.jl:1257 [inlined]
(::typeof(∂(broadcasted)))(::FillArrays.Fill{Int64,1,Tuple{Base.OneTo{Int64}}}) at interface2.jl:0
#556 at array.jl:243 [inlined]
(::typeof(∂(#556)))(::Int64) at interface2.jl:0
#41 at interface.jl:45 [inlined]
#557 at array.jl:244 [inlined]
sum at reducedim.jl:720 [inlined]
(::typeof(∂(sum)))(::Int64) at interface2.jl:0
loglikelihood at multivariates.jl:262 [inlined]
(::typeof(∂(loglikelihood)))(::Int64) at interface2.jl:0
observe at context_implementations.jl:152 [inlined]
observe at hmc.jl:557 [inlined]
_tilde at context_implementations.jl:109 [inlined]
tilde at context_implementations.jl:67 [inlined]
(::typeof(∂(tilde)))(::Int64) at interface2.jl:0
tilde_observe at context_implementations.jl:89 [inlined]
(::typeof(∂(tilde_observe)))(::Nothing) at interface2.jl:0
#7 at untitled-80c47956c154a0947dfbe4a5c1a877ba:92 [inlined]
(::typeof(∂(#7)))(::Nothing) at interface2.jl:0
macro expansion at model.jl:0 [inlined]
_evaluate at model.jl:160 [inlined]
(::typeof(∂(_evaluate)))(::Nothing) at interface2.jl:0
evaluate_threadsafe at model.jl:150 [inlined]
(::typeof(∂(evaluate_threadsafe)))(::Nothing) at interface2.jl:0
Model at model.jl:94 [inlined]
(::typeof(∂(λ)))(::Nothing) at interface2.jl:0
#150 at lib.jl:191 [inlined]
#1693#back at adjoint.jl:49 [inlined]
Model at model.jl:98 [inlined]
f at ad.jl:165 [inlined]
(::typeof(∂(λ)))(::Int64) at interface2.jl:0
(::Zygote.var"#41#42"{typeof(∂(λ))})(::Int64) at interface.jl:45
gradient_logp(::Turing.Core.ZygoteAD, ::Array{Float64,1}, ::DynamicPPL.VarInfo{NamedTuple{(:ρ,),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:ρ,Tuple{}},Int64},Array{Uniform{Float64},1},Array{DynamicPPL.VarName{:ρ,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64}, ::DynamicPPL.Model{var"#7#8",(:x,),(),(),Tuple{Array{Float64,1}},Tuple{}}, ::DynamicPPL.Sampler{NUTS{Turing.Core.ZygoteAD,(),AdvancedHMC.DiagEuclideanMetric},Turing.Inference.SamplerState{DynamicPPL.VarInfo{NamedTuple{(:ρ,),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:ρ,Tuple{}},Int64},Array{Uniform{Float64},1},Array{DynamicPPL.VarName{:ρ,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64}}}, ::DynamicPPL.DefaultContext) at ad.jl:171
gradient_logp at ad.jl:83 [inlined]
gradient_logp at ad.jl:83 [inlined]
∂logπ∂θ at hmc.jl:474 [inlined]
∂H∂θ at hamiltonian.jl:31 [inlined]
phasepoint at hamiltonian.jl:69 [inlined]
find_good_stepsize(::Random._GLOBAL_RNG, ::AdvancedHMC.Hamiltonian{AdvancedHMC.DiagEuclideanMetric{Float64,Array{Float64,1}},Turing.Inference.var"#logπ#55"{DynamicPPL.VarInfo{NamedTuple{(:ρ,),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:ρ,Tuple{}},Int64},Array{Uniform{Float64},1},Array{DynamicPPL.VarName{:ρ,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64},DynamicPPL.Sampler{NUTS{Turing.Core.ZygoteAD,(),AdvancedHMC.DiagEuclideanMetric},Turing.Inference.SamplerState{DynamicPPL.VarInfo{NamedTuple{(:ρ,),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:ρ,Tuple{}},Int64},Array{Uniform{Float64},1},Array{DynamicPPL.VarName{:ρ,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64}}},DynamicPPL.Model{var"#7#8",(:x,),(),(),Tuple{Array{Float64,1}},Tuple{}}},Turing.Inference.var"#∂logπ∂θ#54"{DynamicPPL.VarI...
```

ElOceanografo avatar Nov 16 '20 17:11 ElOceanografo

Sorry, it's not clear to me what exactly you tested here. Is it the Turing model above? That might still require some additional work but I was just referring to the statements about AD of cholesky above - I assumed that this should work with sparse matrices since Zygote should use the custom adjoint defined in ChainRules.

devmotion avatar Nov 16 '20 19:11 devmotion

Yes, I just ran the Turing model above after setting the AD backend to Zygote... sorry for not being clearer. However, I also get the same error with a hand-coded likelihood function:

using Distributions, Zygote, SparseArrays, PDMats

const xobs = randn(10)

function loglik(θ)
    ρ = 0.5tanh(θ[1])
    n = length(xobs)
    Q = spdiagm(-1 => ρ * ones(n-1), 0 => ones(n), 1 => ρ * ones(n-1))
    d = MvNormalCanon(PDSparseMat(Q))
    return logpdf(d, xobs)
end

Zygote.gradient(loglik, [1.0])
# ERROR: Mutating arrays is not supported

ElOceanografo avatar Nov 16 '20 21:11 ElOceanografo

For what it's worth, the pullback and pushforward of cholesky with a sparse argument appear to work:

julia> n = 100
julia> ρ = 0.5
julia> Q = spdiagm(-1 => ρ * ones(n-1), 0 => ones(n), 1 => ρ * ones(n-1))

julia> pullback(cholesky, Q)
(SuiteSparse.CHOLMOD.Factor{Float64}
type:    LLt
method:  simplicial
maxnnz:  19
nnz:     19
success: true
, Zygote.var"#41#42"{Zygote.ZBack{ChainRules.var"#cholesky_pullback#1615"{SuiteSparse.CHOLMOD.Factor{Float64}}}}(Zygote.ZBack{ChainRules.var"#cholesky_pullback#1615"{SuiteSparse.CHOLMOD.Factor{Float64}}}(ChainRules.var"#cholesky_pullback#1615"{SuiteSparse.CHOLMOD.Factor{Float64}}(SuiteSparse.CHOLMOD.Factor{Float64}
type:    LLt
method:  simplicial
maxnnz:  19
nnz:     19
success: true
))))

julia> pushforward(cholesky, Q)
#8 (generic function with 1 method)

But trying to get the gradient of logdet with respect to a sparse matrix fails:

julia> Zygote.gradient(logdet, Q)
ERROR: MethodError: no method matching logabsdet(::SuiteSparse.UMFPACK.UmfpackLU{Float64,Int64})
Closest candidates are:
  logabsdet(::SymTridiagonal; shift) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/tridiag.jl:441
  logabsdet(::UnitUpperTriangular{T,S} where S<:AbstractArray{T,2}) where T at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/triangular.jl:2652
  logabsdet(::UnitLowerTriangular{T,S} where S<:AbstractArray{T,2}) where T at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/triangular.jl:2653
  ...
Stacktrace:
 [1] logabsdet(::SparseMatrixCSC{Float64,Int64}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/generic.jl:1573
 [2] adjoint at /home/jkbest/.julia/packages/Zygote/c0awc/src/lib/array.jl:378 [inlined]
 [3] _pullback at /home/jkbest/.julia/packages/ZygoteRules/6nssF/src/adjoint.jl:47 [inlined]
 [4] logdet at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/generic.jl:1596 [inlined]
 [5] _pullback(::Zygote.Context, ::typeof(logdet), ::SparseMatrixCSC{Float64,Int64}) at /home/jkbest/.julia/packages/Zygote/c0awc/src/compiler/interface2.jl:0
 [6] _pullback(::Function, ::SparseMatrixCSC{Float64,Int64}) at /home/jkbest/.julia/packages/Zygote/c0awc/src/compiler/interface.jl:38
 [7] pullback(::Function, ::SparseMatrixCSC{Float64,Int64}) at /home/jkbest/.julia/packages/Zygote/c0awc/src/compiler/interface.jl:44
 [8] gradient(::Function, ::SparseMatrixCSC{Float64,Int64}) at /home/jkbest/.julia/packages/Zygote/c0awc/src/compiler/interface.jl:53
 [9] top-level scope at REPL[22]:1

And if you use a function to create an AR(1) precision matrix and try to take the logdet with respect to ρ, you get the same error about mutating arrays.

julia> spchol(ρ; n = 100) = cholesky(spdiagm(-1 => ρ * ones(n-1), 0 => ones(n), 1 => ρ * ones(n-1)))
julia> Zygote.gradient(ρ -> logdet(spchol(ρ)), 0.5)
ERROR: Mutating arrays is not supported
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] (::Zygote.var"#364#365")(::Nothing) at /home/jkbest/.julia/packages/Zygote/c0awc/src/lib/array.jl:58
 [3] (::Zygote.var"#2245#back#366"{Zygote.var"#364#365"})(::Nothing) at /home/jkbest/.julia/packages/ZygoteRules/6nssF/src/adjoint.jl:49
 [4] diag at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/SuiteSparse/src/cholmod.jl:1792 [inlined]
 [5] (::typeof(∂(diag)))(::Array{Float64,1}) at /home/jkbest/.julia/packages/Zygote/c0awc/src/compiler/interface2.jl:0
 [6] logdet at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/SuiteSparse/src/cholmod.jl:1801 [inlined]
 [7] (::typeof(∂(logdet)))(::Float64) at /home/jkbest/.julia/packages/Zygote/c0awc/src/compiler/interface2.jl:0
 [8] #4 at ./REPL[20]:1 [inlined]
 [9] (::typeof(∂(#4)))(::Float64) at /home/jkbest/.julia/packages/Zygote/c0awc/src/compiler/interface2.jl:0
 [10] (::Zygote.var"#41#42"{typeof(∂(#4))})(::Float64) at /home/jkbest/.julia/packages/Zygote/c0awc/src/compiler/interface.jl:45
 [11] gradient(::Function, ::Float64) at /home/jkbest/.julia/packages/Zygote/c0awc/src/compiler/interface.jl:54
 [12] top-level scope at REPL[20]:1

I'm not sure if any of this is helpful, but maybe a Zygote issue would be more productive?

jkbest2 avatar Nov 16 '20 22:11 jkbest2

> For what it's worth, the pullback and pushforward of cholesky with a sparse argument appear to work:

That's exactly what I assumed, and why I thought that for Zygote the problem mentioned above for ForwardDiff

> The error is coming from the PDSparseMat constructor trying to precalculate cholesky(Q) using CHOLMOD, not the actual calculation of the likelihood or its derivative.

should not exist.

> But trying to get the gradient of logdet with respect to a sparse matrix fails:

It seems logabsdet is not defined for sparse matrices (and their LU decomposition)?!

devmotion avatar Nov 16 '20 22:11 devmotion

> It seems logabsdet is not defined for sparse matrices (and their LU decomposition)?!

Weird that we'd be the first people to notice this, but that seems to be the case. There are a few closed issues for other missing logabsdet methods, but not for SparseMatrixCSC. I've opened an issue about it: https://github.com/JuliaLang/julia/issues/38463

Anyway, in the meantime...what do you think is the best way to move forward on this issue?

ElOceanografo avatar Nov 17 '20 02:11 ElOceanografo
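
One possible stopgap, consistent with the identity d logdet(Q) = tr(Q⁻¹ dQ) that the ChainRules adjoint relies on, would be a hand-written reverse rule for logdet of a sparse symmetric positive definite matrix. This is only a sketch, not something settled in this thread: it forms a dense inverse (giving up the sparsity advantage for large problems), and whether a given Zygote version picks it up over its own adjoints would need checking, so it is exercised directly below rather than through an AD package.

```julia
using ChainRulesCore, LinearAlgebra, SparseArrays

# For symmetric positive definite Q, d logdet(Q) = tr(Q⁻¹ dQ), so the cotangent of Q
# is ΔΩ * Q⁻¹. Forming the dense inverse keeps the sketch short but is not sparse-friendly.
function ChainRulesCore.rrule(::typeof(logdet), Q::SparseMatrixCSC{Float64})
    C = cholesky(Q)   # CHOLMOD is fine here because the primal values are Float64
    Ω = logdet(C)
    function logdet_pullback(ΔΩ)
        Q̄ = ΔΩ * sparse(inv(Matrix(Q)))
        return NoTangent(), Q̄
    end
    return Ω, logdet_pullback
end

# Exercising the rule directly, independent of any particular AD package:
n = 10
Q = spdiagm(-1 => 0.4 * ones(n - 1), 0 => ones(n), 1 => 0.4 * ones(n - 1))
Ω, pb = ChainRulesCore.rrule(logdet, Q)
Q̄ = pb(1.0)[2]   # ≈ inv(Matrix(Q))
```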

Some progress on this: logabsdet will now work for sparse matrices (on Julia master, should be released in v1.7). Once the AD issues get sorted out, this issue can probably be closed.

ElOceanografo avatar May 14 '21 17:05 ElOceanografo
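
Per the update above, on Julia 1.7 and later the plain-Float64 call that previously threw the MethodError should now run:

```julia
using LinearAlgebra, SparseArrays

n = 10
Q = spdiagm(-1 => 0.4 * ones(n - 1), 0 => ones(n), 1 => 0.4 * ones(n - 1))
logabsdet(Q)   # (log|det(Q)|, sign) via the sparse LU factorization
```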