
LKJ bijector is having problems

cpfiffer opened this issue 4 years ago • 8 comments

I wish I could be more specific about the problems in the issue description, but I'm having a hard time pinning them down. I've gotten singular-matrix exceptions, non-positive-definite exceptions, and non-Hermitian matrix exceptions, none of which seems to have an obvious cause to me.

MWE:

using Turing, LinearAlgebra

@model function ex(data)
    # Set up covariance matrix
    sigma ~ filldist(InverseGamma(2, 3), size(data, 2))
    omega ~ LKJ(size(data, 2), 1.0)
    Σ = Symmetric(diagm(sigma) * omega * diagm(sigma))

    # Observations
    for t in 1:size(data, 1)
        data[t,:] ~ MvNormal(zeros(size(data, 2)), Σ)
    end
end

# Parameters
dim = 10
n_obs = 1000

# Variances
sigma = rand(InverseGamma(2, 3), dim)

# Correlation matrix
omega = rand(LKJ(dim, 2.0))
Sigma = Symmetric(omega)

# Data
data = rand(MvNormal(zeros(dim), Sigma), n_obs)'

# Construct model
model = ex(data)
chain = sample(model, NUTS(), 100)

Running the example above gets me

ERROR: LoadError: PosDefException: matrix is not positive definite; Cholesky factorization failed.
Stacktrace:
 [1] checkpositivedefinite at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/factorization.jl:18 [inlined]
 [2] cholesky!(::Symmetric{ForwardDiff.Dual{…},Array{ForwardDiff.Dual{…},2}}, ::Val{false}; check::Bool) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/cholesky.jl:219
 [3] #cholesky#130 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/cholesky.jl:344 [inlined]
 [4] cholesky at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/cholesky.jl:344 [inlined] (repeats 2 times)
 [5] (::Bijectors.CorrBijector)(::Symmetric{ForwardDiff.Dual{…},Array{ForwardDiff.Dual{…},2}}) at /home/cameron/.julia/dev/Bijectors/src/bijectors/corr.jl:68
 [6] logabsdetjac at /home/cameron/.julia/packages/Bijectors/Fye2h/src/bijectors/corr.jl:101 [inlined]
 [7] logpdf_with_trans(::LKJ{Float64,Int64}, ::Symmetric{ForwardDiff.Dual{…},Array{ForwardDiff.Dual{…},2}}, ::Bool) at /home/cameron/.julia/dev/Bijectors/src/Bijectors.jl:130

If I reduce dim to 3, sampling basically just hangs forever. Occasionally it throws a SingularException, but I haven't been able to reproduce that reliably, so I don't have a stack trace for it.

Any thoughts here?

cpfiffer • Oct 30, 2020

I don't think there's anything wrong here, it's just numerically very unstable. If I run the code with a print statement inserted in the inverse evaluation of CorrBijector, I get the following as the input (i.e. the value in the unconstrained space that HMC is working with):

10×10 Array{Float64,2}:
 0.0  -1630.14  -857.317  -8051.24  …    -825.602   -1598.03   -1801.29
 0.0      0.0   8426.06   25379.5       -5603.79    -6840.33   -3651.21
 0.0      0.0      0.0    30092.8       -2565.67    -4733.44     -56.0023
 0.0      0.0      0.0        0.0       -7583.1    -10285.7     8297.9
 0.0      0.0      0.0        0.0       -4432.82    -6084.45    6599.2
 0.0      0.0      0.0        0.0   …  -23834.4    -31462.6   -55327.8
 0.0      0.0      0.0        0.0      -33831.2    -42291.8    22720.5
 0.0      0.0      0.0        0.0           0.0     42018.5    13520.2
 0.0      0.0      0.0        0.0           0.0         0.0    32250.5
 0.0      0.0      0.0        0.0           0.0         0.0        0.0

Now, that doesn't look too pretty. The "transformed" version (from unconstrained to constrained) is:

10×10 Array{Float64,2}:
  1.0  -1.0  -1.0  -1.0  -1.0  -1.0   1.0  -1.0  -1.0  -1.0
 -1.0   1.0   1.0   1.0   1.0   1.0  -1.0   1.0   1.0   1.0
 -1.0   1.0   1.0   1.0   1.0   1.0  -1.0   1.0   1.0   1.0
 -1.0   1.0   1.0   1.0   1.0   1.0  -1.0   1.0   1.0   1.0
 -1.0   1.0   1.0   1.0   1.0   1.0  -1.0   1.0   1.0   1.0
 -1.0   1.0   1.0   1.0   1.0   1.0  -1.0   1.0   1.0   1.0
  1.0  -1.0  -1.0  -1.0  -1.0  -1.0   1.0  -1.0  -1.0  -1.0
 -1.0   1.0   1.0   1.0   1.0   1.0  -1.0   1.0   1.0   1.0
 -1.0   1.0   1.0   1.0   1.0   1.0  -1.0   1.0   1.0   1.0
 -1.0   1.0   1.0   1.0   1.0   1.0  -1.0   1.0   1.0   1.0

I need to look more into the LKJ distribution to see how we can deal with this though. Possibly related: https://mc-stan.org/docs/2_18/functions-reference/cholesky-lkj-correlation-distribution.html

torfjelde • Oct 30, 2020
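
To make the failure mode described above concrete, here is a minimal sketch (not the actual Bijectors internals, just the tanh saturation that the inverse CorrBijector map relies on) of why unconstrained entries of this magnitude break the Cholesky step:

using LinearAlgebra

# tanh saturates: anything much beyond about ±20 is exactly ±1.0 in Float64,
# so the reconstructed "correlation" matrix has perfectly correlated columns
# and is only positive semi-definite, not positive definite.
z = tanh(1630.14)        # == 1.0
R = [1.0 -z; -z 1.0]     # 2×2 analogue of the saturated matrix printed above
isposdef(R)              # false
cholesky(R)              # throws PosDefException, just like the MWE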

Actually, I wonder if this is just an initialization problem. What if we tilted the initialization towards regions of the unconstrained space that actually map to meaningfully differentiable correlation matrices?

Edit: bad solution, I know, because then it's a different density.

cpfiffer • Oct 30, 2020

No, this is fine. This is essentially what we do in Turing with the "robust" init-methods. But this is more of an issue with Turing than with Bijectors, no?

EDIT: As in, it's fine to change the initialization procedure, but not the actual distribution, yeah.

torfjelde • Nov 1, 2020
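
For intuition about what a "tame" starting point looks like, here is a small sketch (assuming the Bijectors API of that period, where the inverse transform is obtained with inv; newer releases spell it inverse): the unconstrained zero matrix maps back to the identity correlation matrix, which the Cholesky factorization handles without complaint.

using Distributions, Bijectors, LinearAlgebra

b = bijector(LKJ(4, 1.0))    # the CorrBijector used for omega
y0 = zeros(4, 4)             # a benign point in the unconstrained space
Omega0 = inv(b)(y0)          # maps back to the 4×4 identity correlation matrix
isposdef(Symmetric(Omega0))  # true: far from the saturated ±1 matrices above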

Possibly also related: https://github.com/TuringLang/Bijectors.jl/issues/134

devmotion • Nov 1, 2020

No, this is fine. This is essentially what we do in Turing with the "robust" init-methods. But this is more of an issue with Turing than with Bijectors, no?

Yeah, perhaps, though the real issue might be the numerical instability. If the "true" bijector exists and is really numerically stable, then it doesn't matter how it's initialized.

cpfiffer • Nov 1, 2020

Just a data point that may be relevant to this issue - I noticed that LKJ never seems to work with NUTS (giving the above posdef error), but it does seem to work just fine with HMC. E.g.,

using Turing

@model function LKJ_demo(P)
    y ~ LKJ(P, 1)
end
model = LKJ_demo(5)

m1 = sample(model, HMC(0.01, 20), MCMCThreads(), 1, 4) # Works
m2 = sample(model, NUTS(1, 0.65), MCMCThreads(), 1, 4) # Always fails

bratslavia • May 4, 2021

@bratslavia This is because NUTS has an adaptation phase in which it will try out possibly "extreme" step sizes, while HMC uses the fixed step size of 0.01 that you provide. As a result, numerical issues are more likely to show up with NUTS than with HMC (unless you specify a very large step size for HMC).

torfjelde • May 5, 2021
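
A sketch of one way to act on that without giving up NUTS entirely: hand the sampler a small initial step size so the early adaptation iterations stay conservative. The init_ϵ keyword is taken from the Turing releases of that period; check the NUTS docstring for your version.

# Same model as above; init_ϵ = 0.0 (the default) means "find a step size
# automatically", while a small explicit value keeps early adaptation tame.
m3 = sample(model, NUTS(1_000, 0.65; init_ϵ = 0.01), MCMCThreads(), 1_000, 4)

Whether this actually avoids the PosDefException will depend on the model; the underlying numerical instability in the bijector is still there.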

So what is the currently suggested workaround, if any? bratslavia's MWE is exactly what I have been experiencing: at least when using NUTS, it is very likely to run into an error that basically stops all sampling.

mgmverburg • Aug 11, 2021