
Roadmap to LKJCholesky

Open PavanChaggar opened this issue 3 years ago • 1 comments

Making an issue to document the current obstacles hindering efficient LKJCholesky sampling and to form a concrete path toward an implementation.

@storopoli has a nice example here with a workaround and an ideal Turing implementation. I'll work on adapting this into a MWE that can be used for testing.

The LKJCholesky distribution has now been implemented in Distributions.jl (here); we just need to clean up how it interacts with Turing. There are two main issues:

  • Random samples from an LKJCholesky distribution return a LinearAlgebra.Cholesky type, not just an upper/lower triangular Array. From previous discussions, it sounds like Bijectors.jl requires a Bijector's input and output to have the same type. At the moment, CorrBijector maps to a correlation matrix, but it would be easier to map to a Cholesky factor.
  • There are some internals of Turing that will need to be made compatible with the Cholesky type. One known example is packaging samples into MCMCChains. @torfjelde @yebai @devmotion are there any other particular methods that would need to be implemented on a distribution type for Turing?
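To illustrate the first point, here is a quick sketch of what sampling currently returns (assuming a recent Distributions.jl where LKJCholesky is available):

```julia
using Distributions, LinearAlgebra

# A draw from LKJCholesky is a LinearAlgebra.Cholesky factorization object,
# not a plain triangular Array.
F = rand(LKJCholesky(3, 1.0))
F isa Cholesky            # true

# The lower-triangular factor and the full correlation matrix:
L = F.L
Omega = Matrix(F)         # equals L * L'
```

This is why the bijector would need to map to (and from) a Cholesky factor rather than a dense correlation matrix.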

cc'ing @sethaxen, with whom I've briefly chatted about this.

Immediate next steps will be to work on a MWE to centre a PR around. However, if someone already has one, please post!

PavanChaggar avatar Aug 05 '22 15:08 PavanChaggar

@PavanChaggar Great to see more steps in this direction! I put together a LKJCholesky MWE just the other day and posted it to Discourse because I was running into some issues during sampling. @sethaxen answered my questions and it may be a good example for dealing with divergences and thus zero values in the diagonal of the Cholesky factor.

joshualeond avatar Aug 08 '22 22:08 joshualeond

@PavanChaggar Any updates on this?

yebai avatar Nov 12 '22 20:11 yebai

There are some internals of Turing that will need to be made compatible with the Cholesky type. One known example is packaging samples into MCMCChains.

If we switch to using InferenceData and InferenceObjects.jl, I believe this should get a lot easier. @PavanChaggar how long do you think it would take to get everything besides that working?

ParadaCarleton avatar Nov 24 '22 01:11 ParadaCarleton

Hello! Just wanted to ask if there has been any progress on this?

SebastianCallh avatar Mar 15 '23 18:03 SebastianCallh

Once https://github.com/TuringLang/Bijectors.jl/pull/246 and https://github.com/TuringLang/DynamicPPL.jl/pull/462 are merged, this should be supported.

cc @torfjelde

yebai avatar Mar 15 '23 19:03 yebai

Hey @yebai and @torfjelde, just checking in to see if this is implemented in Turing? I see those two PRs are merged but wasn't sure if that meant it's complete or not. If so, are there any docs or examples to play with related to this?

joshualeond avatar Aug 13 '23 03:08 joshualeond

This is the PR to follow: https://github.com/TuringLang/Turing.jl/pull/2018

Once that has gone through, we're there:) And we're getting closer!

torfjelde avatar Aug 13 '23 10:08 torfjelde

This should be working now.

yebai avatar Aug 16 '23 21:08 yebai

Thanks so much @yebai for your and the Turing team's effort! I see adding support for this has not been a trivial task.

Unfortunately, I'm personally still having problems using the LKJCholesky distribution in Turing. I posted on Julia Discourse about it, but I'm unsure where the best place to discuss it is. I'm assuming it isn't GitHub, though. Could just be user error on my part.

joshualeond avatar Aug 17 '23 14:08 joshualeond

@torfjelde fixed a few minor leftover issues, notably https://github.com/TuringLang/DynamicPPL.jl/pull/521 and https://github.com/TuringLang/DynamicPPL.jl/pull/531. @joshualeond can you try again using [email protected]?

yebai avatar Sep 04 '23 22:09 yebai

Hey @yebai and @torfjelde, yes it looks good on my end:

using LinearAlgebra, PDMats, Random, Turing

Random.seed!(121)

# generate data
sigma = [1,2,3]
Omega = [1 0.3 0.2;
         0.3 1 0.1;
         0.2 0.1 1]

Sigma = Diagonal(sigma) * Omega * Diagonal(sigma)
N = 100
J = 3

y = rand(MvNormal(Sigma), N)'

@model function correlation_chol(J, N, y)
    # Half-Cauchy priors on the standard deviations.
    sigma ~ filldist(truncated(Cauchy(0., 5.); lower = 0), J)
    # LKJ prior on the Cholesky factor of the correlation matrix.
    F ~ LKJCholesky(J, 1.0)
    # Scale the correlation Cholesky factor by the standard deviations;
    # the eps() jitter guards against numerical loss of positive definiteness.
    Σ_L = Diagonal(sigma) * F.L
    Sigma = PDMat(Cholesky(Σ_L + eps() * I))
    for i in 1:N
        y[i,:] ~ MvNormal(Sigma)
    end
    return (;Omega = Matrix(F))
end


chain1 = sample(correlation_chol(J, N, y), NUTS(), 1000);

This results in the output below. I'm attempting to get that Omega matrix reconstructed with the posterior to compare it to the original Stan post I was attempting to replicate but I just need to get more familiar with Turing.

Thanks again for all the hard work on this! Giving so much of your personal time to bring new features to Julia is appreciated, so thanks.

julia> chain1
Chains MCMC chain (1000×21×1 Array{Float64, 3}):

Iterations        = 501:1:1500
Number of chains  = 1
Samples per chain = 1000
Wall duration     = 2.01 seconds
Compute duration  = 2.01 seconds
parameters        = sigma[1], sigma[2], sigma[3], F.L[1,1], F.L[2,1], F.L[3,1], F.L[2,2], F.L[3,2], F.L[3,3]
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters      mean       std      mcse    ess_bulk   ess_tail      rhat   ess_per_sec 
      Symbol   Float64   Float64   Float64     Float64    Float64   Float64       Float64 

    sigma[1]    1.0154    0.0742    0.0026    840.1914   756.3558    0.9993      417.3827
    sigma[2]    2.2811    0.1633    0.0058    787.8339   809.2130    1.0005      391.3730
    sigma[3]    3.0124    0.2217    0.0070   1031.3962   633.4660    1.0016      512.3677
    F.L[1,1]    1.0000    0.0000       NaN         NaN        NaN       NaN           NaN
    F.L[2,1]    0.5011    0.0781    0.0029    704.6972   605.2379    0.9994      350.0731
    F.L[3,1]    0.2197    0.0884    0.0030    879.5019   843.2654    0.9996      436.9110
    F.L[2,2]    0.8607    0.0454    0.0017    704.6972   605.2379    0.9993      350.0731
    F.L[3,2]   -0.0145    0.0961    0.0027   1211.5411   721.9466    0.9996      601.8585
    F.L[3,3]    0.9664    0.0221    0.0008    852.8859   809.2761    0.9993      423.6890

joshualeond avatar Sep 05 '23 13:09 joshualeond

Glad it's working!

I'm attempting to get that Omega matrix reconstructed with the posterior to compare it to the original Stan post I was attempting to replicate but I just need to get more familiar with Turing.

For this you can use generated_quantities(correlation_chol(J, N, y), chain1) :)

torfjelde avatar Sep 05 '23 13:09 torfjelde

For this you can use generated_quantities(correlation_chol(J, N, y), chain1) :)

@torfjelde will this work as expected, given that we renamed variables from F to F.L for clarity in the returned chains? I saw a lot of warnings while running this example.

yebai avatar Sep 05 '23 14:09 yebai

Ah true, no you're right, this won't work as intended. Hmm, will have to think a bit about that.

torfjelde avatar Sep 05 '23 14:09 torfjelde

Ah true, no you're right, this won't work as intended. Hmm, will have to think a bit about that.

After https://github.com/TuringLang/Turing.jl/pull/2078, generated_quantities(correlation_chol(J, N, y), chain1) is now working properly on the master branch.

yebai avatar Sep 12 '23 20:09 yebai
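As a sketch of how this can be used (reusing the model and chain names from the example earlier in this thread; the posterior-mean summary is just one way to reconstruct Omega):

```julia
using Statistics

# Assumes `correlation_chol`, `J`, `N`, `y`, and `chain1` are defined as in
# the example above.
gq = generated_quantities(correlation_chol(J, N, y), chain1)

# `gq` holds one NamedTuple per posterior sample; average the reconstructed
# correlation matrices to get a posterior-mean estimate of Omega.
Omega_mean = mean(nt.Omega for nt in gq)
```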

Nice! Just checked it out and looks good on my end as well. Thanks again for the huge effort!

joshualeond avatar Sep 12 '23 20:09 joshualeond

@joshualeond could you say what version of Turing you are using? When I try your example I get an error:

ERROR: MethodError: no method matching vectorize(::LKJCholesky{Float64}, ::Cholesky{Float64, Matrix{Float64}})

Ah, sorry!! I see now, I just needed to update to 0.29; please ignore me.

YSanchezAraujo avatar Sep 14 '23 14:09 YSanchezAraujo

Nice! Just checked it out and looks good on my end as well. Thanks again for the huge effort!

Hi, I am on Turing.jl 0.29.1 and the above code does not run for me. It throws the following error: ERROR: PosDefException: matrix is not positive definite; Cholesky factorization failed. Any help would be appreciated.

datadreamscorp avatar Sep 19 '23 22:09 datadreamscorp

I also get the above ^ but it seems to happen at random.

YSanchezAraujo avatar Sep 20 '23 00:09 YSanchezAraujo

ERROR: PosDefException: matrix is not positive definite; Cholesky factorization failed.

This is a different error than what @YSanchezAraujo first mentioned though; this might just be a numerical issue :confused:

Need to look more into this a bit later

torfjelde avatar Sep 20 '23 13:09 torfjelde

Sadly I have not been able to replicate the above example after several tries :( not even at random. The one time that I did not get the PosDefException, the model sampled and then proceeded to throw the following error: ERROR: MethodError: no method matching length(::Cholesky{Float64, Matrix{Float64}})

Really appreciate the work that is being put into this. It's a much-needed implementation. Hope it can be fixed soon.

datadreamscorp avatar Sep 20 '23 23:09 datadreamscorp

This seems relevant: https://github.com/TuringLang/Turing.jl/issues/2081

In particular, this comment: https://github.com/TuringLang/Turing.jl/issues/2081#issuecomment-1733989162

torfjelde avatar Sep 25 '23 16:09 torfjelde

It seems like the basic advice is to add some small value along the diagonal of Sigma, but how large that value should be will differ from case to case. It makes me wonder whether it can be shown empirically that the worse the model assumptions are for the data, the more likely you are to run into this problem. Basically, if you need to add too large a value along the diagonal, just drop the attempt to model that correlation matrix altogether.

YSanchezAraujo avatar Sep 27 '23 17:09 YSanchezAraujo
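A minimal sketch of that jitter workaround, mirroring the `eps() * I` trick from the model example earlier in the thread; the helper name `scaled_cov` and the default jitter of `1e-6` are illustrative choices here, not part of any Turing API:

```julia
using LinearAlgebra, PDMats

# Hypothetical helper: build a positive-definite covariance from per-dimension
# standard deviations `sigma` and a correlation Cholesky factorization `F`
# (as drawn from LKJCholesky). Adding jitter to the factor's diagonal guards
# against numerical loss of positive definiteness during sampling.
function scaled_cov(sigma, F; jitter = 1e-6)
    Sigma_L = LowerTriangular(Diagonal(sigma) * F.L)
    return PDMat(Cholesky(Sigma_L + jitter * I))
end
```

As noted above, how large the jitter needs to be is problem-dependent; if it has to be large to avoid the PosDefException, that may itself be a sign of model misspecification.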