
Roadmap to LKJCholesky

Open PavanChaggar opened this issue 3 years ago • 1 comments

Making an issue to document the current obstacles hindering efficient LKJCholesky sampling and to form a concrete path toward an implementation.

@storopoli has a nice example here with a workaround and an ideal Turing implementation. I'll work on adapting this into a MWE that can be used for testing.

The LKJCholesky distribution has now been implemented in Distributions.jl (here); we just need to clean up how it interacts with Turing. There are two main issues:

  • Random samples from an LKJCholesky distribution return a LinearAlgebra.Cholesky type, not just an upper/lower triangular Array. From previous discussions, it sounds like Bijectors.jl requires a Bijector's input and output to have the same type. At the moment, CorrBijector maps to a correlation matrix, but it would be easier to map to a Cholesky factor.
  • There are some internals of Turing that will need to be made compatible with the Cholesky type. One known example is packaging samples into MCMCChains. @torfjelde @yebai @devmotion are there any other particular methods that would need to be implemented on a distribution type for Turing?
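To illustrate the first point, here is a quick sketch of what sampling currently returns (assuming a recent Distributions.jl where LKJCholesky is available):

```julia
using Distributions, LinearAlgebra

# A draw from LKJCholesky is a LinearAlgebra.Cholesky factorization object,
# not a plain triangular Array.
F = rand(LKJCholesky(3, 1.0))
F isa Cholesky            # true

# The lower-triangular factor and the full correlation matrix:
L = F.L
Omega = Matrix(F)         # equals L * L'
```

This is why the bijector would need to map to (and from) a Cholesky factor rather than a dense correlation matrix.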

cc'ing @sethaxen, with whom I've briefly chatted about this.

Immediate next steps will be to work on a MWE to centre a PR around. However, if someone already has one, please post!

PavanChaggar avatar Aug 05 '22 15:08 PavanChaggar

@PavanChaggar Great to see more steps in this direction! I put together a LKJCholesky MWE just the other day and posted it to Discourse because I was running into some issues during sampling. @sethaxen answered my questions and it may be a good example for dealing with divergences and thus zero values in the diagonal of the Cholesky factor.

joshualeond avatar Aug 08 '22 22:08 joshualeond

@PavanChaggar Any updates on this?

yebai avatar Nov 12 '22 20:11 yebai

There are some internals of Turing that will need to be made compatible with the Cholesky type. One known example is packaging samples into MCMCChains.

If we switch to using InferenceData and InferenceObjects.jl, I believe this should get a lot easier. @PavanChaggar how long do you think it would take to get everything besides that working?

ParadaCarleton avatar Nov 24 '22 01:11 ParadaCarleton

Hello! Just wanted to ask if there has been any progress on this?

SebastianCallh avatar Mar 15 '23 18:03 SebastianCallh

Once https://github.com/TuringLang/Bijectors.jl/pull/246 and https://github.com/TuringLang/DynamicPPL.jl/pull/462 are merged, this should be supported.

cc @torfjelde

yebai avatar Mar 15 '23 19:03 yebai

Hey @yebai and @torfjelde, just checking in to see if this is implemented in Turing? I see those two PRs are merged but wasn't sure if that meant it's complete or not. If so, are there any docs or examples to play with related to this?

joshualeond avatar Aug 13 '23 03:08 joshualeond

This is the PR to follow: https://github.com/TuringLang/Turing.jl/pull/2018

Once that has gone through, we're there:) And we're getting closer!

torfjelde avatar Aug 13 '23 10:08 torfjelde

This should be working now.

yebai avatar Aug 16 '23 21:08 yebai

Thanks so much @yebai for your and the Turing team's effort! I see adding support for this has not been a trivial task.

Unfortunately, I'm personally still having problems using the LKJCholesky distribution in Turing. I posted on Julia Discourse about it, but I'm unsure where the best place to discuss it is. I'm assuming it isn't GitHub, though. Could just be user error on my part.

joshualeond avatar Aug 17 '23 14:08 joshualeond

@torfjelde fixed a few minor leftover issues, notably https://github.com/TuringLang/DynamicPPL.jl/pull/521 and https://github.com/TuringLang/DynamicPPL.jl/pull/531. @joshualeond can you try again using [email protected]?

yebai avatar Sep 04 '23 22:09 yebai

Hey @yebai and @torfjelde, yes it looks good on my end:

using LinearAlgebra, PDMats, Random, Turing

Random.seed!(121)

# generate data
sigma = [1,2,3]
Omega = [1 0.3 0.2;
         0.3 1 0.1;
         0.2 0.1 1]

Sigma = Diagonal(sigma) * Omega * Diagonal(sigma)
N = 100
J = 3

y = rand(MvNormal(Sigma), N)'

@model function correlation_chol(J, N, y)
    # Half-Cauchy priors on the standard deviations.
    sigma ~ filldist(truncated(Cauchy(0., 5.); lower = 0), J)
    # LKJ prior on the Cholesky factor of the correlation matrix.
    F ~ LKJCholesky(J, 1.0)
    # Scale the correlation Cholesky factor by the standard deviations;
    # the eps() jitter guards against numerical loss of positive definiteness.
    Σ_L = Diagonal(sigma) * F.L
    Sigma = PDMat(Cholesky(Σ_L + eps() * I))
    for i in 1:N
        y[i,:] ~ MvNormal(Sigma)
    end
    return (;Omega = Matrix(F))
end


chain1 = sample(correlation_chol(J, N, y), NUTS(), 1000);

This results in the output below. I'm attempting to get that Omega matrix reconstructed with the posterior to compare it to the original Stan post I was attempting to replicate but I just need to get more familiar with Turing.

Thanks again for all the hard work on this! Giving so much of your personal time to bring new features to Julia is appreciated, so thanks.

julia> chain1
Chains MCMC chain (1000×21×1 Array{Float64, 3}):

Iterations        = 501:1:1500
Number of chains  = 1
Samples per chain = 1000
Wall duration     = 2.01 seconds
Compute duration  = 2.01 seconds
parameters        = sigma[1], sigma[2], sigma[3], F.L[1,1], F.L[2,1], F.L[3,1], F.L[2,2], F.L[3,2], F.L[3,3]
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters      mean       std      mcse    ess_bulk   ess_tail      rhat   ess_per_sec 
      Symbol   Float64   Float64   Float64     Float64    Float64   Float64       Float64 

    sigma[1]    1.0154    0.0742    0.0026    840.1914   756.3558    0.9993      417.3827
    sigma[2]    2.2811    0.1633    0.0058    787.8339   809.2130    1.0005      391.3730
    sigma[3]    3.0124    0.2217    0.0070   1031.3962   633.4660    1.0016      512.3677
    F.L[1,1]    1.0000    0.0000       NaN         NaN        NaN       NaN           NaN
    F.L[2,1]    0.5011    0.0781    0.0029    704.6972   605.2379    0.9994      350.0731
    F.L[3,1]    0.2197    0.0884    0.0030    879.5019   843.2654    0.9996      436.9110
    F.L[2,2]    0.8607    0.0454    0.0017    704.6972   605.2379    0.9993      350.0731
    F.L[3,2]   -0.0145    0.0961    0.0027   1211.5411   721.9466    0.9996      601.8585
    F.L[3,3]    0.9664    0.0221    0.0008    852.8859   809.2761    0.9993      423.6890

joshualeond avatar Sep 05 '23 13:09 joshualeond

Glad it's working!

I'm attempting to get that Omega matrix reconstructed with the posterior to compare it to the original Stan post I was attempting to replicate but I just need to get more familiar with Turing.

For this you can use generated_quantities(correlation_chol(J, N, y), chain1) :)

torfjelde avatar Sep 05 '23 13:09 torfjelde

For this you can use generated_quantities(correlation_chol(J, N, y), chain1) :)

@torfjelde will this work as expected, given that we renamed variables from F to F.L for clarity in the returned chains? I saw a lot of warnings while running this example.

yebai avatar Sep 05 '23 14:09 yebai

Ah true, no you're right, this won't work as intended. Hmm, will have to think a bit about that.

torfjelde avatar Sep 05 '23 14:09 torfjelde

Ah true, no you're right, this won't work as intended. Hmm, will have to think a bit about that.

After https://github.com/TuringLang/Turing.jl/pull/2078, generated_quantities(correlation_chol(J, N, y), chain1) is now working properly on the master branch.

yebai avatar Sep 12 '23 20:09 yebai
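As a sketch of how this can be used (reusing the model and chain names from the example earlier in this thread; the posterior-mean summary is just one way to reconstruct Omega):

```julia
using Statistics

# Assumes `correlation_chol`, `J`, `N`, `y`, and `chain1` are defined as in
# the example above.
gq = generated_quantities(correlation_chol(J, N, y), chain1)

# `gq` holds one NamedTuple per posterior sample; average the reconstructed
# correlation matrices to get a posterior-mean estimate of Omega.
Omega_mean = mean(nt.Omega for nt in gq)
```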

Nice! Just checked it out and looks good on my end as well. Thanks again for the huge effort!

joshualeond avatar Sep 12 '23 20:09 joshualeond

@joshualeond could you say what version of Turing you are using? When I try your example I get an error:

ERROR: MethodError: no method matching vectorize(::LKJCholesky{Float64}, ::Cholesky{Float64, Matrix{Float64}})

Ah, sorry!! I see now, I just needed to update to 0.29; please ignore me.

YSanchezAraujo avatar Sep 14 '23 14:09 YSanchezAraujo

Nice! Just checked it out and looks good on my end as well. Thanks again for the huge effort!

Hi, I am on Turing.jl 0.29.1 and the above code does not run for me. It throws the following error: ERROR: PosDefException: matrix is not positive definite; Cholesky factorization failed. Any help would be appreciated.

datadreamscorp avatar Sep 19 '23 22:09 datadreamscorp

I also get the above ^ but it seems to happen at random.

YSanchezAraujo avatar Sep 20 '23 00:09 YSanchezAraujo

ERROR: PosDefException: matrix is not positive definite; Cholesky factorization failed.

This is a different error than what @YSanchezAraujo first mentioned though; this might just be a numerical issue :confused:

Need to look more into this a bit later

torfjelde avatar Sep 20 '23 13:09 torfjelde

Sadly I have not been able to replicate the above example after several tries :( not even at random. The one time that I did not get the PosDefException, the model sampled and then proceeded to throw the following error: ERROR: MethodError: no method matching length(::Cholesky{Float64, Matrix{Float64}})

Really appreciate the work that is being put into this. It's a much-needed implementation. Hope it can be fixed soon.

datadreamscorp avatar Sep 20 '23 23:09 datadreamscorp

This seems relevant: https://github.com/TuringLang/Turing.jl/issues/2081

In particular, this comment: https://github.com/TuringLang/Turing.jl/issues/2081#issuecomment-1733989162

torfjelde avatar Sep 25 '23 16:09 torfjelde

It seems like the basic advice is to add some small value along the diagonal of Sigma, but how large that value should be will differ from case to case. It makes me wonder whether it can be shown empirically that the worse the model assumptions are for the data, the more likely you are to run into this problem. Basically, if you need to add too large a value along the diagonal, just drop the attempt to model that correlation matrix altogether.

YSanchezAraujo avatar Sep 27 '23 17:09 YSanchezAraujo
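A minimal sketch of that jitter workaround, mirroring the `eps() * I` trick from the model example earlier in the thread; the helper name `scaled_cov` and the default jitter of `1e-6` are illustrative choices here, not part of any Turing API:

```julia
using LinearAlgebra, PDMats

# Hypothetical helper: build a positive-definite covariance from per-dimension
# standard deviations `sigma` and a correlation Cholesky factorization `F`
# (as drawn from LKJCholesky). Adding jitter to the factor's diagonal guards
# against numerical loss of positive definiteness during sampling.
function scaled_cov(sigma, F; jitter = 1e-6)
    Sigma_L = LowerTriangular(Diagonal(sigma) * F.L)
    return PDMat(Cholesky(Sigma_L + jitter * I))
end
```

As noted above, how large the jitter needs to be is problem-dependent; if it has to be large to avoid the PosDefException, that may itself be a sign of model misspecification.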