math icon indicating copy to clipboard operation
math copied to clipboard

Multi normal derivatives

Open spinkney opened this issue 4 years ago • 14 comments

Summary

This shows that multi_normal should just take the cholesky decomposition and call multi_normal_cholesky.

I just copied all the multi_normal_cholesky_lpdf code over. I wasn't sure how to just call it. That should be updated. The unit tests that call multi_normal_log fail.

Question: Would it be better to form the cholesky decomposition using ldlt instead of cholesky_decompose()?

Tests

No new tests.

Here's a benchmark vs the current multi_normal_lpdf

image

Side Effects

None

Release notes

multi_normal_lpdf now takes the Cholesky decomposition of the covariance matrix and calls multi_normal_cholesky_lpdf under the hood

Checklist

  • [ ] Math issue #2544

  • [x] Copyright holder: Sean Pinkney

    The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to the license the submitted work under the following licenses: - Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause) - Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

  • [x] the basic tests are passing

    • unit tests pass (to run, use: ./runTests.py test/unit)
    • header checks pass, (make test-headers)
    • dependencies checks pass, (make test-math-dependencies)
    • docs build, (make doxygen)
    • code passes the built in C++ standards checks (make cpplint)
  • [x] the code is written in idiomatic C++ and changes are documented in the doxygen

  • [x] the new changes are tested

spinkney avatar Aug 29 '21 14:08 spinkney

@SteveBronder or @bob-carpenter , everything is working well except that the expression_tests fail for multi_normal_log.

I assume that if the expression_tests got to multi_normal_lpdf it would fail also. However, if I replace those with the test in multi_normal_cholesky_lpdf (but calling `multi_normal_lpdf' at the end of the test) then the test pass.

I think the issue is that there are some legacy signatures in multi_normal_log which just calls multi_normal_lpdf. So should we add those to `multi_normal_cholesky'? Or can we remove them?

spinkney avatar Sep 05 '21 13:09 spinkney

The one issue is the failing expression test. The issue is that the multi normal is testing the expression vector, array[] vector, matrix whereas the cholesky version is testing array[] vector, array[] vector, matrix. So now that the multi_normal calls the cholesky version, it is failing.

@rok-cesnovar is this a stanc required change?

spinkney avatar Sep 09 '21 13:09 spinkney

The expressions are now handled properly. @bob-carpenter would you have time to review this? You definitely know more about these multi normal lpdf than me.

rok-cesnovar avatar Sep 10 '21 06:09 rok-cesnovar


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 3.47 3.39 1.02 2.23% faster
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 1.0 0.25% faster
eight_schools/eight_schools.stan 0.09 0.09 0.99 -1.39% slower
gp_regr/gp_regr.stan 0.14 0.14 0.99 -0.98% slower
irt_2pl/irt_2pl.stan 5.11 5.13 1.0 -0.4% slower
performance.compilation 91.01 89.05 1.02 2.16% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 8.11 8.16 0.99 -0.52% slower
pkpd/one_comp_mm_elim_abs.stan 31.92 29.9 1.07 6.34% faster
sir/sir.stan 118.97 120.89 0.98 -1.61% slower
gp_regr/gen_gp_data.stan 0.03 0.03 0.99 -1.34% slower
low_dim_gauss_mix/low_dim_gauss_mix.stan 3.21 3.01 1.07 6.19% faster
pkpd/sim_one_comp_mm_elim_abs.stan 0.37 0.39 0.95 -5.15% slower
arK/arK.stan 2.03 2.03 1.0 -0.14% slower
arma/arma.stan 0.25 0.25 1.0 -0.1% slower
garch/garch.stan 0.61 0.6 1.02 1.75% faster
Mean result: 1.00571737881

Jenkins Console Log Blue Ocean Commit hash: 18f82c41c53586e354649e8b21e9324e13c3cbed


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU: Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++: Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1 Apple LLVM version 7.0.2 (clang-700.1.81) Target: x86_64-apple-darwin15.6.0 Thread model: posix

Clang: Apple LLVM version 7.0.2 (clang-700.1.81) Target: x86_64-apple-darwin15.6.0 Thread model: posix

stan-buildbot avatar Sep 10 '21 07:09 stan-buildbot

This is ready, @bob-carpenter will you have time to look prior to the feature freeze?

spinkney avatar Sep 16 '21 12:09 spinkney

I'm heading out of town on vacation for a few days, so I won't be able to review until next Tuesday at the earliest.

bob-carpenter avatar Sep 16 '21 12:09 bob-carpenter

No worries, enjoy your vacation!

spinkney avatar Sep 16 '21 12:09 spinkney

@spinkney mind merging in develop here? We should get this in for the next release.

rok-cesnovar avatar Dec 20 '21 07:12 rok-cesnovar

@rok-cesnovar can you restart the CI/jenkins process? Once that finishes I can merge in

spinkney avatar Dec 20 '21 11:12 spinkney

@rok-cesnovar (or @bob-carpenter or @bgoodri) I updated this to use the ldlt decom rather than the cholesky as it's more numerically stable. I attempted to keep the ldlt factor type and use it in the mdivide_*_ldlt for the derivatives but I was getting a bunch of type errors. I resorted to creating the cholesky factor from the L and sqrt(D) multiplication.

spinkney avatar Apr 19 '22 19:04 spinkney

Taking the square root is where most of the precision gained from using LDLT gets lost. There should be a way to differentiate with respect to the diagonal elements of D directly.

bgoodri avatar Apr 19 '22 20:04 bgoodri

Taking the square root is where most of the precision gained from using LDLT gets lost. There should be a way to differentiate with respect to the diagonal elements of D directly.

Ok, then I may as well have kept it simple by taking the cholesky decomposition and passing that to multi_normal_cholesky_lpdf. The issue with that is the numerical precision of the cholesky decomp is sometimes going to fail and users will have to add a jitter to the diagonal.

The way I think this should be done is to create a multi_normal_ldlt_lpdf() that doesn't get exposed outside of stan-math. Autodiff through the ldlt then pass that into the multi_normal_ldlt function and utilize the LDLT functions to get the derivatives. My one worry is that we need the inverse of the cholesky factor L. Can we just use mdivide_left_ldlt(Sigma_ldlt) for this? Or will this introduce the sqrt problem all over again?

spinkney avatar Apr 20 '22 15:04 spinkney

@spinkney, I didn't realize this was a draft. Let me know what you think.

syclik avatar Aug 18 '22 14:08 syclik

The issue is that the cholesky decomposition is less stable than the ldlt so I'm waiting until tuples get in the language or once someone implements ldlt with derivatives.

spinkney avatar Aug 18 '22 14:08 spinkney