Multi normal derivatives
Summary
This shows that multi_normal should just take the Cholesky decomposition of the covariance matrix and call multi_normal_cholesky.
For now I just copied all of the multi_normal_cholesky_lpdf code over; I wasn't sure how to call it directly, so that should be updated. The unit tests that call multi_normal_log fail.
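Roughly, the intended delegation looks like the sketch below. This is an assumed shape rather than the PR's exact code; the template parameters and `propto` handling just follow the usual stan-math lpdf pattern.

```cpp
#include <stan/math.hpp>

// Hypothetical sketch of multi_normal_lpdf delegating to
// multi_normal_cholesky_lpdf: factor Sigma once, then forward.
template <bool propto, typename T_y, typename T_loc, typename T_covar>
auto multi_normal_lpdf_sketch(const T_y& y, const T_loc& mu,
                              const T_covar& Sigma) {
  auto L = stan::math::cholesky_decompose(Sigma);
  return stan::math::multi_normal_cholesky_lpdf<propto>(y, mu, L);
}
```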
Question: would it be better to form the Cholesky factor using an LDLT decomposition instead of cholesky_decompose()?
Tests
No new tests.
Here's a benchmark vs the current multi_normal_lpdf

Side Effects
None
Release notes
multi_normal_lpdf now takes the Cholesky decomposition of the covariance matrix and calls multi_normal_cholesky_lpdf under the hood
Checklist
- [ ] Math issue #2544
- [x] Copyright holder: Sean Pinkney
  The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:
  - Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
  - Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
- [x] the basic tests are passing
  - unit tests pass (to run, use: ./runTests.py test/unit)
  - header checks pass (make test-headers)
  - dependency checks pass (make test-math-dependencies)
  - docs build (make doxygen)
  - code passes the built-in C++ standards checks (make cpplint)
- [x] the code is written in idiomatic C++ and changes are documented in the doxygen
- [x] the new changes are tested
@SteveBronder or @bob-carpenter, everything is working well except that the expression tests fail for multi_normal_log.
I assume that if the expression tests got to multi_normal_lpdf they would fail also. However, if I replace those with the tests from multi_normal_cholesky_lpdf (but calling `multi_normal_lpdf` at the end of the test) then the tests pass.
I think the issue is that there are some legacy signatures in multi_normal_log which just call multi_normal_lpdf. So should we add those to `multi_normal_cholesky`? Or can we remove them?
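For context, a hedged paraphrase of the legacy alias pattern I mean (not the exact file contents): `multi_normal_log` is just a thin forward to `multi_normal_lpdf`, so any extra signatures it accepts end up hitting the new code path.

```cpp
#include <stan/math.hpp>

// Deprecated-style alias sketch: multi_normal_log forwards straight to
// multi_normal_lpdf, keeping the old *_log naming around.
template <bool propto, typename T_y, typename T_loc, typename T_covar>
stan::return_type_t<T_y, T_loc, T_covar> multi_normal_log_sketch(
    const T_y& y, const T_loc& mu, const T_covar& Sigma) {
  return stan::math::multi_normal_lpdf<propto>(y, mu, Sigma);
}
```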
The one remaining issue is the failing expression test. multi_normal tests the signature `vector, array[] vector, matrix`, whereas the Cholesky version tests `array[] vector, array[] vector, matrix` (both call patterns are sketched below). So now that multi_normal calls the Cholesky version, the test fails.
@rok-cesnovar is this a change that has to happen in stanc?
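For reference, here's a hedged sketch of the two call patterns at the C++ level. The concrete Eigen/`std::vector` types stand in for the Stan signatures above; this is not the actual expression-test harness.

```cpp
#include <stan/math.hpp>

#include <iostream>
#include <vector>

int main() {
  Eigen::VectorXd y(2);
  y << 1.0, 2.0;
  Eigen::VectorXd mu(2);
  mu << 0.0, 0.0;
  Eigen::MatrixXd Sigma(2, 2);
  Sigma << 2.0, 0.5, 0.5, 1.0;
  std::vector<Eigen::VectorXd> y_arr{y, y};
  std::vector<Eigen::VectorXd> mu_arr{mu, mu};

  // multi_normal expression test: vector, array[] vector, matrix
  double lp1 = stan::math::multi_normal_lpdf(y, mu_arr, Sigma);

  // multi_normal_cholesky expression test: array[] vector, array[] vector, matrix
  Eigen::MatrixXd L = stan::math::cholesky_decompose(Sigma);
  double lp2 = stan::math::multi_normal_cholesky_lpdf(y_arr, mu_arr, L);

  std::cout << lp1 << " " << lp2 << std::endl;
  return 0;
}
```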
The expressions are now handled properly. @bob-carpenter, would you have time to review this? You definitely know more about these multi normal lpdfs than I do.
| Name | Old Result | New Result | Ratio | Performance change (1 - new/old) |
|---|---|---|---|---|
| gp_pois_regr/gp_pois_regr.stan | 3.47 | 3.39 | 1.02 | 2.23% faster |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 1.0 | 0.25% faster |
| eight_schools/eight_schools.stan | 0.09 | 0.09 | 0.99 | -1.39% slower |
| gp_regr/gp_regr.stan | 0.14 | 0.14 | 0.99 | -0.98% slower |
| irt_2pl/irt_2pl.stan | 5.11 | 5.13 | 1.0 | -0.4% slower |
| performance.compilation | 91.01 | 89.05 | 1.02 | 2.16% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.11 | 8.16 | 0.99 | -0.52% slower |
| pkpd/one_comp_mm_elim_abs.stan | 31.92 | 29.9 | 1.07 | 6.34% faster |
| sir/sir.stan | 118.97 | 120.89 | 0.98 | -1.61% slower |
| gp_regr/gen_gp_data.stan | 0.03 | 0.03 | 0.99 | -1.34% slower |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 3.21 | 3.01 | 1.07 | 6.19% faster |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.37 | 0.39 | 0.95 | -5.15% slower |
| arK/arK.stan | 2.03 | 2.03 | 1.0 | -0.14% slower |
| arma/arma.stan | 0.25 | 0.25 | 1.0 | -0.1% slower |
| garch/garch.stan | 0.61 | 0.6 | 1.02 | 1.75% faster |
| Mean result: 1.00571737881 |
Jenkins Console Log | Blue Ocean | Commit hash: 18f82c41c53586e354649e8b21e9324e13c3cbed
Machine information
ProductName: Mac OS X
ProductVersion: 10.11.6
BuildVersion: 15G22010
CPU: Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz
G++: Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81), Target: x86_64-apple-darwin15.6.0, Thread model: posix
Clang: Apple LLVM version 7.0.2 (clang-700.1.81), Target: x86_64-apple-darwin15.6.0, Thread model: posix
This is ready. @bob-carpenter, will you have time to look prior to the feature freeze?
I'm heading out of town on vacation for a few days, so I won't be able to review until next Tuesday at the earliest.
No worries, enjoy your vacation!
@spinkney mind merging in develop here? We should get this in for the next release.
@rok-cesnovar can you restart the CI/Jenkins process? Once that finishes I can merge this in.
@rok-cesnovar (or @bob-carpenter or @bgoodri) I updated this to use the LDLT decomposition rather than the Cholesky, as it's more numerically stable. I attempted to keep the LDLT factor type and use it in the mdivide_*_ldlt functions for the derivatives, but I was getting a bunch of type errors. I resorted to creating the Cholesky factor from the L * sqrt(D) multiplication.
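For anyone following along, here's a minimal Eigen-only sketch of the L * sqrt(D) construction I mean. It's an illustration rather than the PR's code; note that Eigen's LDLT pivots, so the permutation has to be undone before the product reproduces Sigma.

```cpp
#include <Eigen/Dense>

#include <iostream>

int main() {
  Eigen::MatrixXd Sigma(2, 2);
  Sigma << 4.0, 1.0, 1.0, 3.0;

  // Eigen's robust Cholesky: P * Sigma * P^T = L * D * L^T.
  Eigen::LDLT<Eigen::MatrixXd> ldlt(Sigma);
  Eigen::MatrixXd L = ldlt.matrixL();  // unit lower-triangular factor
  Eigen::VectorXd D = ldlt.vectorD();  // diagonal of D

  // Cholesky-like factor: Sigma = (P^T L sqrt(D)) (P^T L sqrt(D))^T.
  // Taking sqrt(D) here is where the extra precision of LDLT is lost.
  Eigen::MatrixXd L_sqrtD = L * D.cwiseSqrt().asDiagonal();
  Eigen::MatrixXd L_chol = ldlt.transpositionsP().transpose() * L_sqrtD;

  std::cout << "reconstruction error: "
            << (L_chol * L_chol.transpose() - Sigma).norm() << std::endl;
  return 0;
}
```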
Taking the square root is where most of the precision gained from using LDLT gets lost. There should be a way to differentiate with respect to the diagonal elements of D directly.
> Taking the square root is where most of the precision gained from using LDLT gets lost. There should be a way to differentiate with respect to the diagonal elements of D directly.
Ok, then I may as well have kept it simple by taking the Cholesky decomposition and passing that to multi_normal_cholesky_lpdf. The issue with that is that the Cholesky decomposition will sometimes fail for numerical-precision reasons, and users will have to add a jitter to the diagonal.
The way I think this should be done is to create a multi_normal_ldlt_lpdf() that doesn't get exposed outside of Stan math. Autodiff through the LDLT, pass that factor into the multi_normal_ldlt function, and use the LDLT functions to get the derivatives. My one worry is that we need the inverse of the Cholesky factor L. Can we just use mdivide_left_ldlt(Sigma_ldlt) for this, or will this introduce the sqrt problem all over again?
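To make the idea concrete, here's a hedged, Eigen-only, double-only sketch of what a multi_normal_ldlt-style evaluation could look like: the log determinant comes from sum(log(D_ii)) and the quadratic form from a single solve, so no square roots and no explicit inverse of L are needed. The function name and signature are made up for illustration, and this says nothing about how the reverse-mode derivatives would be wired up.

```cpp
#include <Eigen/Dense>

#include <cmath>
#include <iostream>

// Hypothetical sketch: evaluate the multivariate normal log density from an
// LDLT factorization of Sigma, avoiding sqrt entirely.
double multi_normal_ldlt_lpdf_sketch(const Eigen::VectorXd& y,
                                     const Eigen::VectorXd& mu,
                                     const Eigen::LDLT<Eigen::MatrixXd>& ldlt) {
  const double log_two_pi = std::log(2.0 * std::acos(-1.0));
  Eigen::VectorXd diff = y - mu;
  // log|Sigma| = sum_i log(D_ii); the permutation doesn't change it.
  Eigen::VectorXd D = ldlt.vectorD();
  double log_det = D.array().log().sum();
  // diff' * Sigma^{-1} * diff via one LDLT solve.
  Eigen::VectorXd alpha = ldlt.solve(diff);
  double quad = diff.dot(alpha);
  return -0.5 * (y.size() * log_two_pi + log_det + quad);
}

int main() {
  Eigen::VectorXd y(2), mu(2);
  y << 1.0, 2.0;
  mu << 0.0, 0.0;
  Eigen::MatrixXd Sigma(2, 2);
  Sigma << 2.0, 0.5, 0.5, 1.0;
  std::cout << multi_normal_ldlt_lpdf_sketch(y, mu, Sigma.ldlt()) << std::endl;
  return 0;
}
```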
@spinkney, I didn't realize this was a draft. Let me know what you think.
The issue is that the Cholesky decomposition is less stable than the LDLT, so I'm waiting until tuples get into the language or until someone implements the LDLT with derivatives.