Don't do inverse metric decomposition every draw
Fix for Issue #2881 .
This involved switching to setter/getters for interfacing with dense_e_point so I made the change for diag_e_point as well.
I also changed all the set_metric verbage to set_inv_metric.
I didn't know what tests needed added for this. Lemme know.
Submission Checklist
- [x] Run unit tests:
./runTests.py src/test/unit - [x] Run cpplint:
make cpplint - [x] Declare copyright holder and open-source license: see below
Copyright and Licensing
Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company): Columbia University
By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:
- Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
- Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
The current design is intentional.
The Cholesky is needed only once per transition which is a relatively small cost compared to the many gradient evaluations needed within each transition. Saving the Cholesky decomposition introduces an additional $\mathcal{O}(N^{2})$ memory burden which is much more than the $mathcal{O}(N)$ burden everywhere else, and becomes problematic for sufficiently large models. Without any explicit profiling demonstrating that the Cholesky is a substantial const for typical models I don't see any strong motivation for the change. I will add this point to the original issue.
At the same time the members of ps_point and its derivations are intentionally left public to avoid any code overhead for accessors and mutators. Because these classes are not exposed through the algorithm API and everything is public we decided that there isn't any need for encapsulation. If encapsulation were added then the accessors and mutators would need to be unit tested as is done elsewhere in the library.
| Name | Old Result | New Result | Ratio | Performance change( 1 - new / old ) |
|---|---|---|---|---|
| gp_pois_regr/gp_pois_regr.stan | 3.19 | 3.21 | 0.99 | -0.58% slower |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 1.0 | -0.23% slower |
| eight_schools/eight_schools.stan | 0.12 | 0.09 | 1.27 | 21.05% faster |
| gp_regr/gp_regr.stan | 0.17 | 0.18 | 0.99 | -1.31% slower |
| irt_2pl/irt_2pl.stan | 5.69 | 5.8 | 0.98 | -1.89% slower |
| performance.compilation | 91.35 | 88.51 | 1.03 | 3.11% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.46 | 9.02 | 0.94 | -6.69% slower |
| pkpd/one_comp_mm_elim_abs.stan | 29.44 | 29.83 | 0.99 | -1.33% slower |
| sir/sir.stan | 133.71 | 134.61 | 0.99 | -0.67% slower |
| gp_regr/gen_gp_data.stan | 0.04 | 0.05 | 0.98 | -2.21% slower |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.96 | 3.08 | 0.96 | -3.97% slower |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.4 | 0.41 | 0.97 | -3.18% slower |
| arK/arK.stan | 1.81 | 2.67 | 0.68 | -47.53% slower |
| arma/arma.stan | 0.63 | 0.86 | 0.74 | -36.05% slower |
| garch/garch.stan | 0.7 | 0.65 | 1.07 | 6.43% faster |
| Mean result: 0.971182373984 |
Jenkins Console Log Blue Ocean Commit hash: a67e9e5f9100278c3f78787403be4b980ec403be
Machine information
ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010CPU: Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz
G++: Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1 Apple LLVM version 7.0.2 (clang-700.1.81) Target: x86_64-apple-darwin15.6.0 Thread model: posix
Clang: Apple LLVM version 7.0.2 (clang-700.1.81) Target: x86_64-apple-darwin15.6.0 Thread model: posix
| Name | Old Result | New Result | Ratio | Performance change( 1 - new / old ) |
|---|---|---|---|---|
| gp_pois_regr/gp_pois_regr.stan | 3.12 | 3.17 | 0.98 | -1.65% slower |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 1.0 | 0.31% faster |
| eight_schools/eight_schools.stan | 0.12 | 0.12 | 0.99 | -1.13% slower |
| gp_regr/gp_regr.stan | 0.17 | 0.17 | 1.01 | 0.54% faster |
| irt_2pl/irt_2pl.stan | 5.7 | 5.7 | 1.0 | -0.0% slower |
| performance.compilation | 90.54 | 88.68 | 1.02 | 2.06% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.45 | 8.42 | 1.0 | 0.27% faster |
| pkpd/one_comp_mm_elim_abs.stan | 30.54 | 28.12 | 1.09 | 7.91% faster |
| sir/sir.stan | 128.64 | 131.17 | 0.98 | -1.97% slower |
| gp_regr/gen_gp_data.stan | 0.05 | 0.04 | 1.07 | 6.27% faster |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.96 | 2.95 | 1.0 | 0.43% faster |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.4 | 0.37 | 1.08 | 7.34% faster |
| arK/arK.stan | 1.8 | 1.78 | 1.01 | 1.07% faster |
| arma/arma.stan | 0.61 | 0.74 | 0.83 | -20.48% slower |
| garch/garch.stan | 0.67 | 0.55 | 1.22 | 17.77% faster |
| Mean result: 1.01859530072 |
Jenkins Console Log Blue Ocean Commit hash: a8e7500d483c98ee0c4da8a9b68a76287f0d93ae
Machine information
ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010CPU: Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz
G++: Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1 Apple LLVM version 7.0.2 (clang-700.1.81) Target: x86_64-apple-darwin15.6.0 Thread model: posix
Clang: Apple LLVM version 7.0.2 (clang-700.1.81) Target: x86_64-apple-darwin15.6.0 Thread model: posix
| Name | Old Result | New Result | Ratio | Performance change( 1 - new / old ) |
|---|---|---|---|---|
| gp_pois_regr/gp_pois_regr.stan | 3.54 | 3.62 | 0.98 | -2.16% slower |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 1.05 | 4.57% faster |
| eight_schools/eight_schools.stan | 0.12 | 0.12 | 1.0 | 0.18% faster |
| gp_regr/gp_regr.stan | 0.17 | 0.17 | 0.98 | -1.9% slower |
| irt_2pl/irt_2pl.stan | 5.8 | 5.74 | 1.01 | 0.99% faster |
| performance.compilation | 86.89 | 86.1 | 1.01 | 0.91% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.55 | 8.44 | 1.01 | 1.28% faster |
| pkpd/one_comp_mm_elim_abs.stan | 30.82 | 29.5 | 1.04 | 4.27% faster |
| sir/sir.stan | 129.2 | 135.34 | 0.95 | -4.75% slower |
| gp_regr/gen_gp_data.stan | 0.05 | 0.04 | 1.03 | 3.27% faster |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 3.0 | 2.93 | 1.02 | 2.41% faster |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.37 | 0.4 | 0.91 | -9.98% slower |
| arK/arK.stan | 2.47 | 1.8 | 1.37 | 27.04% faster |
| arma/arma.stan | 0.6 | 0.6 | 1.0 | -0.41% slower |
| garch/garch.stan | 0.76 | 0.66 | 1.16 | 13.63% faster |
| Mean result: 1.03555431436 |
Jenkins Console Log Blue Ocean Commit hash: c4f30f6cfb49c1929ecc2a0c9264b36e8755cec8
Machine information
ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010CPU: Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz
G++: Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1 Apple LLVM version 7.0.2 (clang-700.1.81) Target: x86_64-apple-darwin15.6.0 Thread model: posix
Clang: Apple LLVM version 7.0.2 (clang-700.1.81) Target: x86_64-apple-darwin15.6.0 Thread model: posix
| Name | Old Result | New Result | Ratio | Performance change( 1 - new / old ) |
|---|---|---|---|---|
| gp_pois_regr/gp_pois_regr.stan | 3.46 | 3.41 | 1.01 | 1.43% faster |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 0.99 | -1.26% slower |
| eight_schools/eight_schools.stan | 0.11 | 0.11 | 1.0 | -0.5% slower |
| gp_regr/gp_regr.stan | 0.15 | 0.15 | 1.01 | 1.42% faster |
| irt_2pl/irt_2pl.stan | 5.24 | 5.29 | 0.99 | -1.09% slower |
| performance.compilation | 89.84 | 89.14 | 1.01 | 0.79% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 7.81 | 7.78 | 1.0 | 0.36% faster |
| pkpd/one_comp_mm_elim_abs.stan | 29.7 | 28.98 | 1.02 | 2.43% faster |
| sir/sir.stan | 129.8 | 131.11 | 0.99 | -1.01% slower |
| gp_regr/gen_gp_data.stan | 0.04 | 0.04 | 0.98 | -2.28% slower |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 3.01 | 3.0 | 1.0 | 0.27% faster |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.39 | 0.39 | 1.0 | 0.15% faster |
| arK/arK.stan | 2.48 | 1.8 | 1.38 | 27.42% faster |
| arma/arma.stan | 0.72 | 0.61 | 1.17 | 14.77% faster |
| garch/garch.stan | 0.65 | 0.59 | 1.09 | 8.49% faster |
| Mean result: 1.04354280135 |
Jenkins Console Log Blue Ocean Commit hash: c4f30f6cfb49c1929ecc2a0c9264b36e8755cec8
Machine information
ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010CPU: Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz
G++: Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1 Apple LLVM version 7.0.2 (clang-700.1.81) Target: x86_64-apple-darwin15.6.0 Thread model: posix
Clang: Apple LLVM version 7.0.2 (clang-700.1.81) Target: x86_64-apple-darwin15.6.0 Thread model: posix
The Cholesky is needed only once per transition which is a relatively small cost compared to the many gradient evaluations needed within each transition. Saving the Cholesky decomposition introduces an additional $\mathcal{O}(N^{2})$ memory burden which is much more than the $mathcal{O}(N)$ burden everywhere else, and becomes problematic for sufficiently large models. Without any explicit profiling demonstrating that the Cholesky is a substantial const for typical models I don't see any strong motivation for the change. I will add this point to the original issue.
Thinking about this again, don't you pay the O(N^2) memory burden still computing the cholesky but it's just temporary data? If a program failed because of N is too large it would fail either way wouldn't it?
don't you pay the O(N^2) memory burden still
Oh yeah, the temporary would still be O(N^2) since you compute the Cholesky at one point or another. Good point. I'll add the forwarding real quick
| Name | Old Result | New Result | Ratio | Performance change( 1 - new / old ) |
|---|---|---|---|---|
| gp_pois_regr/gp_pois_regr.stan | 3.44 | 3.44 | 1.0 | -0.11% slower |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 0.97 | -2.88% slower |
| eight_schools/eight_schools.stan | 0.11 | 0.11 | 1.01 | 0.6% faster |
| gp_regr/gp_regr.stan | 0.16 | 0.16 | 0.98 | -2.41% slower |
| irt_2pl/irt_2pl.stan | 5.23 | 5.34 | 0.98 | -2.11% slower |
| performance.compilation | 90.25 | 88.92 | 1.01 | 1.47% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.92 | 8.86 | 1.01 | 0.64% faster |
| pkpd/one_comp_mm_elim_abs.stan | 29.33 | 30.56 | 0.96 | -4.18% slower |
| sir/sir.stan | 130.71 | 130.01 | 1.01 | 0.54% faster |
| gp_regr/gen_gp_data.stan | 0.03 | 0.04 | 0.91 | -10.4% slower |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 3.12 | 3.12 | 1.0 | -0.03% slower |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.37 | 0.39 | 0.96 | -4.28% slower |
| arK/arK.stan | 1.9 | 1.88 | 1.01 | 0.88% faster |
| arma/arma.stan | 0.63 | 0.95 | 0.67 | -49.87% slower |
| garch/garch.stan | 0.53 | 0.64 | 0.83 | -21.14% slower |
| Mean result: 0.952369574099 |
Jenkins Console Log Blue Ocean Commit hash: 7072224fa8b0459793d6e2d985ee4c99690c94dd
Machine information
ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010CPU: Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz
G++: Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1 Apple LLVM version 7.0.2 (clang-700.1.81) Target: x86_64-apple-darwin15.6.0 Thread model: posix
Clang: Apple LLVM version 7.0.2 (clang-700.1.81) Target: x86_64-apple-darwin15.6.0 Thread model: posix
| Name | Old Result | New Result | Ratio | Performance change( 1 - new / old ) |
|---|---|---|---|---|
| gp_pois_regr/gp_pois_regr.stan | 3.36 | 3.38 | 1.0 | -0.49% slower |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 0.99 | -1.25% slower |
| eight_schools/eight_schools.stan | 0.12 | 0.12 | 1.01 | 1.32% faster |
| gp_regr/gp_regr.stan | 0.16 | 0.16 | 1.01 | 0.54% faster |
| irt_2pl/irt_2pl.stan | 6.03 | 5.95 | 1.01 | 1.22% faster |
| performance.compilation | 91.54 | 89.45 | 1.02 | 2.28% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 9.53 | 8.6 | 1.11 | 9.67% faster |
| pkpd/one_comp_mm_elim_abs.stan | 28.75 | 30.51 | 0.94 | -6.13% slower |
| sir/sir.stan | 122.72 | 125.39 | 0.98 | -2.18% slower |
| gp_regr/gen_gp_data.stan | 0.03 | 0.03 | 1.0 | -0.44% slower |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 3.03 | 3.05 | 1.0 | -0.48% slower |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.4 | 0.38 | 1.03 | 2.62% faster |
| arK/arK.stan | 1.87 | 2.52 | 0.74 | -34.96% slower |
| arma/arma.stan | 0.87 | 0.78 | 1.11 | 9.59% faster |
| garch/garch.stan | 0.62 | 0.73 | 0.85 | -17.14% slower |
| Mean result: 0.985575872844 |
Jenkins Console Log Blue Ocean Commit hash: 9093b0f557ddce1f1bedc8b52ebacb723fd7998c
Machine information
ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010CPU: Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz
G++: Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1 Apple LLVM version 7.0.2 (clang-700.1.81) Target: x86_64-apple-darwin15.6.0 Thread model: posix
Clang: Apple LLVM version 7.0.2 (clang-700.1.81) Target: x86_64-apple-darwin15.6.0 Thread model: posix