stan
Feature/issue 2814 warmup auto
Submission Checklist
- [ ] Run unit tests: `./runTests.py src/test/unit`
- [ ] Run cpplint: `make cpplint`
- [x] Declare copyright holder and open-source license: see below
Summary
The goals/reasons are outlined here: https://github.com/stan-dev/stan/issues/2814
There'll be a CmdStan pull to go with this (and this probably shouldn't go in until that pull is good too).
This adds another metric for the samplers to use in their Hamiltonian and gradient computations: src/stan/mcmc/hmc/hamiltonians/auto_e_metric.hpp and src/stan/mcmc/hmc/hamiltonians/auto_e_point.hpp
And then an adaptation routine to actually compute that metric: src/stan/mcmc/auto_adaptation.hpp
Edit: To review this pull request you'll want to pull this version of cmdstan and at least try out the adaptation on a couple models: https://github.com/stan-dev/cmdstan/pull/729
Intended Effect
The adaptation routine updates the metric and tells it whether to act like a dense or diagonal metric at the end of each warmup stage where the metric is recomputed.
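To make the dense-versus-diagonal decision concrete, here is a minimal sketch. It is not the code in this pull request: the real criterion lives in src/stan/mcmc/auto_adaptation.hpp and is not reproduced here. The sketch assumes a simple correlation-threshold rule, and the names `MetricKind`, `sample_covariance`, `pick_metric`, and `threshold` are all hypothetical. It only illustrates the shape of the step: estimate a covariance from one warmup window's draws, then decide which kind of metric to use.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical illustration only: none of these names exist in Stan.
enum class MetricKind { Diagonal, Dense };

using Matrix = std::vector<std::vector<double>>;

// Sample covariance of the draws collected during one warmup window.
// draws[i] is the i-th draw; all draws must have the same dimension.
Matrix sample_covariance(const Matrix& draws) {
  assert(draws.size() >= 2);
  const double n = static_cast<double>(draws.size());
  const std::size_t d = draws[0].size();

  std::vector<double> mean(d, 0.0);
  for (const auto& x : draws)
    for (std::size_t j = 0; j < d; ++j)
      mean[j] += x[j] / n;

  Matrix cov(d, std::vector<double>(d, 0.0));
  for (const auto& x : draws)
    for (std::size_t j = 0; j < d; ++j)
      for (std::size_t k = 0; k < d; ++k)
        cov[j][k] += (x[j] - mean[j]) * (x[k] - mean[k]) / (n - 1.0);
  return cov;
}

// Assumed selection rule (NOT the rule used in this pull request):
// choose Dense if any pair of parameters has an absolute sample
// correlation above `threshold`, otherwise choose Diagonal.
MetricKind pick_metric(const Matrix& cov, double threshold) {
  const std::size_t d = cov.size();
  for (std::size_t j = 0; j < d; ++j) {
    for (std::size_t k = j + 1; k < d; ++k) {
      const double denom = std::sqrt(cov[j][j] * cov[k][k]);
      if (denom > 0.0 && std::fabs(cov[j][k]) / denom > threshold)
        return MetricKind::Dense;
    }
  }
  return MetricKind::Diagonal;
}
```

Calling `pick_metric(sample_covariance(window_draws), 0.5)` at the end of a window would return `MetricKind::Dense` for strongly correlated draws and `MetricKind::Diagonal` otherwise. With short windows the off-diagonal entries of the sample covariance are noisy, so any rule of this kind can flip between the two choices from window to window.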
How to Verify
The new tests can be run with:
`./runTests.py src/test/unit/mcmc/auto_adaptation_learn_covariance_pick_dense_test`
`./runTests.py src/test/unit/mcmc/auto_adaptation_learn_covariance_pick_diag_test`
and
`./runTests.py src/test/unit/mcmc/auto_adaptation_test`
Side Effects
Hopefully none
Documentation
Yet to be written
Copyright and Licensing
Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company): Columbia University
By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:
- Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
- Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
Before this can be considered, it is going to require a good deal more empirical validation than what's in the arXiv paper, especially with regard to varying dimensions and curvatures. To be open, I am hesitant about the robustness of automatically switching between diagonal and dense metrics given how small the early windows are and how noisy those off-diagonal estimates are (not to mention the eigenvalue approximations). For something like this to go in, it will have to be verified to work properly for diagonally-dominant problems, dense-dominated problems, and everything in between, without a significant increase in cost.
Keep in mind that we're trying to minimize the sampler variants in the code base and not have "experimental" versions on dev/master. At some point we'll clean up the other samplers in there.
Yup, hopefully we can find some models that break and learn stuff!
In general, I recommend working through the validation before creating a pull request and using up testing resources. Until then, the branch can be discussed on Discourse.
Because of how this proposal modifies warmup it will require studying, at the very least,
- sensitivity to initial conditions
- sensitivity to heavy tails
- sensitivity to dimension
- warmup time
- models with spatially-varying covariances
| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
|---|---|---|---|---|
| gp_pois_regr/gp_pois_regr.stan | 3.3 | 3.07 | 1.07 | 6.93% faster |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 0.97 | -2.74% slower |
| eight_schools/eight_schools.stan | 0.12 | 0.12 | 1.03 | 3.17% faster |
| gp_regr/gp_regr.stan | 0.18 | 0.17 | 1.0 | 0.48% faster |
| irt_2pl/irt_2pl.stan | 5.68 | 5.72 | 0.99 | -0.79% slower |
| performance.compilation | 91.0 | 88.12 | 1.03 | 3.17% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.47 | 8.49 | 1.0 | -0.3% slower |
| pkpd/one_comp_mm_elim_abs.stan | 29.41 | 29.13 | 1.01 | 0.95% faster |
| sir/sir.stan | 131.67 | 125.59 | 1.05 | 4.62% faster |
| gp_regr/gen_gp_data.stan | 0.04 | 0.04 | 1.02 | 2.2% faster |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.96 | 2.95 | 1.0 | 0.32% faster |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.39 | 0.4 | 0.97 | -3.47% slower |
| arK/arK.stan | 1.81 | 1.79 | 1.01 | 0.75% faster |
| arma/arma.stan | 0.62 | 0.75 | 0.83 | -20.86% slower |
| garch/garch.stan | 0.7 | 0.55 | 1.27 | 21.07% faster |

Mean result: 1.01729364403
Commit hash: 473a28cfe94207fcbeefb44da8a3053719d57f7e
Machine information
ProductName: Mac OS X
ProductVersion: 10.11.6
BuildVersion: 15G22010
CPU: Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz
G++: Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1 Apple LLVM version 7.0.2 (clang-700.1.81) Target: x86_64-apple-darwin15.6.0 Thread model: posix
Clang: Apple LLVM version 7.0.2 (clang-700.1.81) Target: x86_64-apple-darwin15.6.0 Thread model: posix
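For reading the benchmark tables, the two derived columns can be reproduced with a short sketch. The formula for the performance change is the one given in the table header. That the Ratio column is old divided by new is an inference from the table values, not something stated by the benchmark tooling. The function names are hypothetical. Because the Old Result and New Result columns are displayed rounded, values recomputed from them differ slightly from the table (for example, arma comes out near -21% rather than -20.86%).

```cpp
#include <cassert>
#include <cmath>

// Ratio column, assumed to be old / new: values above 1 mean the
// new build ran faster than the old one.
double speed_ratio(double old_result, double new_result) {
  assert(new_result > 0.0);
  return old_result / new_result;
}

// Performance change column in percent, following the table header:
// 100 * (1 - new / old). Positive means faster, negative means slower.
double performance_change_percent(double old_result, double new_result) {
  assert(old_result > 0.0);
  return 100.0 * (1.0 - new_result / old_result);
}
```

For the arma/arma.stan row of the first table, `speed_ratio(0.62, 0.75)` is about 0.83, matching the Ratio column, and `performance_change_percent(0.62, 0.75)` is negative, which the table reports as "slower".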
| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
|---|---|---|---|---|
| gp_pois_regr/gp_pois_regr.stan | 3.5 | 3.52 | 1.0 | -0.38% slower |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 0.97 | -3.2% slower |
| eight_schools/eight_schools.stan | 0.12 | 0.12 | 1.0 | -0.2% slower |
| gp_regr/gp_regr.stan | 0.17 | 0.17 | 1.01 | 1.34% faster |
| irt_2pl/irt_2pl.stan | 5.72 | 5.68 | 1.01 | 0.79% faster |
| performance.compilation | 87.01 | 85.55 | 1.02 | 1.68% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.44 | 8.56 | 0.99 | -1.44% slower |
| pkpd/one_comp_mm_elim_abs.stan | 30.44 | 29.2 | 1.04 | 4.09% faster |
| sir/sir.stan | 127.13 | 125.91 | 1.01 | 0.96% faster |
| gp_regr/gen_gp_data.stan | 0.04 | 0.04 | 1.0 | 0.09% faster |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.97 | 2.93 | 1.01 | 1.27% faster |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.4 | 0.39 | 1.0 | 0.23% faster |
| arK/arK.stan | 2.48 | 2.48 | 1.0 | -0.13% slower |
| arma/arma.stan | 0.61 | 0.61 | 1.0 | -0.38% slower |
| garch/garch.stan | 0.74 | 0.75 | 0.99 | -1.04% slower |

Mean result: 1.00270568617
Commit hash: da0016643e2d1c20c29980752614a02d3d7a619e
Machine information
ProductName: Mac OS X
ProductVersion: 10.11.6
BuildVersion: 15G22010
CPU: Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz
G++: Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1 Apple LLVM version 7.0.2 (clang-700.1.81) Target: x86_64-apple-darwin15.6.0 Thread model: posix
Clang: Apple LLVM version 7.0.2 (clang-700.1.81) Target: x86_64-apple-darwin15.6.0 Thread model: posix