cmdstanr
cmdstanr copied to clipboard
Multiple inits for multi-path Pathfinder?
Alluded to in #874, I've noticed that if you try to set initial values to the Pathfinder algorithm, in particular the multi-path variety of this algorithm, cmdstanr
currently doesn't allow the paths to begin from different initial positions. The requirement for the Pathfinder algorithm seems to be that the inits are a list containing just a single initial value. While this makes sense for the other variational algorithms, for Pathfinder this means that each L-BFGS starts from the same place, which doesn't seem ideal. Maybe this is due to some underlying restriction of cmdstan itself?
Here's an example (I've simplified some of the output to remove warnings from Stan about gradients & Pareto k values):
# Load necessary packages
library(cmdstanr)
#> This is cmdstanr version 0.7.1
#> - CmdStanR documentation and vignettes: mc-stan.org/cmdstanr
#> - CmdStan path: C:/Users/corour/Documents/.cmdstan/cmdstan-2.34.0
#> - CmdStan version: 2.34.0
library(brms)
#> Loading required package: Rcpp
#> Loading 'brms' package (version 2.20.4). Useful instructions
#> can be found by typing help('brms'). A more detailed introduction
#> to the package is available through vignette('brms_overview').
# Setup a model. This is just an example from brms.
prior1 = prior(normal(0, 10), class = b) +
prior(cauchy(0, 2), class = sd)
stan_code = make_stancode(
count ~ zAge + zBase * Trt + (1 | patient),
data = epilepsy,
family = poisson(),
prior = prior1
)
stan_data = make_standata(
count ~ zAge + zBase * Trt + (1 | patient),
data = epilepsy,
family = poisson(),
prior = prior1
)
mod = cmdstan_model(write_stan_file(stan_code))
# Create a function that generates some initial values. It will generate
# different sets of initial values depending on the value of `chain_id`
init_fn = function(chain_id) {
old_rng_state = get(".Random.seed", envir = globalenv())
on.exit(assign(".Random.seed", value = old_rng_state, envir = globalenv()))
new_seed = rngtools::RNGseq(n = chain_id + 1L, seed = 1313L)[[chain_id]]
assign(".Random.seed", value = new_seed, envir = globalenv())
b = runif(4L,-2, 2) #vector[Kc]
Intercept = runif(1L,-2, 2)
sd_1 = runif(1L, 0, 2) #vector<lower=0>[M_1]
z_1 = array(runif(1 * 59,-2, 2), dim = c(1, 59)) #array[M_1] vector[N_1]
list(
b = b,
Intercept = Intercept,
sd_1 = sd_1,
z_1 = z_1
)
}
# Create initial values for cmdstanr in the "list-of-lists" format
init_list_4 = lapply(1:4, \(i) init_fn(i))
init_list_1 = list(init_fn(1))
# Try to run Pathfinder with num_paths=4 using the "list-of-lists" style
# of setting initial values. This just produces an error.
mod_w_init_list = mod$pathfinder(
data = stan_data,
seed = 123,
init = init_list_4,
num_paths = 4,
refresh = 1000,
history_size = 10
)
#> Error: 'init' has the wrong length. See documentation of 'init' argument.
# Run Pathfinder with num_paths=4 but using just a list with a single
# initialization value. This runs but re-uses the same init for each path
mod_w_init_list = mod$pathfinder(
data = stan_data,
seed = 123,
init = init_list_1,
num_paths = 4,
refresh = 1000,
history_size = 10
)
#> Path [1] :Initial log joint density = -16938.506748
#> Path [1] : Iter log prob ||dx|| ||grad|| alpha alpha0 # evals ELBO Best ELBO Notes
#> 194 -6.539e+02 7.973e-05 5.342e-02 1.000e+00 1.000e+00 4851 -1.110e+03 -1.110e+03
#> Path [1] :Best Iter: [76] ELBO (-785.130279) evaluations: (4851)
#> Path [2] :Initial log joint density = -16938.506748
#> Path [2] : Iter log prob ||dx|| ||grad|| alpha alpha0 # evals ELBO Best ELBO Notes
#> 194 -6.539e+02 7.973e-05 5.342e-02 1.000e+00 1.000e+00 4851 -1.135e+03 -1.135e+03
#> Path [2] :Best Iter: [77] ELBO (-789.101802) evaluations: (4851)
#> Path [3] :Initial log joint density = -16938.506748
#> Path [3] : Iter log prob ||dx|| ||grad|| alpha alpha0 # evals ELBO Best ELBO Notes
#> 194 -6.539e+02 7.973e-05 5.342e-02 1.000e+00 1.000e+00 4851 -1.245e+03 -1.245e+03
#> Path [3] :Best Iter: [76] ELBO (-787.813011) evaluations: (4851)
#> Path [4] :Initial log joint density = -16938.506748
#> Path [4] : Iter log prob ||dx|| ||grad|| alpha alpha0 # evals ELBO Best ELBO Notes
#> 194 -6.539e+02 7.973e-05 5.342e-02 1.000e+00 1.000e+00 4851 -1.116e+03 -1.116e+03
#> Path [4] :Best Iter: [76] ELBO (-788.891651) evaluations: (4851)
#> Total log probability function evaluations:23304
#> Finished in 1.8 seconds.
# Run using the function-style of specifying inits. This seems to just repeat
# the single set of inits for for each of the 4 paths.
mod_w_inits = mod$pathfinder(
data = stan_data,
seed = 123,
init = init_fn,
num_paths = 4,
refresh = 1000,
history_size = 10
)
#> Path [1] :Initial log joint density = -16938.506748
#> Path [1] : Iter log prob ||dx|| ||grad|| alpha alpha0 # evals ELBO Best ELBO Notes
#> 194 -6.539e+02 7.973e-05 5.342e-02 1.000e+00 1.000e+00 4851 -1.110e+03 -1.110e+03
#> Path [1] :Best Iter: [76] ELBO (-785.130279) evaluations: (4851)
#> Path [2] :Initial log joint density = -16938.506748
#> Path [2] : Iter log prob ||dx|| ||grad|| alpha alpha0 # evals ELBO Best ELBO Notes
#> 194 -6.539e+02 7.973e-05 5.342e-02 1.000e+00 1.000e+00 4851 -1.135e+03 -1.135e+03
#> Path [2] :Best Iter: [77] ELBO (-789.101802) evaluations: (4851)
#> Path [3] :Initial log joint density = -16938.506748
#> Path [3] : Iter log prob ||dx|| ||grad|| alpha alpha0 # evals ELBO Best ELBO Notes
#> 194 -6.539e+02 7.973e-05 5.342e-02 1.000e+00 1.000e+00 4851 -1.245e+03 -1.245e+03
#> Path [3] :Best Iter: [76] ELBO (-787.813011) evaluations: (4851)
#> Path [4] :Initial log joint density = -16938.506748
#> Path [4] : Iter log prob ||dx|| ||grad|| alpha alpha0 # evals ELBO Best ELBO Notes
#> 194 -6.539e+02 7.973e-05 5.342e-02 1.000e+00 1.000e+00 4851 -1.116e+03 -1.116e+03
#> Path [4] :Best Iter: [76] ELBO (-788.891651) evaluations: (4851)
#> Total log probability function evaluations:23304
#> Finished in 1.8 seconds.
# Run using the default random inits. Different inits are used for each
# Pathfinder path.
mod_wo_inits = mod$pathfinder(
data = stan_data,
seed = 123,
num_paths = 4,
refresh = 1000,
history_size = 10
)
#> Path [1] :Initial log joint density = -5218.225137
#> Path [1] : Iter log prob ||dx|| ||grad|| alpha alpha0 # evals ELBO Best ELBO Notes
#> 399 -6.539e+02 8.264e-05 4.839e-02 9.763e-01 9.763e-01 9976 -1.649e+05 -1.649e+05
#> Path [1] :Best Iter: [54] ELBO (-735.976769) evaluations: (9976)
#> Path [2] :Initial log joint density = -6622.413033
#> Path [2] : Iter log prob ||dx|| ||grad|| alpha alpha0 # evals ELBO Best ELBO Notes
#> 308 -6.539e+02 8.655e-05 7.312e-02 1.000e+00 1.000e+00 7701 -1.418e+10 -1.418e+10
#> Path [2] :Best Iter: [50] ELBO (-745.757922) evaluations: (7701)
#> Path [3] :Initial log joint density = -12283.643776
#> Path [3] : Iter log prob ||dx|| ||grad|| alpha alpha0 # evals ELBO Best ELBO Notes
#> 310 -6.539e+02 1.200e-04 1.092e-01 8.474e-01 8.474e-01 7751 -6.558e+04 -6.558e+04
#> Path [3] :Best Iter: [58] ELBO (-743.025505) evaluations: (7751)
#> Path [4] :Initial log joint density = -95375.955585
#> Path [4] : Iter log prob ||dx|| ||grad|| alpha alpha0 # evals ELBO Best ELBO Notes
#> 193 -6.539e+02 3.622e-04 8.454e-02 1.000e+00 1.000e+00 4826 -2.231e+03 -2.231e+03
#> Path [4] :Best Iter: [82] ELBO (-786.378632) evaluations: (4826)
#> Total log probability function evaluations:34154
#> Finished in 2.6 seconds.
Created on 2024-01-22 with reprex v2.0.2
R version info:
setting value
version R version 4.3.2 (2023-10-31 ucrt)
os Windows 10 x64 (build 19045)
system x86_64, mingw32
ui RStudio
language (EN)
collate English_United States.utf8
ctype English_United States.utf8
tz America/Los_Angeles
date 2024-01-22
rstudio 2023.06.1+524 Mountain Hydrangea (desktop)
pandoc 3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
I was also annoyed the Pathfinder expects a list and not list of lists, but I forgot to make an issue. I agree, it should be possible to give a list of lists for multi-pathfinder
I'm fixing that in the PR to have inits as pathfinder fits
Can you also allow list of lists, as for other inference methods?
Ooof okay this is going to take me a second. So I wrote it originally to use some internals that thinks parallelization is only done at the R level and that each init goes to a separate process started at the R level . I'm going to chat with @jgabry tmrw about how that all works and also what we can do to stop doing the parallelization to happen at the cmdstan level for everything since we have that now
Has this been fixed by #937 ?
@SteveBronder - just to check something with Pathfinder and inits - when multiple init files are present (one for each path), is it normal for the output to only show the first init file?
For example, with num_paths=4
and four distinct init files, the output shows:
# init = /var/folders/0h/9q9frlm906s6cwft_rp92dw80000gn/T/RtmpToGNxD/init-71038ec7ec4_1.json
Yes we only print the first one. I do think we should change that so the actual result is more clear. We could also let users pass in a comma separated list of file names which I think would be nicer