
Implement new fit algorithms: Laplace & Pathfinder

Open wpetry opened this issue 1 year ago • 2 comments

Carried over from a discussion on Discourse.

Implement Laplace and Pathfinder fitting methods. The Laplace method has been available since Stan 2.31.0 and Pathfinder since 2.33.0. I think this would limit the implementation in brms to the cmdstanr backend, because RStan lags behind Stan releases.

Implementation would (I think) closely match the existing variational inference support in brms (e.g., via the $variational() method for CmdStanModel objects). The analogous $laplace() and $pathfinder() methods share many of the same arguments, though Pathfinder has some additional tuning options that would need documentation (or a reference to the existing cmdstanr docs).
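As a rough illustration of how closely the three cmdstanr methods line up, here is a minimal sketch that calls $variational(), $laplace(), and $pathfinder() with the same arguments. The helper name fit_approximations and the toy Bernoulli model are assumptions made for illustration only; they are not part of brms or cmdstanr, and this is not how brms wires things up internally. It also assumes a cmdstanr and CmdStan installation recent enough to provide all three methods.

```r
# Hypothetical helper (not part of cmdstanr or brms): run the three
# approximate algorithms on one compiled CmdStanModel with shared arguments.
fit_approximations <- function(mod, data, draws = 1000, seed = 1) {
  list(
    variational = mod$variational(data = data, seed = seed, draws = draws),
    laplace     = mod$laplace(data = data, seed = seed, draws = draws),
    pathfinder  = mod$pathfinder(data = data, seed = seed, draws = draws)
  )
}

# Real usage compiles a Stan model, so it only runs in an interactive
# session with cmdstanr and CmdStan available.
if (interactive() &&
    requireNamespace("cmdstanr", quietly = TRUE) &&
    !is.null(cmdstanr::cmdstan_version(error_on_NA = FALSE))) {
  # Toy model chosen purely for illustration.
  stan_file <- cmdstanr::write_stan_file("
    data { int<lower=0> N; array[N] int<lower=0, upper=1> y; }
    parameters { real<lower=0, upper=1> theta; }
    model { theta ~ beta(1, 1); y ~ bernoulli(theta); }
  ")
  mod <- cmdstanr::cmdstan_model(stan_file)
  fits <- fit_approximations(
    mod,
    data = list(N = 10, y = c(0, 1, 0, 0, 0, 0, 0, 0, 0, 1))
  )
  print(fits$pathfinder$summary("theta"))
}
```

The Pathfinder-specific tuning options mentioned above would be passed only to the $pathfinder() call; they are left at their defaults here.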

wpetry avatar Jan 30 '24 13:01 wpetry

Side comment: A use case we have explored is to use pathfinder to identify initial values, so an easy way to feed pathfinder outcomes into the initial values for a new run of brms based on HMC would be ideal

fusaroli avatar Feb 06 '24 11:02 fusaroli

@fusaroli Could brms build on the approach in-progress for cmdstanr? https://github.com/stan-dev/cmdstanr/issues/876 I suspect brmsfit objects would need a method that wraps the cmdstanr implementation. If that intuition is correct, it might be best to open a separate issue given that this functionality would have prerequisite features both here and in cmdstanr.

wpetry avatar Feb 06 '24 14:02 wpetry

> Side comment: A use case we have explored is to use pathfinder to identify initial values, so an easy way to feed pathfinder outcomes into the initial values for a new run of brms based on HMC would be ideal

For Pathfinder I think this is THE use case. Within-chain threading via reduce_sum has been a godsend, but unless you have a cluster to work with, your 4-8 CPUs are best used on only one chain, because having multiple chains each go through a long warmup is such a huge time sink by comparison. If Pathfinder inits could cut the warmup to, say, 100-250 iterations per chain, with the initial adaptation being pro forma for most models, it could become efficient to run 3 or 4 chains on a personal machine with an extra thread or two per chain, which would give more robust results much faster. That would be a huge benefit on an already terrific setup, particularly during exploratory work.
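A minimal sketch of that workflow, done directly in cmdstanr rather than through brms: run Pathfinder first, then hand the Pathfinder fit to $sample() as initial values with a shortened warmup. The helper name pathfinder_then_sample is hypothetical, the 250 warmup iterations simply echo the range floated above and are not a tested recommendation, and passing a fit object as init assumes cmdstanr 0.8.0 or later. Whether brms accepts such an object through its own init argument is not something this sketch relies on.

```r
# Hypothetical helper (not part of cmdstanr or brms): Pathfinder first,
# then HMC initialised from the Pathfinder fit with a shorter warmup.
pathfinder_then_sample <- function(mod, data, chains = 4, iter_warmup = 250) {
  pf <- mod$pathfinder(data = data)
  # Assumption: cmdstanr >= 0.8.0, where `init` may be a fit object.
  mod$sample(data = data, init = pf, chains = chains, iter_warmup = iter_warmup)
}

# Real usage compiles and samples a model, so it only runs interactively
# with brms, a recent cmdstanr, and CmdStan available.
if (interactive() &&
    requireNamespace("brms", quietly = TRUE) &&
    requireNamespace("cmdstanr", quietly = TRUE) &&
    utils::packageVersion("cmdstanr") >= "0.8.0" &&
    !is.null(cmdstanr::cmdstan_version(error_on_NA = FALSE))) {
  f <- count ~ zBase * Trt + (1 | patient)
  # Let brms generate the Stan code and data, then fit with cmdstanr.
  scode <- brms::make_stancode(f, data = brms::epilepsy, family = poisson())
  sdata <- unclass(brms::make_standata(f, data = brms::epilepsy, family = poisson()))
  mod <- cmdstanr::cmdstan_model(cmdstanr::write_stan_file(scode))
  fit <- pathfinder_then_sample(mod, sdata)
  print(fit$diagnostic_summary())
}
```

The result here is a cmdstanr fit, not a brmsfit; getting it back into brms would need an extra wrapping step that is outside this sketch.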

bachlaw avatar Mar 02 '24 15:03 bachlaw

pathfinder and laplace are now supported with the cmdstanr backend

paul-buerkner avatar Mar 18 '24 09:03 paul-buerkner

awesome, thanks!

fusaroli avatar Mar 18 '24 09:03 fusaroli

Versions: R (4.4.0), brms (2.21.0), cmdstanr (0.7.1), Rtools43, Windows 11 Pro

I'm not sure whether this is an issue that only I am having or whether others see it too, but an error is thrown whenever I fit a model with any of the variational algorithms. For example,

library(brms)

# Reproduces the error with the Laplace algorithm; epilepsy ships with brms.
brm(
  count ~ zBase * Trt + (1 | patient),
  data = epilepsy, family = poisson(),
  prior = prior(normal(0, 10), class = b) +
    prior(cauchy(0, 2), class = sd),
  backend = "cmdstanr",
  algorithm = "laplace",
  draws = 1
)

results in the following error:

Error in `[.data.frame`(diagnostics[[i]], rstan_diagn_order) : 
  undefined columns selected

The code above sets algorithm = "laplace", but the same error is also thrown with algorithm = "pathfinder" or algorithm = "meanfield".

If I use algorithm = "sampling", no error is thrown:

All 4 chains finished successfully.
Mean chain execution time: 3.2 seconds.
Total execution time: 13.4 seconds.

 Family: poisson 
  Links: mu = log 
Formula: count ~ zBase * Trt + (1 | patient) 
   Data: epilepsy (Number of observations: 236) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Multilevel Hyperparameters:
~patient (Number of levels: 59) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.59      0.07     0.47     0.74 1.00      872     1709

Regression Coefficients:
           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept      1.78      0.12     1.55     2.01 1.00      670     1381
zBase          0.70      0.12     0.46     0.94 1.01      740     1173
Trt1          -0.28      0.16    -0.58     0.03 1.00      732     1410
zBase:Trt1     0.03      0.16    -0.29     0.34 1.00      855     1411

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
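
For anyone trying to confirm which algorithms are affected on their own setup, a small sketch along these lines may help. It fits the same model once per algorithm and records either "ok" or the error message. The helper name try_algorithms is hypothetical and exists only for this illustration; the priors from the example above are omitted for brevity.

```r
# Hypothetical helper: call `fit_fun` once per algorithm and collect
# "ok" or the error message in a named character vector.
try_algorithms <- function(fit_fun, algorithms) {
  vapply(algorithms, function(alg) {
    tryCatch(
      {
        fit_fun(alg)
        "ok"
      },
      error = function(e) conditionMessage(e)
    )
  }, character(1))
}

# Real usage compiles and fits a brms model per algorithm, so it only
# runs in an interactive session with brms installed.
if (interactive() && requireNamespace("brms", quietly = TRUE)) {
  fit_fun <- function(alg) {
    brms::brm(
      count ~ zBase * Trt + (1 | patient),
      data = brms::epilepsy, family = poisson(),
      backend = "cmdstanr", algorithm = alg, refresh = 0
    )
  }
  print(try_algorithms(fit_fun, c("sampling", "meanfield", "laplace", "pathfinder")))
  # Version information is useful context when reporting the result.
  print(utils::packageVersion("brms"))
}
```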

marcus-waldman avatar Apr 27 '24 21:04 marcus-waldman