
experimental service function for pathfinder & multi-path pathfinder

Open bob-carpenter opened this issue 3 years ago • 6 comments

Summary:

Expose the implementation of Pathfinder and multi-path Pathfinder through service functions.

Description:

Although the two algorithms are very close, I decided that two service functions would be simpler than one function with a few toggles. The service functions will go in the following files:

  • src/stan/services/experimental/pathfinder/pathfinder.hpp
  • src/stan/services/experimental/pathfinder/multi_pathfinder.hpp

The signatures of the functions will be as follows (the comments after the variable names give the corresponding variable names in the Pathfinder arXiv paper).

Single-path Pathfinder

/**
 * Run the single-path Pathfinder algorithm for the specified model,
 * using the specified random seed and chain ID, using the specified
 * fixed and random initialization, given the L-BFGS parameters, number
 * of ELBO samples to evaluate, and final sample size to return,
 * writing results to the specified callbacks.
 *
 * From the specified initialization, which may be part or wholly
 * random, Pathfinder runs the limited memory BFGS (L-BFGS) algorithm
 * until the objective (log density) has converged within a specified
 * relative tolerance, or the maximum number of iterations is hit.
 * Then, for each point on the optimization trajectory, a
 * multivariate normal approximation with covariance estimated from
 * the L-BFGS history is constructed, and its evidence lower bound
 * (ELBO) is estimated using a specified number of Monte Carlo draws.
 * Finally, draws are taken from the normal approximation with the
 * highest ELBO and transformed back to the constrained scale to be
 * returned as a sample.
 *
 * Algorithm description and evaluation:
 * Lu Zhang, Bob Carpenter, Andrew Gelman, Aki Vehtari (2022)
 * Pathfinder: parallel quasi-Newton variational inference.  arXiv.
 * https://arxiv.org/pdf/2108.03782.pdf
 * 
 * @tparam M type of model class
 * @param model Stan model
 * @param random_seed seed for PRNG
 * @param chain_id chain identifier for multiple chains
 * @param init fixed initial values
 * @param init_radius uniform random initialization bounds 
 * @param lbfgs_max_history max history size, determining rank of
 * covariance estimate
 * @param lbfgs_max_iterations maximum iterations for L-BFGS
 * @param lbfgs_rel_tolerance relative tolerance on L-BFGS
 * convergence of the log density
 * @param elbo_samples number of Monte Carlo samples used to estimate
 * ELBO 
 * @param sample_size number of approximate draws to return
 * @param interrupt callback for interrupting process
 * @param logger callback for writing log messages to console
 * @param init_writer callback for writing initialization
 * @param parameter_writer callback for writing sample draws
 * @param diagnostic_writer callback for writing diagnostics
 */
template <typename M>
int pathfinder(M& model,
	       unsigned int random_seed,
	       unsigned int chain_id,
	       const stan::io::var_context& init,         // pi_0
	       double init_radius,                        // pi_0
	       double lbfgs_max_history,                  // J
	       unsigned int lbfgs_max_iterations,         // L
	       double lbfgs_rel_tolerance,                // tau_rel
	       unsigned int elbo_samples,                 // K
	       int sample_size,                           // M

	       callbacks::interrupt& interrupt,
	       callbacks::logger& logger,
	       callbacks::writer& init_writer,
	       callbacks::writer& parameter_writer,
	       callbacks::writer& diagnostic_writer);

Multi-path Pathfinder

/**
 * Run the multi-path Pathfinder algorithm for the specified model,
 * using the specified random seed and chain ID, using the specified
 * fixed and random initialization, given the L-BFGS parameters, number
 * of ELBO samples to evaluate, and final sample size to return,
 * writing results to the specified callbacks.
 *
 * Multi-path Pathfinder runs a specified number of single-path
 * Pathfinder instances, collecting a specified number of draws per
 * run, then importance resamples a final sample of a specified size.
 * Importance resampling is done with Pareto smoothing to reduce
 * variance. 
 * 
 * Algorithm description and evaluation:
 * Lu Zhang, Bob Carpenter, Andrew Gelman, Aki Vehtari (2022)
 * Pathfinder: parallel quasi-Newton variational inference.  arXiv.
 * https://arxiv.org/pdf/2108.03782.pdf
 * 
 * @tparam M type of model class
 * @param model Stan model
 * @param random_seed seed for PRNG
 * @param chain_id chain identifier for multiple chains
 * @param init fixed initial values
 * @param init_radius uniform random initialization bounds 
 * @param lbfgs_max_history max history size, determining rank of
 * covariance estimate
 * @param lbfgs_max_iterations maximum iterations for L-BFGS
 * @param lbfgs_rel_tolerance relative tolerance on L-BFGS
 * convergence of the log density
 * @param elbo_samples number of Monte Carlo samples used to estimate
 * ELBO 
 * @param sample_size number of approximate draws to return
 * @param interrupt callback for interrupting process
 * @param logger callback for writing log messages to console
 * @param init_writer callback for writing initialization
 * @param parameter_writer callback for writing sample draws
 * @param diagnostic_writer callback for writing diagnostics
 */
template <typename M>
int multi_pathfinder(M& model,
		     unsigned int random_seed,
		     unsigned int chain_id,
		     const stan::io::var_context& init,     // pi_0
		     double init_radius,                    // pi_0
		     double lbfgs_max_history,              // J
		     unsigned int lbfgs_max_iterations,     // L
		     double lbfgs_rel_tolerance,            // tau_rel
		     unsigned int elbo_samples,             // K
		     int single_path_runs,                  // I
		     int single_path_sample_size,           // M
		     int sample_size,                       // R

		     callbacks::interrupt& interrupt,
		     callbacks::logger& logger,
		     callbacks::writer& init_writer,
		     callbacks::writer& parameter_writer,
		     callbacks::writer& diagnostic_writer);

Expected Output:

Draws converted back to the constrained scale (as with ADVI) are sent to parameter_writer, along with whatever diagnostics are available (such as ELBO evaluation statistics) sent to diagnostic_writer.

Additional Information:

This is all going in the experimental folder and will be marked as such, like ADVI, when run.

Current Version:

v2.29.2

bob-carpenter avatar Apr 11 '22 21:04 bob-carpenter

What is the rationale for bypassing the current procedures for including a new algorithm, regardless of whether it is accompanied by a non-standardized "experimental folder and will be marked as such like ADVI when run"? There have been no RFCs for this method, nor for any procedure for incorporating official experimental methods instead of implementing them in forks, as is typical for open source projects. At the same time I haven't seen any discussion from the governing body about formalizing any such procedure.


betanalpha avatar May 05 '22 04:05 betanalpha

@betanalpha: What do you consider to be the current procedure for including a new algorithm? What about changes to existing algorithms? Is the process written up somewhere?

bob-carpenter avatar May 10 '22 15:05 bob-carpenter

There was a detailed procedure on the github wiki, although the wiki no longer seems to be around. I missed when that happened.

That said, the procedure was set up long ago, before a lot of the technical governance fell apart. At this point the only procedure for substantial changes has been the design docs, and even then there are no formal criteria for acceptance.

betanalpha avatar Jun 01 '22 00:06 betanalpha

@betanalpha if I am not mistaken, you are looking for https://github.com/stan-dev/stan/wiki/Proposing-Algorithms-for-Inclusion-Into-Stan? This page is still available at https://github.com/stan-dev/stan/wiki under "Contributing new functions and algorithms". The wiki was reorganized a bit, I think about a year ago - it was full of outdated clutter.

rok-cesnovar avatar Jun 01 '22 08:06 rok-cesnovar

Yes, thanks Rok! My apologies for forgetting that the wikis are local to each repo and not the entire stan-dev account.

This is incidental but, given the wealth of information in some of the wikis, it might be useful to link to the main repo wikis on the stan-dev landing page.

The criteria listed in the link are very liberal, and they were largely constructed to formally quantify the many problems with the current ADVI implementation in Stan. Criteria that ensure reasonable robustness relative to the Hamiltonian Monte Carlo implementation would be much stricter, but we had not had to consider those stricter criteria up to that point. Ultimately the goal was to provide a uniform path for anyone in the community to propose the introduction of a new method while maintaining the robustness of the user experience.


betanalpha avatar Jun 20 '22 14:06 betanalpha

I think the current plan is to decommission the wikis and move that information to GitHub Pages. The Wiki is too easy to hack even if we turn off anyone-can-edit features.

bob-carpenter avatar Jun 21 '22 16:06 bob-carpenter