cmdstan
Add `log_prob_grad` method whose interface mimics `generate_quantities`
Summary:
Add a method to cmdstan which reads in a csv file and (re)computes lp__
and its gradient.
I want to add a feature and (I think) I'm following this: https://github.com/stan-dev/stan/wiki/Developer-process-overview
If not, please let me know.
Description:
Currently there is no good way (using CmdStan or CmdStanPy) to take a set of parameter values and (re)compute `lp__` and its gradient. See https://discourse.mc-stan.org/t/log-prob-grad-via-csv-files-for-cmdstan/22380
This would be useful for calibrating ODE solver configurations.
Changes required are:
- Add `stan/services/sample/standalone_lpg.hpp`
- Modify `cmdstan/command.hpp`
Eventually this method should be exposed via CmdStanPy or its R equivalent.
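To make the intent concrete, here is a minimal sketch of what "(re)computing `lp__` and its gradient" means per draw, using a toy Bernoulli model on the unconstrained (logit) scale. This is purely illustrative Python, not the proposed C++ implementation; the model, names, and flat prior are all assumptions for the example.

```python
import math

# Illustration only: what the proposed method would compute for each draw
# read back from a Stan CSV -- lp__ and its gradient on the unconstrained
# scale. Toy model: y ~ bernoulli(theta), flat prior, theta = inv_logit(zeta),
# including the log-Jacobian of the logit transform.

def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))

def lp_and_grad(zeta, y):
    theta = inv_logit(zeta)
    n, k = len(y), sum(y)
    # log likelihood plus log|d theta / d zeta| = log theta + log(1 - theta)
    lp = (k + 1) * math.log(theta) + (n - k + 1) * math.log1p(-theta)
    # derivative of lp with respect to zeta, worked out by hand for this model
    grad = (k + 1) * (1.0 - theta) - (n - k + 1) * theta
    return lp, grad

# the proposed method would do this for every draw in a previous fit's CSV
draws = [-1.0, 0.0, 0.5]
y = [0, 1, 0, 0, 1]
results = [lp_and_grad(z, y) for z in draws]
```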
Current Version:
v2.26.1
technically, for things in `stan/services`, you should file an issue on the stan repo. and this sounds like a feature where a design-doc proposal might be required.
how is this proposal different from the CmdStan `diagnose` method (which needs to be wrapped by CmdStanPy)?
see: https://github.com/stan-dev/cmdstanpy/issues/233 and https://mc-stan.org/docs/2_26/cmdstan-guide/diagnosing-hmc-by-comparison-of-gradients.html
> technically, for things in stan/services, you should file an issue on repo stan. and this sounds like a feature where a design-doc proposal might be required.
Yes, it affects both stan and cmdstan. I was on my way to also opening an issue on stan, when I had to leave.
> how is this proposal different from the CmdStan diagnose method?
(I think) the `diagnose` method compares the computed/approximated gradient of only the initial points with a finite difference approximation.
The proposed method computes `lp__` and its gradient for all samples from some previous fit. If gq gets extended to support "arbitrary" csv files, we can simultaneously extend this method to do so; the interface and implementation would be exactly the same.
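For contrast, the diagnose-style check can be sketched as a one-point finite-difference comparison. This is a toy illustration with an arbitrary step size, not Stan's actual `diagnose` code; the step size `eps` is exactly the knob whose good value is hard to know a priori.

```python
def fd_check(lp, grad, x0, eps=1e-6, tol=1e-4):
    """Compare an analytic gradient against a central finite difference
    at a single point x0 -- roughly what a diagnose-style check does for
    the inits, whereas the proposed method evaluates lp and its gradient
    for every draw of a previous fit."""
    fd = (lp(x0 + eps) - lp(x0 - eps)) / (2.0 * eps)
    return abs(fd - grad(x0)) < tol

# toy log density: lp(x) = -x^2 / 2, so grad(x) = -x
ok = fd_check(lambda x: -0.5 * x * x, lambda x: -x, x0=1.3)
```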
PyStan 2/3 has something similar which apparently gets used (see https://discourse.mc-stan.org/t/speed-of-evaluating-gradients-of-log-probabilities-in-pystan-2-x-vs-3/22303/4) and there appears to be interest in it via CmdStanR/Py as well (see https://discourse.mc-stan.org/t/potential-for-log-prob-grad-log-prob-in-cmdstanr/11700).
More importantly, I don't think we know a priori what good choices are for the step size in the `diagnose` method. This method would enable us to tune the ODE solver configurations across the expected parameter range (e.g. across the prior or across intermediate posteriors), independently of a parameter whose ideal value we do not know and which will be problem dependent.
I don't know ODE solver implementation details, but there may also be performance benefits when allocated memory can be shared across draws.
> (I think) the diagnose method compares the computed / approximated gradient of only the initial points with a finite difference approximation.
Correct.
I think that this feature is a good idea. As @mitzimorris points out, a design doc is probably a good idea in this case. It's work, I know, but that way people with a broader view on the infrastructure get a heads-up early on this.
Thanks. The design doc and tests should be most of the work; I already have an implementation that exactly mirrors the gq implementation. Right now I'm trying to figure out how to fork everything that's needed.
… and the doc …
Right, I was hoping to copy most of the documentation from `generate_quantities`, as the interface would be identical :)
Hm, alright, I finished the implementation and added some tests.
What gets tested so far:
Interface:
- For the return code test this https://github.com/funko-unko/cmdstan/blob/feature/issue-1012-add-log_prob_grad/src/test/interface/log_prob_grad_test.cpp mimics and slightly extends/fixes the original (unchanged) https://github.com/funko-unko/cmdstan/blob/feature/issue-1012-add-log_prob_grad/src/test/interface/generated_quantities_test.cpp
- For the argument configuration test I again just mimic gq, which gets excluded for reasons I do not know https://github.com/funko-unko/cmdstan/blob/feature/issue-1012-add-log_prob_grad/src/test/interface/arguments/argument_configuration_test.cpp
Correctness:
- So far the correctness of the output only gets tested for the Bernoulli example. This https://github.com/funko-unko/stan/blob/feature/issue-1012-add-log_prob_grad/src/test/unit/services/sample/standalone_lpg_test.cpp mimics https://github.com/funko-unko/stan/blob/feature/issue-1012-add-log_prob_grad/src/test/unit/services/sample/standalone_gqs_test.cpp and checks that the recomputed `lp__` agrees with the one from the sample CSV and from the diagnostic CSV, and that the recomputed gradient agrees (with a negative sign) with the (scalar) gradient from the diagnostic CSV.
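The comparison logic itself is simple to state. Here is a hedged Python sketch of it, with made-up miniature CSV layouts and a stub recompute function; the real tests are the C++ files linked above, and actual Stan CSVs carry headers and comment lines omitted here.

```python
import csv
import io

# made-up miniature CSVs, purely for illustration
sample_csv = "lp__,theta\n-7.3,0.25\n-6.9,0.30\n"
diag_csv = "lp__,theta,g_theta\n-7.3,0.25,1.1\n-6.9,0.30,0.8\n"

def rows(text):
    return list(csv.DictReader(io.StringIO(text)))

def check(recompute, tol=1e-8):
    """Assert that a recompute(theta) -> (lp, grad) function reproduces
    lp__ from both CSVs and the negated gradient from the diagnostic CSV."""
    for s, d in zip(rows(sample_csv), rows(diag_csv)):
        lp, grad = recompute(float(s["theta"]))
        assert abs(lp - float(s["lp__"])) < tol          # matches sample CSV
        assert abs(lp - float(d["lp__"])) < tol          # matches diagnostic CSV
        assert abs(grad - (-float(d["g_theta"]))) < tol  # gradient is negated
```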
@mitzimorris @wds15
Do you think we could squeeze it into the next Stan release? D:
no this can't go into the next release.
this requires a design doc before an implementation. there are a lot of questions that need to be answered about how performant this is going to be.
this has happened before - someone submits an issue, more often a PR, for some code that they've put a lot of work into and then hard feelings ensue when it gets rejected. I would hate to see this happen again, but I'm afraid that you're going to have to run the design_docs gauntlet before this can move forward.
> there are a lot of questions that need to be answered about how performant this is going to be.
Hm, performance-wise it appears to introduce much less overhead than the current PyStan implementation; see the thread linked above.
A direct comparison reveals that for a simple 1-parameter model my CmdStan implementation takes less than 0.02 seconds for 1000 evaluations, while the current PyStan implementation takes roughly 2 seconds. PyStan's overhead of course becomes less significant if the gradient evaluation takes longer.
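The arithmetic behind that gap can be sketched with a toy cost model: batched evaluation pays the fixed cost (model construction, data I/O) once, while per-draw calls pay it on every evaluation. All constants below are invented for illustration, merely chosen to echo the 0.02 s vs. 2 s numbers above.

```python
def total_cost(n_draws, per_eval, fixed_overhead, batched):
    """Toy cost model: batched pays the fixed per-process cost once;
    per-draw calls pay it on every single evaluation."""
    if batched:
        return fixed_overhead + n_draws * per_eval
    return n_draws * (fixed_overhead + per_eval)

# illustrative numbers only: 1000 draws, cheap gradient, 2 ms fixed overhead
batched = total_cost(1000, per_eval=2e-5, fixed_overhead=2e-3, batched=True)
per_draw = total_cost(1000, per_eval=2e-5, fixed_overhead=2e-3, batched=False)
```

As the gradient evaluation itself gets more expensive, `per_eval` dominates both paths and the fixed overhead matters less, which is the point made above.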
But yes, of course this should not be rushed.
so glad you understand. I would say that at this point you've implemented the proof-of-concept and this is a good starting point for design. I hope that doesn't sound condescending - it's not meant that way.
Hm, so how should I proceed?
The design doc Readme suggests opening a discourse thread, I guess my task now is to chill?
Edit: CmdStan has so few top-level methods, I guess adding another one "forever" shouldn't be done lightly.
you can also create a design doc, and then submit a PR - discussion proceeds in the PR comments.
https://github.com/stan-dev/design-docs/pulls
> Hm, so how should I proceed?
Write the design out. You can try revising this issue if it's all going to be in one repo to start. The more onerous and more certain path is to create a pull request in the design-docs repo with the design. I'm happy to review it.
I use RStan's access to log density and its gradient in algorithm development. There, I can't send a batch of parameters to evaluate, I need to do it one at a time. I don't understand how that could be made performant in CmdStan without some kind of server architecture that loads the data only once. So I'd like to understand the use case you have in mind.
> I guess adding another one "forever" shouldn't be done lightly.
It's not just that CmdStan doesn't have many top-level calls. Maintenance cost is quadratic, as I tried to explain in a blog post. Also, adding more things makes all the doc more cumbersome unless it's very cleverly factored, which I'm afraid ours is not.
Also, the gradients I get back are on the unconstrained parameters. There's no way in Stan to get the gradients with respect to the constrained parameters. The design doc needs to explain how that is done with and without the Jacobian adjustment. With is for Bayes (MCMC, MAP, VI) and without is for penalized MLE. They produce different gradients when there are constrained parameters.
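A worked toy example of that distinction, assuming a positive-constrained parameter `sigma` with unconstrained `z = log(sigma)`. This is illustrative Python only; Stan's actual transforms live in the math library.

```python
import math

def grad_unconstrained(z, dlogp_dsigma, jacobian=True):
    """d/dz of the target density for sigma = exp(z), i.e. sigma > 0.
    Chain rule: dlogp/dsigma * dsigma/dz = dlogp/dsigma * sigma.
    The Jacobian adjustment adds d/dz log|dsigma/dz| = d/dz z = 1."""
    sigma = math.exp(z)
    g = dlogp_dsigma(sigma) * sigma
    return g + 1.0 if jacobian else g

# toy density: log p(sigma) = -sigma (exponential(1) up to a constant)
g_bayes = grad_unconstrained(0.0, lambda s: -1.0, jacobian=True)   # -> 0.0
g_mle = grad_unconstrained(0.0, lambda s: -1.0, jacobian=False)    # -> -1.0
```

The two gradients differ by exactly the Jacobian term, which is why a design doc needs to say which one (or both) the method exposes.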
> Edit: CmdStan has so few top-level methods, I guess adding another one "forever" shouldn't be done lightly.
early on the interfaces were siloed because of lack of consensus, and so PyStan and RStan hacked in things that they wanted to do directly, and those implementations are not necessarily robust or tested and definitely not performant.
the CmdStan interface right now is a minimal wrapper around the services layer which instantiates a model and then calls exactly one services method and exits. the kinds of diagnostics you're looking for would require a more stateful interface where it's possible to query the state of the sampler and other inference engines.
I started a design doc for CmdStan3 but am not sure my proposal is what's needed.
edit: just saw Bob's comment regarding need for something that's cleverly factored - that's more or less what I'm trying to say here too.
Thanks @bob-carpenter and @mitzimorris for the time and feedback.
> you can also create a design doc, and then submit a PR - discussion proceeds in the PR comments.

> Write the design out.
I'll do this next then :)
> There, I can't send a batch of parameters to evaluate, I need to do it one at a time.
Right, I did not mention it: the performance benefit above arises with batches, which are "very" expensive in PyStan. I actually have not tried whether doing it one at a time is also fast. Let me check.
> So I'd like to understand the use case you have in mind.
I think the use cases are mainly ODE models. Here you want to make sure that you solve the ODE with sufficient precision so that your sampler is not negatively impacted, but you also don't want to waste CPU cycles solving it too accurately. Being able to first draw samples from the prior or from intermediate posteriors (conditioned only on part of the data) and then recompute `lp__` and its gradient with varying tolerances until some convergence criterion is satisfied allows you to tune your solver configurations to be just tight enough.
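That tuning loop can be sketched with a stand-in "solver" whose accuracy is controlled by a tolerance. Everything here is a toy assumption: the "ODE solve" is just a midpoint-rule integral of x^2 on [0, 1], and the convergence criterion is made up.

```python
def lp_at_tol(tol):
    """Stand-in for 'solve the ODE at tolerance tol, then compute lp__'.
    The 'solver' is a midpoint-rule integral whose step shrinks with tol,
    so lp converges to -1/3 as tol tightens."""
    n = max(2, int(1.0 / tol))
    h = 1.0 / n
    return -sum(((i + 0.5) * h) ** 2 for i in range(n)) * h

def tune_tolerance(start=0.1, criterion=1e-6):
    """Tighten tol until lp__ stops changing -- 'just tight enough'."""
    tol, lp = start, lp_at_tol(start)
    while True:
        tol /= 10.0
        lp_new = lp_at_tol(tol)
        if abs(lp_new - lp) < criterion:
            return tol, lp_new
        lp = lp_new
```

In practice the same sweep would be run over a batch of draws (and over the gradients, not just `lp__`), which is exactly what the proposed batched method would make cheap.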
One negative example that did not do this is the planetary motion test case: with the specified tolerances, you simply cannot solve the ODE (max_num_steps exhausted), or can only solve it inaccurately, for many parameters in the a-priori compatible parameter space. See https://discourse.mc-stan.org/t/log-prob-grad-via-csv-files-for-cmdstan/22380/7
Tuning the solver configuration and ensuring convergence / sufficient precision should become even more important for the adjoint ode solver, where we have more knobs to turn and even less a priori knowledge about sensible configurations.
The gradients will be / currently are exactly the ones that get written to the diagnostic_file during HMC. For this use case any one of the available gradients would be fine.
I'll get back to you later.
@bob-carpenter @mitzimorris Just a short heads up:
- Calling `log_prob_grad` draw by draw (as PyStan currently has to do) is slower using my current CmdStan implementation than using PyStan, i.e. the overhead introduced somewhere (I/O?) appears to be larger per call. Curiously, this does not appear to depend at all on the size of the data or the number of parameters.
- I'll have to put this on the back burner, but I'll write the design doc. With more time to spare, we may think about providing a service function that can do whatever the `generate_quantities` or my potential `log_prob_grad` method can, and then some. GQ and LPG would then just be special calls to the more general method.
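The shape of that generalization could look roughly like this. This is hypothetical pseudocode in Python; the function names, the dict-based "model", and the callback signature are all invented for illustration.

```python
def for_each_draw(draws, make_model, data, per_draw_fn):
    """Hypothetical generalized service: construct the model (and read
    the data) once, then apply an arbitrary per-draw function. GQ and
    LPG become two particular choices of per_draw_fn."""
    model = make_model(data)  # setup cost paid once, shared across draws
    return [per_draw_fn(model, d) for d in draws]

# GQ and LPG as two per-draw callbacks over the same loop (invented names)
gq = lambda model, d: model["gq"](d)
lpg = lambda model, d: model["lpg"](d)
```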
https://github.com/stan-dev/cmdstan/pull/1107