cmdstanr
cmdstanr copied to clipboard
Add rstan::extract()-like feature to fit$draws()
Consider the following model:
data{
int N;
vector[N] obs_times;
vector[N] t;
}
parameters{
real<lower=0> lambda;
}
model{
obs_times~exponential(lambda);
lambda~normal(0,5);
}
generated quantities{
//Draw survival function
vector[N] surv;
for(i in 1:N){
surv[i]=exp(-lambda*t[i]);
}
}
library(cmdstanr)
dat<-list(N=100,obs_times=data$x,t=seq(0,1.5,length=100))
mod<-cmdstan_model("surv_exp.stan")
fit<-mod$sample(data=dat,
chains=4,
parallel_chains=4)
If I convert the "cmdstanr" object to an "rstan" object and use extract() I can access the surv[i] objects in the following order.
library(rstan)
stanfit <- rstan::read_stan_csv(fit$output_files())
post<-extract(stanfit)
post$surv[1:2,]
iterations [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 1 0.960775 0.923088 0.886879 0.852091 0.818668 0.786555 0.755702 0.726060
[2,] 1 0.951013 0.904426 0.860121 0.817987 0.777916 0.739809 0.703568 0.669102
iterations [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18]
[1,] 0.697580 0.670217 0.643927 0.618669 0.594402 0.571086 0.548685 0.527163 0.506484
[2,] 0.636325 0.605154 0.575509 0.547317 0.520505 0.495008 0.470759 0.447698 0.425767
This does not seem to be possible using fit$draws(). The only solutions require using "posterior" and/or "tidybayes" (https://discourse.mc-stan.org/t/how-can-i-change-the-way-estimates-are-printed-by-fit-draws/26486)
> fit$draws("surv",format="list")
# A draws_list: 1000 iterations, 4 chains, and 100 variables
[chain = 1]
$`surv[1]`
[1] 1 1 1 1 1 1 1 1 1 1
$`surv[2]`
[1] 0.96 0.96 0.96 0.96 0.96 0.96 0.96 0.96 0.96 0.96
$`surv[3]`
[1] 0.92 0.92 0.93 0.93 0.91 0.92 0.92 0.92 0.92 0.92
$`surv[4]`
[1] 0.88 0.89 0.89 0.89 0.87 0.88 0.88 0.88 0.88 0.88
I think it would be useful to add this extract()-like feature to fit$draws().
Hi, thanks for the feature request.
You can do the following:
library(cmdstanr)
fit <- cmdstanr_example()
# currently does not work
# fit$draws(format = "draws_rvars")
a <- posterior::as_draws_rvars(fit$draws())
posterior::draws_of(a$beta)[1:2,]
and get
> posterior::draws_of(a$beta)[1:2,]
[,1] [,2] [,3]
1 -0.602041 -0.567582 1.345550
2 -0.965585 -0.142869 0.983629
which is what you want. Once we fix https://github.com/stan-dev/cmdstanr/issues/582 you will be to just use fit$draws(format = "draws_rvars")
directly.
The output won't match rstan::extract
as extract permutes the samples. If you sort the columns you will see that the results match.
I don't think we want to duplicate the code and add this to cmdstanr as well, as posterior is a direct dependency of cmdstanr anyways and you can just use that. One of the goals of the posterior package was exactly this, that we don't have to have this logic in all interfaces, and rstan will eventually most likely follow the same path.
We do have to do a better job of documenting this, that is for sure.
I don't think we want to duplicate the code and add this to cmdstanr as well, as posterior is a direct dependency of cmdstanr anyways and you can just use that. One of the goals of the posterior package was exactly this, that we don't have to have this logic in all interfaces, and rstan will eventually most likely follow the same path.
it would be a tremendous convenience to the user to add this to CmdStanR because this is what users expect. the documentation for posterior is cryptic, at best. if you're not going to add this to CmdStanR, then you shoud add a whole bunch of examples to the documentation of how to get the posterior sample in the form of a structured variable, additionally, explanation of what "rvars" is and how it works.
We certainly can add this to CmdStanR if that is what we decide. This was just my opinion and what I think was the general idea I got from Jonah et al.
This is just my personal opinion that there is much value in adding functions that are just 1-to-1 wrappers to much more general and powerful functions. I am far from a statistical practitioner, so not sure my view on this is that of our user, it is probably more likely it is not. I understand that rstan users expect this functionality though.
My view is that just wrapping a function to another package leads to a requirement of duplicating docs, which leads to issues like https://github.com/stan-dev/cmdstanr/issues/617 Posterior is a much more powerful package and much more general and its better to point the users to use its functionalities and point to any lack of doc there, if there is.
A function fit$extract(name = "var")
would just be a wrapper to posterior::as_draws_rvars(fit$draws("var"))
and it would just limit the functionalities of the functions it would wrap.
In my personal experience, users don't fully understand everything the fit$summarise_draws()
function offers, but once I point them to https://mc-stan.org/posterior/reference/draws_summary.html things start to make more sense. To be fair, the N for that is only 3.
I think we just need to have more examples with more postprocessing examples.
I think we just need to have more examples with more postprocessing examples.
this is key. I spent winter break translating a Stan case study (partial pooling binary trials), written in RStan into CmdStanR and CmdStanPy. it took me quite a while to figure out how to get CmdStanR stuff working properly - WIP is here: https://github.com/stan-dev/example-models/blob/update-hierarchical-pooling/knitr/pool-binary-trials/pool-no-pool.Rmd
Very nice, thanks for working on this! Yeah, we need stuff like this.
Will let @jgabry chime in before closing this.
Yeah I think we just need more examples/doc on on postprocessing. The functionality that people want is there already (I don't think we need to duplicate posterior's functionality in cmdstanr or add more wrappers) but we definitely don't yet have enough easy-to-follow examples demonstrating how use everything.
WIP is here: https://github.com/stan-dev/example-models/blob/update-hierarchical-pooling/knitr/pool-binary-trials/pool-no-pool.Rmd
@mitzimorris This is great! We definitely need more like this.
From an end user's perspective I think this is fine. I would expect to find an exemple here: https://mc-stan.org/cmdstanr/articles/cmdstanr.html
a <- posterior::as_draws_rvars(fit$draws())
posterior::draws_of(a$surv)[1,]
Thank @fsdias! Will keep this issue open until we add at least a short example like you mention.
As an end user, it would be very happy for fit$draws(format = "draws_rvars")
to return the following object:
draws_list <- posterior::as_draws_rvars(fit$draws())
var_names <- names(draws_list)
out <- lapply(var_names, function(var_name) {
posterior::draws_of(draws_list[[var_name]])
})
names(out) <- var_names
return(out)
I don't think it is necessary to store this result in fit
.
to return the following object:
This wish looks like an issue for the posterior package proposing a new helper function
Is there a way currently to get a structured output in Stan that's not a vector? For example, if I have a parameter declared as
matrix[3, 2] A;
Is there a way for me to fit N
in K
chains and draws and get a 3 x 2 x (N*K)
matrix out (my understanding is that would be the efficient way to access given R's last column major scheme).
For me, literally all I want out of the Stan and R side is to
- compile a model
- load data into a model
- sample from model + data
- structured extract like RStan's
- summary
I think the right way to architect these packages from a software modularity perspective is to not have cmdstanr
or posterior
depend on each other, but to have cmdstanr
produce an output format that posterior
can ingest. Then all this doc would go on the posterior
side. But I realize this isn't how R packages typically work---they typically import the world and expose it all in the top-level namespace.
Going to close this old issue since it's similar to #183 and there is a solution available via the posterior package.
Suppose x
is a 2x3 parameter matrix:
draws <- posterior::as_draws_rvars(fit$draws())
x_draws <- posterior::draws_of(draws$x) # same shape as rstan::extract(fit, pars = "x")$x
I will add something to the doc to make this more clear.
I think the right way to architect these packages from a software modularity perspective is to not have
cmdstanr
orposterior
depend on each other, but to havecmdstanr
produce an output format thatposterior
can ingest.
posterior does not depend on cmdstanr. but cmdstanr does use posterior internally for various things like reformatting and subsetting posterior draws. If we didn't do that we would need to duplicate that code in cmdstanr, which wouldn't be ideal from a maintenance perspective.
This will now be better documented in the CmdStanR vignettes: https://github.com/stan-dev/cmdstanr/pull/955/files