cmdstanr icon indicating copy to clipboard operation
cmdstanr copied to clipboard

Add rstan::extract()-like feature to fit$draws()

Open fsdias opened this issue 3 years ago • 11 comments

Consider the following model:

data{
int N;
vector[N] obs_times;
vector[N] t;
}
parameters{
real<lower=0> lambda;
}
model{
obs_times~exponential(lambda);
lambda~normal(0,5);
}
generated quantities{
  //Draw survival function
  vector[N] surv;
  for(i in 1:N){
    surv[i]=exp(-lambda*t[i]);
  }
}
library(cmdstanr)
dat<-list(N=100,obs_times=data$x,t=seq(0,1.5,length=100))
mod<-cmdstan_model("surv_exp.stan")
fit<-mod$sample(data=dat,
                chains=4,
                parallel_chains=4)

If I convert the "cmdstanr" object to an "rstan" object and use extract() I can access the surv[i] objects in the following order.

library(rstan)
stanfit <- rstan::read_stan_csv(fit$output_files())
post<-extract(stanfit)

post$surv[1:2,]
          
iterations [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]     [,9]
      [1,]    1 0.960775 0.923088 0.886879 0.852091 0.818668 0.786555 0.755702 0.726060
      [2,]    1 0.951013 0.904426 0.860121 0.817987 0.777916 0.739809 0.703568 0.669102
          
iterations    [,10]    [,11]    [,12]    [,13]    [,14]    [,15]    [,16]    [,17]    [,18]
      [1,] 0.697580 0.670217 0.643927 0.618669 0.594402 0.571086 0.548685 0.527163 0.506484
      [2,] 0.636325 0.605154 0.575509 0.547317 0.520505 0.495008 0.470759 0.447698 0.425767

This does not seem to be possible using fit$draws(). The only solutions require using "posterior" and/or "tidybayes" (https://discourse.mc-stan.org/t/how-can-i-change-the-way-estimates-are-printed-by-fit-draws/26486)

> fit$draws("surv",format="list")
# A draws_list: 1000 iterations, 4 chains, and 100 variables

[chain = 1]
$`surv[1]`
 [1] 1 1 1 1 1 1 1 1 1 1

$`surv[2]`
 [1] 0.96 0.96 0.96 0.96 0.96 0.96 0.96 0.96 0.96 0.96

$`surv[3]`
 [1] 0.92 0.92 0.93 0.93 0.91 0.92 0.92 0.92 0.92 0.92

$`surv[4]`
 [1] 0.88 0.89 0.89 0.89 0.87 0.88 0.88 0.88 0.88 0.88

I think it would be useful to add this extract()-like feature to fit$draws().

fsdias avatar Feb 22 '22 11:02 fsdias

Hi, thanks for the feature request.

You can do the following:

library(cmdstanr)

fit <- cmdstanr_example()

# currently does not work
# fit$draws(format = "draws_rvars") 

a <- posterior::as_draws_rvars(fit$draws())

posterior::draws_of(a$beta)[1:2,]

and get

> posterior::draws_of(a$beta)[1:2,]
       [,1]      [,2]     [,3]
1 -0.602041 -0.567582 1.345550
2 -0.965585 -0.142869 0.983629

which is what you want. Once we fix https://github.com/stan-dev/cmdstanr/issues/582 you will be to just use fit$draws(format = "draws_rvars") directly.

The output won't match rstan::extract as extract permutes the samples. If you sort the columns you will see that the results match.

I don't think we want to duplicate the code and add this to cmdstanr as well, as posterior is a direct dependency of cmdstanr anyways and you can just use that. One of the goals of the posterior package was exactly this, that we don't have to have this logic in all interfaces, and rstan will eventually most likely follow the same path.

We do have to do a better job of documenting this, that is for sure.

rok-cesnovar avatar Feb 22 '22 11:02 rok-cesnovar

I don't think we want to duplicate the code and add this to cmdstanr as well, as posterior is a direct dependency of cmdstanr anyways and you can just use that. One of the goals of the posterior package was exactly this, that we don't have to have this logic in all interfaces, and rstan will eventually most likely follow the same path.

it would be a tremendous convenience to the user to add this to CmdStanR because this is what users expect. the documentation for posterior is cryptic, at best. if you're not going to add this to CmdStanR, then you shoud add a whole bunch of examples to the documentation of how to get the posterior sample in the form of a structured variable, additionally, explanation of what "rvars" is and how it works.

mitzimorris avatar Feb 22 '22 16:02 mitzimorris

We certainly can add this to CmdStanR if that is what we decide. This was just my opinion and what I think was the general idea I got from Jonah et al.

This is just my personal opinion that there is much value in adding functions that are just 1-to-1 wrappers to much more general and powerful functions. I am far from a statistical practitioner, so not sure my view on this is that of our user, it is probably more likely it is not. I understand that rstan users expect this functionality though.

My view is that just wrapping a function to another package leads to a requirement of duplicating docs, which leads to issues like https://github.com/stan-dev/cmdstanr/issues/617 Posterior is a much more powerful package and much more general and its better to point the users to use its functionalities and point to any lack of doc there, if there is.

A function fit$extract(name = "var") would just be a wrapper to posterior::as_draws_rvars(fit$draws("var")) and it would just limit the functionalities of the functions it would wrap.

In my personal experience, users don't fully understand everything the fit$summarise_draws() function offers, but once I point them to https://mc-stan.org/posterior/reference/draws_summary.html things start to make more sense. To be fair, the N for that is only 3.

I think we just need to have more examples with more postprocessing examples.

rok-cesnovar avatar Feb 22 '22 17:02 rok-cesnovar

I think we just need to have more examples with more postprocessing examples.

this is key. I spent winter break translating a Stan case study (partial pooling binary trials), written in RStan into CmdStanR and CmdStanPy. it took me quite a while to figure out how to get CmdStanR stuff working properly - WIP is here: https://github.com/stan-dev/example-models/blob/update-hierarchical-pooling/knitr/pool-binary-trials/pool-no-pool.Rmd

mitzimorris avatar Feb 22 '22 17:02 mitzimorris

Very nice, thanks for working on this! Yeah, we need stuff like this.

Will let @jgabry chime in before closing this.

rok-cesnovar avatar Feb 22 '22 17:02 rok-cesnovar

Yeah I think we just need more examples/doc on on postprocessing. The functionality that people want is there already (I don't think we need to duplicate posterior's functionality in cmdstanr or add more wrappers) but we definitely don't yet have enough easy-to-follow examples demonstrating how use everything.

WIP is here: https://github.com/stan-dev/example-models/blob/update-hierarchical-pooling/knitr/pool-binary-trials/pool-no-pool.Rmd

@mitzimorris This is great! We definitely need more like this.

jgabry avatar Feb 22 '22 19:02 jgabry

From an end user's perspective I think this is fine. I would expect to find an exemple here: https://mc-stan.org/cmdstanr/articles/cmdstanr.html

a <- posterior::as_draws_rvars(fit$draws())
posterior::draws_of(a$surv)[1,]

fsdias avatar Feb 23 '22 09:02 fsdias

Thank @fsdias! Will keep this issue open until we add at least a short example like you mention.

rok-cesnovar avatar Feb 23 '22 09:02 rok-cesnovar

As an end user, it would be very happy for fit$draws(format = "draws_rvars") to return the following object:

  draws_list <- posterior::as_draws_rvars(fit$draws())
  var_names <- names(draws_list)
  out <- lapply(var_names, function(var_name) {
    posterior::draws_of(draws_list[[var_name]])
  })
  names(out) <- var_names
  return(out)

I don't think it is necessary to store this result in fit.

MatsuuraKentaro avatar Feb 24 '22 03:02 MatsuuraKentaro

to return the following object:

This wish looks like an issue for the posterior package proposing a new helper function

avehtari avatar Mar 13 '22 22:03 avehtari

Is there a way currently to get a structured output in Stan that's not a vector? For example, if I have a parameter declared as

matrix[3, 2] A;

Is there a way for me to fit N in K chains and draws and get a 3 x 2 x (N*K) matrix out (my understanding is that would be the efficient way to access given R's last column major scheme).

For me, literally all I want out of the Stan and R side is to

  1. compile a model
  2. load data into a model
  3. sample from model + data
  4. structured extract like RStan's
  5. summary

I think the right way to architect these packages from a software modularity perspective is to not have cmdstanr or posterior depend on each other, but to have cmdstanr produce an output format that posterior can ingest. Then all this doc would go on the posterior side. But I realize this isn't how R packages typically work---they typically import the world and expose it all in the top-level namespace.

bob-carpenter avatar Sep 07 '22 22:09 bob-carpenter

Going to close this old issue since it's similar to #183 and there is a solution available via the posterior package.

Suppose x is a 2x3 parameter matrix:

draws <- posterior::as_draws_rvars(fit$draws())
x_draws <- posterior::draws_of(draws$x) # same shape as rstan::extract(fit, pars = "x")$x

I will add something to the doc to make this more clear.

I think the right way to architect these packages from a software modularity perspective is to not have cmdstanr or posterior depend on each other, but to have cmdstanr produce an output format that posterior can ingest.

posterior does not depend on cmdstanr. but cmdstanr does use posterior internally for various things like reformatting and subsetting posterior draws. If we didn't do that we would need to duplicate that code in cmdstanr, which wouldn't be ideal from a maintenance perspective.

jgabry avatar Apr 22 '24 21:04 jgabry

This will now be better documented in the CmdStanR vignettes: https://github.com/stan-dev/cmdstanr/pull/955/files

jgabry avatar Apr 22 '24 21:04 jgabry