cmdstanr icon indicating copy to clipboard operation
cmdstanr copied to clipboard

issues converting to draws_rvars format

Open jgabry opened this issue 4 years ago • 10 comments

Describe the bug

While working on #581 I discovered that converting to draws_rvars using fit$draws(format = "draws_rvars") fails if called after previously converting to a different format.

To Reproduce

# rvars works if called first
fit <- cmdstanr_example("schools")
fit$draws(format = "draws_rvars")  # OK

# but not if called after another format
fit$draws(format = "draws_array")  # OK
fit$draws(format = "draws_rvars") # ERROR

The error is:

Error: The following variables are missing in the draws object: 
{'theta[1]', 'theta[2]', 'theta[3]', 'theta[4]', 'theta[5]', 'theta[6]', 'theta[7]', 'theta[8]'}

Expected behavior

Successful conversion between formats.

Operating system mac os x 11.6

CmdStanR version number 0.4.0.9000

jgabry avatar Nov 01 '21 18:11 jgabry

I haven't fully understood the issue yet but I think it has something to do with posterior::variables() behaving differently for draws_rvars than it does for the other formats (i.e., it treats vectors as a single variable):

fit <- cmdstanr_example("schools")
arr <- fit$draws(format = "array")
rv <- posterior::as_draws_rvars(arr)

posterior::variables(arr)
[1] "lp__"     "mu"       "tau"      "theta[1]" "theta[2]" "theta[3]" "theta[4]" "theta[5]"
[9] "theta[6]" "theta[7]" "theta[8]"

posterior::variables(rv)
[1] "lp__"  "mu"    "tau"   "theta"

jgabry avatar Nov 01 '21 18:11 jgabry

@rok-cesnovar Actually this is a bit more involved than I thought. There are actually two separate but related issues.

I think the main problem is what I said above about variables() being different for draws_rvars. But I also uncovered a related problem: the draws_rvars object created by fit$draws(format = "draws_rvars") (when called before using any other format) is wrong because it creates a separate rvar for each element of vector/matrix parameters but the whole point is for those to be a single rvar within the draws_rvars object. This happens because of read_cmdstan_csv(..., format = "draws_rvars"):

https://github.com/stan-dev/cmdstanr/blob/0807dad020067d5abc94b41df7e1c03d2b0f8abd/R/csv.R#L307-L308

Here read_cmdstan_csv() is calling as_draws_rvars() on a data frame with unrepaired variable names (i.e. names like theta.8 instead of theta[8]) and so it fails to recognize it should combine the elements of theta. It creates 8 separate rvars instead of one rvar with 8 elements. I think the names need to be repaired before creating the draws_rvars object.

So now we have multiple issues:

  • We should fix read_cmdstan_csv() so it creates the correct draws_rvars object. But this won't fix the main problem in this issue. In fact it kind of makes it worse, because:
  • If we fix that then fit$draws(format = "draws_rvars") will fail in all cases (not just after a call to a different format) because of the issue where variables() behaves differently for rvars than other formats. The only reason it doesn't currently fail always is that read_cmdstan_csv(format = "draws_rvars") gets called if the draws haven't been read in yet and because it creates 8 separate rvars for theta it actually avoids the issue with variables.

So it's kind of a mess ;)

jgabry avatar Nov 01 '21 20:11 jgabry

Huh, a mess it is. So beside the read_cmdstan_csv I guess we have a few options:

a) convert to rvars on return only b) have a separate internal member for rvars and store it for reuse there

a is slower, b uses more RAM if user mixes formats. I would go b).

rok-cesnovar avatar Nov 01 '21 20:11 rok-cesnovar

The problem with (b) is that we would need to support two different ways of using the variables argument to draws() and read_cmdstan_csv() and other other functions that take a variables argument (maybe summary() and print()?), right? . We would need to allow the user to select individual elements (theta[2]) for most formats and then only allow selecting full variables (theta) for draws_rvars. Unless I'm misunderstanding, it seems like it would make our internal code pretty complicated. Or did I misunderstand option (b)?

What about option (c): don't support draws_rvars in CmdStanR at all. We just tell the user they can easily call posterior::as_draws_rvars() themselves on any of the draws objects returned by CmdStanR. There's really no benefit to internally using the draws_rvar storage format in CmdStanR. The format is mostly useful for when users want to manipulate draws.

What do you think?

jgabry avatar Nov 01 '21 21:11 jgabry

Actually I guess my option (c) is basically like your option (a) but changing who does the conversion (us or the user). So I'd be ok with option (a) or (c). But I'm also happy to be convinced that (b) is preferable.

jgabry avatar Nov 01 '21 21:11 jgabry

Yeah, c) is the same as a), we just provide a convenience to the user, but its the same thing.

Will think about b) more. Was just thinking out loud on my phone, havent looked at what that would mean though.

rok-cesnovar avatar Nov 01 '21 21:11 rok-cesnovar

Will think about b) more. Was just thinking out loud on my phone, havent looked at what that would mean though.

Sounds good!

jgabry avatar Nov 01 '21 21:11 jgabry

Yeah, I tried to tackle the b) suggestion, but it turns out it would be a hassle.

rok-cesnovar avatar Nov 03 '21 20:11 rok-cesnovar

Yeah, I tried to tackle the b) suggestion, but it turns out it would be a hassle.

Ok so then I guess the question for the $draws() method is whether we do

  • convert on return only, or
  • tell the user to convert themselves

I'm ok with either option.

For read_cmdstan_csv(), should we just not allow the draws_rvars format? It's pretty easy for the user to convert it themselves afterwards and most users don't even use read_cmdstan_csv() anyway.

jgabry avatar Nov 03 '21 22:11 jgabry

I ran into this issue again recently. Any further thoughts on this? I think we were still deciding between

  • convert on return only, or
  • tell the user to convert themselves

Right now we still have the implementation that doesn't work properly.

jgabry avatar Feb 08 '22 18:02 jgabry