posterior icon indicating copy to clipboard operation
posterior copied to clipboard

should variables() behave differently for `draws_rvars` format?

Open jgabry opened this issue 4 years ago • 4 comments
trafficstars

When working on https://github.com/stan-dev/cmdstanr/issues/582 I was reminded that variables() and nvariables() behave differently for the draws_rvars format compared to all other formats. I don't think this difference (treating vector/matrix variables as a single variable vs. treating elements as separate variables) is documented anywhere (or maybe I missed it). Should I add this information to the help("draws-index") page where these functions are documented?

jgabry avatar Nov 01 '21 22:11 jgabry

Yeah, thanks that would be helpful! I think it's mentioned in the vignette but probably not elsewhere.

Alternatively, we could discuss if we want to keep that difference? It may be that we want to be able to get variable names and counts counted either way (by the base variable name or by the combination of variable and indices) when using any format, in which case we might want an additional function (or option) to do that. In hindsight while the idiosyncratic version of variables() makes sense for draws_rvars, it could be problematic in some cases, e.g. if someone was using the variables listed by variables() with extract_variable(). Kinda breaks the abstraction a bit...

mjskay avatar Nov 01 '21 23:11 mjskay

I think it's mentioned in the vignette but probably not elsewhere.

I forgot about the vignette. You're right, it does show this behavior there. I can also add something to the roxygen doc ( I guess the doc will depend on what we decide about adding a new function or argument, like you mentioned).

Alternatively, we could discuss if we want to keep that difference? It may be that we want to be able to get variable names and counts counted either way (by the base variable name or by the combination of variable and indices) when using any format, in which case we might want an additional function (or option) to do that

I like the idea of being able to get this info for any format. I just changed the name of this issue from just about the doc to discussing this question. I'd be ok with either a separate function or just an argument to variables() that toggles returning the names of all elements (although I guess this would need a different default depending on the format in order to not break backwards compatibility).

it could be problematic in some cases

Yeah I just ran into something in CmdStanR where we have some code that mistakenly assumes that variables() will return the same thing regardless of the draws format. variables() and nvariables() being different does make it trickier to work with draws objects because it requires special code just for draws_rvars.

jgabry avatar Nov 02 '21 00:11 jgabry

Yeah I just ran into something in CmdStanR where we have some code that mistakenly assumes that variables() will return the same thing regardless of the draws format. variables() and nvariables() being different does make it trickier to work with draws objects because it requires special code just for draws_rvars.

Makes sense. I think a default of giving the variable names with indices is probably sensible for compatibility with other parts of the interface.

If we had an option for giving only the "base" name, what is a sensible name? variables(x, base = TRUE)? stem = TRUE? array = TRUE? dimensions = FALSE? dims = FALSE? indices = FALSE? with_dims = FALSE?

One potential complication is that variables<-() for draws_rvars in the default case will be a bit hairy. It would maybe have to only allow renaming scalars, or possibly try to detect if someone is doing a valid rename of an array (e.g. if they rename all of c("x[1]", "x[2]", "x[3]") to c("y[1]", "y[2]", "y[3]") simultaneously). So there may still be some inconsistencies in the interface, but we might be able to keep them to a minimum.

I'd be curious what @paul-buerkner thinks.

mjskay avatar Nov 02 '21 03:11 mjskay

I agree that, ideally, variables() should work the same way for all formats so changing the default behavior for rvars and adding a "base" argument sounds sensible to me. I don't have a good intuition about the name of this additional argument but "base" feels good in my ears.

I don't have a good intiution about variables<-() but we could also add the base argument there to do variables(x, base = TRUE) <- (). We could make base=TRUE the default here (slight inconsistency that I think is ok) and for base = FALSE do the input checks as you suggested.

paul-buerkner avatar Jan 05 '22 14:01 paul-buerkner