posterior icon indicating copy to clipboard operation
posterior copied to clipboard

Distinguishing between scalar rvars and arrays-of-size-1

Open mjskay opened this issue 3 years ago • 1 comments

Copying @martinmodrak's post from discourse here since it should probably be resolved in some way:

Hi, it appears that the rvar interface in posterior cannot distinguish between a scalar and array of size 1, i.e.

scalar <- draws_matrix("y" = 1:10)
array1 <-  draws_matrix("y[1]" = 1:10)
all.equal(scalar, array1) # Correctly unequal
# [1] "Attributes: < Component “dimnames”: Component “variable”: 1 string mismatch >"
all.equal(as_draws_rvars(scalar), as_draws_rvars(array1)) # Incorrectly equal
# [1] TRUE
all.equal(as_draws_matrix(as_draws_rvars(array1)), array1) # Incorrectly unequal
# [1] "Attributes: < Component “dimnames”: Component “variable”: 1 string mismatch >"

Is this the desired/expected behaviour? Or is this a bug?

Also copying my thoughts from later in the thread:

Hmm. One behavior that base R vectors do that I did not mimic in rvars because it creates many corner cases in code is that vectors do not require a dim attribute (i.e. dim(x) can be NULL), whereas rvars will always return a numeric vector for dim (sometimes it's just the length of the vector). I did this because otherwise a bunch of code has to check for NULL dims and use length instead, which at the time did not seem useful.

That said, one way to distinguish between a "scalar" in base R and an array of size 1 is if dim(x) == NULL or dim(x) == 1. So one option would be to have the conversion functions recognize this convention. It will be slightly annoying to fix since we'd have to check all uses of dim() on rvars in the code that assume dim() is guaranteed non-null, and other functions that assume the internal array always has at least two dimensions (since currently this is also guaranteed). This would also mean the value of draws_of() returned to a user would not be guaranteed to have at least two dimensions. I'm also not sure if this would make some other operations on rvars kind of annoying to use if they are returning values that the user would expect to be treated as scalar when converted to other formats but which are actually 1-element arrays.

The other option would be a more focused solution that just uses an attribute, as you suggest. It might be an attribute for array-ness rather than scalar-ness; then 1-element arrays that get converted to rvars from another format could retain their array-ness through this attribute. Would be curious what folks think. (I'm not sure either solution sounds great to me at the moment, at least from the perspective of implementing it :) )

Another idea that occurred to me if we don't want to change how rvars treat scalars versus arrays-of-1 would be to instead expose the helper functions for flattening/unflattening array indices, since it sounds (maybe) like that's what you were intending to use the rvar stuff for @martinmodrak?

mjskay avatar Aug 01 '21 04:08 mjskay

Another idea that occurred to me if we don't want to change how rvars treat scalars versus arrays-of-1 would be to instead expose the helper functions for flattening/unflattening array indices, since it sounds (maybe) like that's what you were intending to use the rvar stuff for @martinmodrak?

There are two somewhat separate problems: 1) what to use as a general format to store all prior and posterior draws in the SBC package and 2) Converting posterior draws from a Stan model into data for Stan.

Maybe draws_rvars is just not a good choice for 1) (there was no particular reason to choose draws_rvars over say draws_matrix - we just liked rvars :-) ). 2) could then be solved easily if the flattening/unlfattening was exposed.

martinmodrak avatar Aug 01 '21 05:08 martinmodrak