posterior discrete rvars (factors, ordered factors)


This is related to #17 but slightly more specific. At some point I think it would be good to have rvars that behave like factors, as at the very least these would be useful for posterior predictive distributions from ordinal and categorical models.

I think the general idea would be:

- have classes like c("rvar_factor", "rvar") and c("rvar_ordered", "rvar_factor", "rvar") with implementations of the base factor functions like levels().
- have typical constructor and conversion functions like rvar_factor() and as_rvar_factor().
- (possibly) auto-detect factor types in as_rvar() by creating a factor if the input is an integer array with a "levels" attribute (as I think brms returns for posterior predictive distributions on categorical models).
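
To make the list above concrete, here is a rough base-R sketch of what such an S3 class layout could look like. It is only an illustration under assumptions: the names new_rvar_factor_sketch(), is_factor_like(), and the "*_sketch" classes are made up for this example and are not part of posterior, and the internal layout (a list holding an integer draws array with a "levels" attribute) is a guess, not the package's design.

```r
# Hypothetical sketch: names and layout are assumptions, not the posterior API.
# Draws are stored as an integer array carrying a "levels" attribute.
new_rvar_factor_sketch <- function(codes, levels, ordered = FALSE) {
  stopifnot(is.numeric(codes), is.character(levels))
  dims <- if (is.null(dim(codes))) length(codes) else dim(codes)
  draws <- array(as.integer(codes), dim = dims)
  attr(draws, "levels") <- levels
  # Ordered factors get one extra class in front, mirroring the proposal
  cls <- c(if (ordered) "rvar_ordered_sketch", "rvar_factor_sketch", "rvar_sketch")
  structure(list(draws = draws), class = cls)
}

# levels() is an S3 generic in base R, so a method can be added for the class
levels.rvar_factor_sketch <- function(x) attr(x$draws, "levels")

# Auto-detection as proposed: an integer array with a "levels" attribute
is_factor_like <- function(x) is.integer(x) && !is.null(attr(x, "levels"))

x <- new_rvar_factor_sketch(c(1, 2, 2, 3), levels = c("a", "b", "c"), ordered = TRUE)
levels(x)
class(x)
```

Putting "rvar_ordered_sketch" ahead of "rvar_factor_sketch" in the class vector means any method written for factors is inherited by ordered factors unless it is overridden.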

This does raise a question if we similarly need subtypes like "rvar_numeric", "rvar_integer", and "rvar_logical" which could be automatically applied by the rvar() constructor, as_rvar(), and draws_of().

Another question is how to print them. For ordered factors maybe median +/- mad? For categorical factors I dunno (modal category with its proportion?)
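
As a rough illustration of the two printing ideas, the base-R sketch below computes both summaries from a plain factor of draws. The helper names modal_summary() and ordered_summary() are hypothetical, and mapping the median code back to a level with round() is just one possible choice (it matters when the median falls between two categories).

```r
# Hypothetical helper: modal category and its proportion of the draws
modal_summary <- function(f) {
  stopifnot(is.factor(f))
  props <- prop.table(table(f))
  top <- which.max(props)
  list(mode = names(props)[top], prop = unname(props[[top]]))
}

# Hypothetical helper: median and MAD computed on the integer codes of an
# ordered factor; constant = 1 keeps the MAD in units of categories
ordered_summary <- function(f) {
  stopifnot(is.ordered(f))
  codes <- as.integer(f)
  center <- median(codes)
  list(median = levels(f)[round(center)], mad = mad(codes, constant = 1))
}

draws <- factor(c("lo", "mid", "mid", "hi", "mid"),
                levels = c("lo", "mid", "hi"), ordered = TRUE)
modal_summary(draws)
ordered_summary(draws)
```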

I don't see any rush on this issue but I thought I'd throw it out there for thoughts/feedback.

Collecting implementation requirements here:

[ ] rvar_factor and rvar_ordered constructors and conversion functions
[ ] rvar / as_rvar should auto-detect factor-like input (arrays with "levels" attribute)
[ ] rvar / as_rvar should auto-detect character input and convert to factor
[ ] figure out printing
[ ] figure out functions that should not work with factors
[ ] casting to/from other draws formats with mixed variables (some factor, some not)

May 21 '21 17:05 mjskay

> (possibly) auto-detect factor types in as_rvar() by creating a factor if the input is an integer array with a "levels" attribute (as I think brms returns for posterior predictive distributions on categorical models).

That would be cool. There are also packages (e.g. rstanarm and at least one more that I'm forgetting at the moment) that return character for the ppd in these models, so maybe we could also convert character into factor automatically.
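
A minimal base-R sketch of what that character-to-factor conversion could look like, keeping the array shape of the draws intact. as_factor_draws() is a hypothetical helper for illustration, not a posterior function, and it assumes the default alphabetical level order from factor() is acceptable.

```r
# Hypothetical helper: turn character draws into integer codes plus a
# "levels" attribute, preserving the dimensions of the input
as_factor_draws <- function(x) {
  if (is.character(x)) {
    f <- factor(x)  # levels are the sorted unique values
    dims <- if (is.null(dim(x))) length(x) else dim(x)
    out <- array(as.integer(f), dim = dims)
    attr(out, "levels") <- levels(f)
    return(out)
  }
  x  # anything else is passed through unchanged
}

ppd <- matrix(c("b", "a", "b", "c"), nrow = 2)
as_factor_draws(ppd)
```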

> This does raise a question if we similarly need subtypes like "rvar_numeric", "rvar_integer", and "rvar_logical"

I can see the use case for factor more clearly, but I could easily be overlooking something. Would the distinction between numeric and integer have any real effect in practice? I'm thinking it only would if we used this information to prevent certain operations on discrete variables (but left to its own devices R would just treat numeric and integer the same in nearly every case, right?).

For logical could we just use factor (with 2 levels) or integer (that happens to only be 0/1)? Are there things we'd want to do with the object that would require a class specific to logical variables?

> I don't see any rush on this issue but I thought I'd throw it out there for thoughts/feedback.

I also don't see any rush but this is a cool idea so I definitely think it's worth talking about.

May 21 '21 18:05 jgabry

> That would be cool. There are also packages (e.g. rstanarm and at least one more that I'm forgetting at the moment) that return character for the ppd in these models, so maybe we could also convert character into factor automatically.

Good idea, I added it to a list of requirements in the first comment above.

> I can see the use case for factor more clearly, but I could easily be overlooking something. Would the distinction between numeric and integer have any real effect in practice? I'm thinking it only would if we used this information to prevent certain operations on discrete variables (but left to its own devices R would just treat numeric and integer the same in nearly every case, right?).
>
> For logical could we just use factor (with 2 levels) or integer (that happens to only be 0/1)? Are there things we'd want to do with the object that would require a class specific to logical variables?

Yeah, broadly I see two use cases:

1. Preventing operations / allowing us (and users) to make generic functions based on these types. E.g. Pr() currently manually checks if the draws vector is logical and spits out an error if not. It might be cleaner to have an rvar_logical class and a corresponding Pr.rvar_logical() implementation. These classes would just be automatically applied according to the format of the internal vector whenever new_rvar() or draws_of() is run internally.
2. Allowing users to make explicit casts between types. E.g. I am thinking that rvar_factor should be a bit more strict than base factors are; I think there are a lot of bugs in R code created by implicit casts from factors to numerics. Forcing people to make that cast would be good, but at the very least requires a function like as_rvar_numeric(). On the one hand, this function could simply return a "basic" rvar that is not a factor. On the other hand, it might be cleaner to have it actually correspond to a real subtype of rvar.
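
Both use cases can be sketched with plain S3 dispatch in base R. Everything below is hypothetical and only meant to show the shape of the idea: new_typed(), Pr_sketch(), as_numeric_sketch(), and the "*_sketch" classes are stand-ins invented for this example, not the real new_rvar(), Pr(), or as_rvar_numeric().

```r
# Hypothetical constructor: pick a subtype from the storage type of the draws
new_typed <- function(draws) {
  subtype <- if (is.logical(draws)) {
    "rvar_logical_sketch"
  } else if (is.factor(draws)) {
    "rvar_factor_sketch"
  } else if (is.numeric(draws)) {
    "rvar_numeric_sketch"
  }
  structure(list(draws = draws), class = c(subtype, "rvar_sketch"))
}

# Use case 1: dispatch replaces the manual "is it logical?" check
Pr_sketch <- function(x) UseMethod("Pr_sketch")
Pr_sketch.rvar_logical_sketch <- function(x) mean(x$draws)
Pr_sketch.default <- function(x) {
  stop("Pr_sketch() requires logical draws", call. = FALSE)
}

# Use case 2: operators on factor-like draws fail until an explicit cast
Ops.rvar_factor_sketch <- function(e1, e2) {
  stop("'", .Generic, "' is not defined for factor-like draws; cast first",
       call. = FALSE)
}
as_numeric_sketch <- function(x) UseMethod("as_numeric_sketch")
as_numeric_sketch.rvar_factor_sketch <- function(x) new_typed(as.integer(x$draws))

Pr_sketch(new_typed(c(TRUE, FALSE, TRUE, TRUE)))
f <- new_typed(factor(c("a", "b", "a")))
as_numeric_sketch(f)$draws
```

In this sketch the cast returns an object with a real numeric subtype, which corresponds to the second of the two options described above; returning a "basic" object without a subtype would work the same way from the caller's side.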

May 21 '21 23:05 mjskay

Adding rvar_factor and rvar_ordered sounds like a super nice idea!

I would tend to agree that rvar_numeric, rvar_integer, etc. could be sensible as well even though not as important as rvar_factor of course. @mjskay if you think adding those wouldn't be too much overhead, then please feel free to add those as well from my side. We can also separate these two issues, if you think that could make it easier for you.

May 22 '21 08:05 paul-buerkner

Sounds good to me too

May 22 '21 16:05 jgabry
