
`get_predicted.brmsfit`: Average rating

Open vincentarelbundock opened this issue 4 years ago • 7 comments

Edit: The scope of this issue has changed. See posts below for a description of the problem.

get_predicted.brmsfit does not appear to support multinomial or cumulative models:

library(brms)
library(insight)
mod <- brm(rating ~ treat + period + (1 | subject), family = cumulative(), data = inhaler,
           silent = 2, backend = "cmdstanr")

head(predict(mod))
#      P(Y = 1) P(Y = 2) P(Y = 3) P(Y = 4)
# [1,]  0.83875  0.15200  0.00675  0.00250
# [2,]  0.82900  0.16100  0.00775  0.00225
# [3,]  0.84700  0.14475  0.00700  0.00125
# [4,]  0.83275  0.15525  0.00800  0.00400
# [5,]  0.82825  0.16025  0.00875  0.00275
# [6,]  0.83000  0.16250  0.00625  0.00125

get_predicted(mod)
# Error in t.default(draws): argument is not a matrix

Leaving this here to document a reproducible example. I can probably work on this at some point in the coming days/weeks.

vincentarelbundock avatar Nov 19 '21 13:11 vincentarelbundock

cf.

tidybayes::add_fitted_draws(inhaler, mod)

bwiernik avatar Nov 23 '21 02:11 bwiernik

The predict() output doesn't seem very useful to me: there aren't any uncertainty intervals.

For "expectation", two options seem reasonable:

  1. estimated probability and uncertainty for each category
new_inhaler <- data.frame(product = c(-.5, .5), 
                          order = 0, 
                          carryover = 0, 
                          subject = NA)

library(dplyr)       # for group_by()/summarize()
library(bayestestR)  # for hdi()

est_probs <- tidybayes::add_fitted_draws(
  newdata = new_inhaler, model = mod
) |> 
  group_by(product, .category) |> 
  summarize(
    Median = median(.value), CI = .95,
    hdi(.value), .groups = "drop"
  )
format_table(est_probs)
  product .category   Median       95% CI
1   -0.50         1     0.54 [0.11, 0.93]
2   -0.50         2     0.42 [0.08, 0.66]
3   -0.50         3     0.03 [0.00, 0.14]
4   -0.50         4 6.83e-03 [0.00, 0.08]
5    0.50         1     0.76 [0.35, 1.00]
6    0.50         2     0.23 [0.00, 0.56]
7    0.50         3 6.20e-03 [0.00, 0.06]
8    0.50         4 1.08e-03 [0.00, 0.02]
  2. Summarizing the above by multiplying each predicted probability/interval by its category value to get an estimated response-scale mean.
est_probs |> 
  mutate(across(c(Median, CI_low, CI_high), \(x) as.numeric(.category) * x)) |> 
  group_by(product) |> 
  summarize(
    CI = first(CI),
    across(c(Median, CI_low, CI_high), sum),
    .groups = "drop"
  ) |> 
  format_table()
  product Median       95% CI
1   -0.50   1.49 [0.31, 3.05]
2    0.50   1.25 [0.34, 2.44]

I think both could reasonably be called "expectation", so I suggest we make the first option the default for "expectation" and add an argument indicating whether to summarize into an estimated average rating. Not sure the best term for that yet.

bwiernik avatar Nov 23 '21 04:11 bwiernik

The above is specifically considering cumulative models. For multinomial models, the summarize option wouldn't make much sense, since the categories are unordered and a probability-weighted average of them has no meaning.

bwiernik avatar Nov 23 '21 04:11 bwiernik

Visualizing category uncertainty: https://twitter.com/tjmahr/status/1455998320028135427/photo/1

@DominiqueMakowski

bwiernik avatar Nov 23 '21 04:11 bwiernik

My initial plan was to do the same thing we do for MASS::polr:

library(insight)
library(MASS)
polr(factor(gear) ~ mpg, data = mtcars) |> 
    get_predicted() |> 
    as.data.frame() |> 
    head()
##   Row Response Predicted
## 1   1        3 0.4405613
## 2   2        3 0.4405613
## 3   3        3 0.3595595
## 4   4        3 0.4221186
## 5   5        3 0.5482261
## 6   6        3 0.5759827

The uncertainty intervals will be added by get_predicted_ci here. The default ci_method is currently “quantile”, but I could change it to “hdi” if you prefer.

The only minor technical challenge is to reshape and align the draws to match the “long” format shown above.
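To illustrate that reshaping step, here is a minimal base-R sketch (not insight's actual implementation) that takes a mock array of dimension [draws x observations x categories], the shape `brms::posterior_epred()` returns for ordinal models, and flattens the posterior means into the long Row/Response format shown above. The array contents are random, purely for demonstration.

```r
# Mock posterior draws: [draws x observations x categories]
set.seed(123)
n_draws <- 100; n_obs <- 3; n_cat <- 4
draws <- array(runif(n_draws * n_obs * n_cat), dim = c(n_draws, n_obs, n_cat))

# Normalize so each draw x observation slice is a probability vector
draws <- draws / array(rep(apply(draws, c(1, 2), sum), n_cat), dim = dim(draws))

# Reshape posterior means into the long Row/Response format
long <- do.call(rbind, lapply(seq_len(n_cat), function(k) {
  data.frame(
    Row       = seq_len(n_obs),
    Response  = k,
    Predicted = colMeans(draws[, , k])  # posterior mean per observation
  )
}))
long <- long[order(long$Row, long$Response), ]
head(long)
```

Aligning the per-draw values (for `get_predicted_ci`) would follow the same indexing, just without collapsing over the draws dimension.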

Computing the estimated response scale mean seems useful in some contexts, but it feels like this is something that is (a) easy to compute using data in the long structure above, and (b) perhaps better suited for a library like modelbased.

Of course, we could also include a new argument here, but in general I think that limiting the number of function arguments is an important user interface design goal.

I don’t hold any of these views very strongly, and am happy to rally behind alternatives.

vincentarelbundock avatar Nov 23 '21 14:11 vincentarelbundock

It looks like both what you showed and what I showed are in "long format", with each row being a response option. That looks good to me.

For summarizing into the estimated mean on the original response scale, perhaps the best approach would be a new 'predict' type, or maybe we could use "classification" for this option? Recall that "classification" is currently used for Bernoulli and multinomial models to give the most probable category. The expected mean scale value feels conceptually similar for ordinal models.

bwiernik avatar Nov 23 '21 16:11 bwiernik

The original issue has been fixed by commit https://github.com/easystats/insight/commit/fac472d0ce7cba826ca926e77dec908e38bde5ae

I changed the title of the issue to reflect a separate feature request made in a series of posts starting with this one, and which I am unlikely to implement:

get_predicted(model, predict="categorical") would return the estimated average outcome.

Three things to keep in mind:

  • Must include a sanity check to ensure that the response is numeric, so that it makes sense to multiply it by the probability.
  • How does that interact with predict="classification" in get_predicted_ci?
  • In our other applications, classification returns "round" values for each level. Is it consistent to return the average rating here (a continuous value)?
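For concreteness, here is a hypothetical base-R sketch of the proposed computation, including the numeric-response sanity check from the first bullet. The probability matrix and level names below are made up for illustration; none of this is insight's actual implementation.

```r
# Hypothetical per-observation category probabilities (rows = observations,
# columns = response categories); values are made up and each row sums to 1.
probs <- matrix(
  c(0.54, 0.42, 0.03,  0.01,
    0.76, 0.23, 0.006, 0.004),
  nrow = 2, byrow = TRUE
)
levels_chr <- c("1", "2", "3", "4")  # response levels as stored in the factor

# Sanity check: levels must be coercible to numbers before a
# probability-weighted average of them makes sense.
levels_num <- suppressWarnings(as.numeric(levels_chr))
if (anyNA(levels_num)) {
  stop("Response levels are not numeric; cannot compute an average rating.")
}

# Expected response value: probability-weighted sum of category values
avg_rating <- as.vector(probs %*% levels_num)
avg_rating
```

Note the result is continuous (e.g. a rating of 1.5), which is exactly the consistency question raised in the last bullet, since "classification" elsewhere returns discrete levels.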

vincentarelbundock avatar Nov 25 '21 18:11 vincentarelbundock