insight
`get_predicted.brmsfit`: Average rating
Edit: The scope of this issue has changed. See posts below for a description of the problem.
`get_predicted.brmsfit()` does not appear to support multinomial or cumulative models:
```r
library(brms)
library(insight)

mod <- brm(
  rating ~ treat + period + (1 | subject),
  family = cumulative(), data = inhaler,
  silent = 2, backend = "cmdstanr"
)

head(predict(mod))
#      P(Y = 1) P(Y = 2) P(Y = 3) P(Y = 4)
# [1,]  0.83875  0.15200  0.00675  0.00250
# [2,]  0.82900  0.16100  0.00775  0.00225
# [3,]  0.84700  0.14475  0.00700  0.00125
# [4,]  0.83275  0.15525  0.00800  0.00400
# [5,]  0.82825  0.16025  0.00875  0.00275
# [6,]  0.83000  0.16250  0.00625  0.00125

get_predicted(mod)
# Error in t.default(draws): argument is not a matrix
```
Leaving this here to document a reproducible example. I can probably work on this at some point in the coming days/weeks.
cf.

```r
tidybayes::add_fitted_draws(inhaler, mod)
```

The `predict()` output doesn't seem entirely reasonable to me: there aren't any uncertainty intervals.
For "expectation", two options seem reasonable:
- estimated probability and uncertainty for each category
```r
library(dplyr)  # for group_by()/summarize()

new_inhaler <- data.frame(
  product = c(-.5, .5),
  order = 0,
  carryover = 0,
  subject = NA
)

est_probs <- tidybayes::add_fitted_draws(
  newdata = new_inhaler, model = mod
) |>
  group_by(product, .category) |>
  summarize(
    Median = median(.value), CI = .95,
    hdi(.value), .groups = "drop"
  )

format_table(est_probs)
#   product .category   Median       95% CI
# 1   -0.50         1     0.54 [0.11, 0.93]
# 2   -0.50         2     0.42 [0.08, 0.66]
# 3   -0.50         3     0.03 [0.00, 0.14]
# 4   -0.50         4 6.83e-03 [0.00, 0.08]
# 5    0.50         1     0.76 [0.35, 1.00]
# 6    0.50         2     0.23 [0.00, 0.56]
# 7    0.50         3 6.20e-03 [0.00, 0.06]
# 8    0.50         4 1.08e-03 [0.00, 0.02]
```
- Summarizing the above by multiplying each predicted probability/interval by its category value to get the estimated response-scale mean.
```r
est_probs |>
  mutate(across(c(Median, CI_low, CI_high), \(x) as.numeric(.category) * x)) |>
  group_by(product) |>
  summarize(
    CI = first(CI),
    across(c(Median, CI_low, CI_high), sum),
    .groups = "drop"
  ) |>
  format_table()
#   product Median       95% CI
# 1   -0.50   1.49 [0.31, 3.05]
# 2    0.50   1.25 [0.34, 2.44]
```
I think both could reasonably be called "expectation", so I suggest we make the first option the default for "expectation" and add an argument indicating whether to summarize into an estimated average rating. I'm not sure of the best term for that yet.
The above is specifically considering cumulative models. For multinomial models, the summarize option wouldn't make much sense.
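To make the second option concrete, here is a minimal base-R sketch of the probability-weighted sum that yields the expected rating. The probabilities are made up for illustration, not model output:

```r
# Made-up category probabilities for ratings 1-4 (illustrative only)
probs <- c(0.54, 0.42, 0.03, 0.01)
categories <- 1:4

# Expected rating on the response scale: sum over k of P(Y = k) * k
expected_rating <- sum(probs * categories)
expected_rating
#> [1] 1.51
```

In practice the same weighted sum would be applied per posterior draw, so the uncertainty propagates to the response-scale mean.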
Visualizing category uncertainty:
https://twitter.com/tjmahr/status/1455998320028135427/photo/1

@DominiqueMakowski
My initial plan was to do the same thing we do for `MASS::polr`:

```r
library(insight)
library(MASS)

polr(factor(gear) ~ mpg, data = mtcars) |>
  get_predicted() |>
  as.data.frame() |>
  head()
##   Row Response Predicted
## 1   1        3 0.4405613
## 2   2        3 0.4405613
## 3   3        3 0.3595595
## 4   4        3 0.4221186
## 5   5        3 0.5482261
## 6   6        3 0.5759827
```
The uncertainty intervals will be added by `get_predicted_ci()` here. The default `ci_method` is currently `"quantile"`, but I could change it to `"hdi"` if you prefer.
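For readers unfamiliar with the distinction, here is a small base-R sketch using simulated draws (purely illustrative): a quantile interval cuts equal tail mass on each side, whereas an HDI finds the narrowest interval containing the requested mass, so the two differ for skewed posteriors.

```r
set.seed(42)
draws <- rgamma(1e4, shape = 2, rate = 1)  # a right-skewed "posterior"

# Equal-tailed (quantile) 95% interval
quantile(draws, probs = c(0.025, 0.975))

# A crude HDI: scan all intervals spanning 95% of the sorted draws
# and keep the narrowest one
sorted <- sort(draws)
n <- length(sorted)
k <- ceiling(0.95 * n)
widths <- sorted[k:n] - sorted[1:(n - k + 1)]
i <- which.min(widths)
c(sorted[i], sorted[i + k - 1])
```

For this right-skewed distribution the HDI is shifted toward zero and narrower than the equal-tailed interval.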
The only minor technical challenge is to reshape and align the draws to match the “long” format shown above.
Computing the estimated response scale mean seems useful in some contexts, but it feels like this is something that is (a) easy to compute using data in the long structure above, and (b) perhaps better suited for a library like modelbased.
Of course, we could also include a new argument here, but in general I think that limiting the number of function arguments is an important user interface design goal.
I don’t hold any of these views very strongly, and am happy to rally behind alternatives.
It looks like both what you showed and what I showed are in "long format", with each row being a response option. That works for me.
For summarizing into the estimated mean on the original response scale, perhaps the best approach would be a new `predict` type, or maybe we could use `"classification"` for this option? Recall that `"classification"` is currently used for Bernoulli and multinomial models to give the most probable category. The expected mean scale value feels conceptually similar for ordinal models.
The original issue has been fixed by commit https://github.com/easystats/insight/commit/fac472d0ce7cba826ca926e77dec908e38bde5ae
I changed the title of the issue to reflect a separate feature request made in a series of posts starting with this one, and which I am unlikely to implement:

`get_predicted(model, predict = "categorical")` would return the estimated average outcome.
Three things to keep in mind:
- Must include a sanity check to ensure that the response is numeric, so that it makes sense to multiply it by the probability.
- How does that interact with `predict = "classification"` in `get_predicted_ci()`?
- In our other applications, `"classification"` returns "round" values for each level. Is it consistent to return the average rating here (a continuous value)?
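A sketch of the first point, using a hypothetical helper (not part of the insight API), assuming the ordinal response is stored as factor levels:

```r
# Hypothetical helper: coerce ordinal response levels to numeric,
# erroring if any level is not numeric-like
check_numeric_response <- function(lvls) {
  vals <- suppressWarnings(as.numeric(as.character(lvls)))
  if (anyNA(vals)) {
    stop("Response levels must be numeric to compute an average rating.")
  }
  vals
}

check_numeric_response(levels(factor(c(1, 2, 3, 4))))
#> [1] 1 2 3 4

# check_numeric_response(c("low", "high"))  # would error
```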