bayesplot icon indicating copy to clipboard operation
bayesplot copied to clipboard

ppc_stat_grouped() fails when subset() is used inside mvbf()

Open dholstius opened this issue 1 month ago • 0 comments

Summary

  • Thanks for an amazing package!
  • ppc_stat_grouped() works with a multivariate model fit via brms::brm() :white_check_mark:
  • ppc_stat_grouped() fails with a multivariate model when subset() is used inside the mvbf() :x:
  • Passing pre-filtered newdata is a workaround; other workarounds welcome
  • May apply to ppc_* other than ppc_stat_grouped(); that's all I tested

In the second example below, I get the error message:

Using all posterior draws for ppc type 'stat_grouped' by default.
Error in `validate_group()`:
! length(group) must be equal to the number of observations.
Run `rlang::last_trace()` to see where the error occurred.
> rlang::last_trace(drop = FALSE)
<error/rlang_error>
Error in `validate_group()`:
! length(group) must be equal to the number of observations.
---
Backtrace:
     ▆
  1. ├─bayesplot::pp_check(...)
  2. └─brms:::pp_check.brmsfit(...)
  3.   └─brms::do_call(ppc_fun, ppc_args)
  4.     └─brms:::eval2(call, envir = args, enclos = envir)
  5.       └─base::eval(expr, envir, ...)
  6.         └─base::eval(expr, envir, ...)
  7.           ├─bayesplot (local) .fun(y = .x1, yrep = .x2, group = .x3, stat = .x4)
  8.           │ └─base::eval(ungroup_call("ppc_stat", call), parent.frame())
  9.           │   └─base::eval(ungroup_call("ppc_stat", call), parent.frame())
 10.           └─bayesplot::ppc_stat(...)
 11.             └─bayesplot::ppc_stat_data(...)
 12.               └─bayesplot:::validate_group(group, length(y))
 13.                 └─rlang::abort("length(group) must be equal to the number of observations.")

Reprex

library(tidyverse)
library(brms)
library(bayesplot)

rbernoulli <- function (n, prob = 0.5) {
  sample(c(0L, 1L), size = n, replace = TRUE, prob = c(1 - prob, prob))
}

dat <-
  tibble(
    grp = LETTERS[1:9],
    y0 = map(seq(0.1, 0.9, by = 0.1), ~ rbernoulli(300, prob = .)),
    y1 = map(seq(0.1, 0.9, by = 0.1), ~ rbernoulli(300, prob = . / 2))) %>%
  unnest(
    c(y0, y1))

#' Just a little helper function to simplify the cases below
fit_logistic <- function (...) {
  brm(
    ...,
    data = dat,
    cores = 4, backend = "cmdstanr", # comment out as you see fit
    family = bernoulli())
}

#' Works great
fit <- fit_logistic(mvbf(y0 ~ (1|grp), y1 ~ (1|grp)))
pp_check(fit, resp = "y1", group = "grp", type = "stat_grouped", stat = "mean")

#' Variation: use `subset(...)` inside the `mvbf()`
#' Fails with "length(group) must be equal to the number of observations"
fit_sub <- fit_logistic(mvbf(y0 ~ (1|grp), y1 | subset(y0 == 1) ~ (1|grp)))
pp_check(fit_sub, resp = "y1", group = "grp", type = "stat_grouped", stat = "mean")

#' If we pass in `newdata` that matches the `subset()` criteria --- here, `y0 == 1` --- then success again
newdata <- filter(dat, y0 == 1)
pp_check(fit_sub, newdata = newdata, resp = "y1", group = "grp", type = "stat_grouped", stat = "mean")

dholstius avatar May 14 '24 22:05 dholstius