zi mixture probs (draft)
This is a draft PR to close https://github.com/paul-buerkner/brms/issues/1551. I'm opening the PR as a draft to get feedback on the overall design and implementation before I work on polishing.
This PR adds the argument z_mix = FALSE to pp_mixture with the following effects:
- the default behavior (
z_mix = FALSE) should be unchanged. - For
z_mix = TRUE:- If the model family is a zero-inflation or hurdle family, return the zero-inflation weights in the same format as the weights for mixture families. The first mixture component is the zero-inflation state; the final component is the non-inflated state, and (in the case of
zero_one_inflated_beta) the middle component is the one-inflated state. - If the model is a mixture family that contains a zero-inflated or hurdle model as one of the mixture components, then the output is like the default output, but the array slice for the zero-inflated component is split into two array slices (three in the case of a zero-one-inflated beta, but this should be irrelevant unless https://github.com/paul-buerkner/brms/issues/1552 is resolved in a way that admits the zero-one-inflated-beta in mixture families) following the format described in the previous bullet.
- If the model family is not itself zero-inflated, and doesn't have any zero-inflated components, then this should error informatively.
- If the model family is a zero-inflation or hurdle family, return the zero-inflation weights in the same format as the weights for mixture families. The first mixture component is the zero-inflation state; the final component is the non-inflated state, and (in the case of
TODO (I'll take care of these, but could use feedback from @paul-buerkner where tagged):
- [ ] Verify correctness of output and write tests.
- [ ] Feedback from @paul-buerkner on a good naming convention for the array slices corresponding to different zi/hu mixture components, and implement. Suggestion:
P(K = 2 & z = 1 | Y), wherez = 1means a structural zero,z = 2means no inflation, andz = 3means a structural 1. - [ ] Confirm that this works (or fails gracefully) when confronted with multivariate response models.
- [ ] Feedback from @paul-buerkner on whether to retain minor helper functions and where to put them.
A final design note: if preferred, it is possible to construct an implementation that is easier (for me) to grasp, but that supports only garden-variety zi/hu families and not mixture families that include zi/hu families as components of the mixture. It would be possible to insert this simpler implementation inside of pp_mixture() via the same z_mix argument. A functioning example of how this branch of the control flow inside of pp_mixture might look internally is available at https://github.com/paul-buerkner/brms/issues/1551.
Here's a snippet to play with:
remotes::install_github("jsocolar/brms", ref = "zi-mixture-probs")
library(brms)
set.seed(2)
dat <- data.frame(
y = c(rep(0, 40), rpois(30, 3), rnbinom(30, 4, .4)),
x = rnorm(100)
)
mix <- mixture(poisson, zero_inflated_negbinomial)
fit1 <- brm(y ~ x, dat, family = mix,
backend = "cmdstanr",
cores = 4)
fit2 <- brm(y ~ 1, dat, family = zero_inflated_poisson,
backend = "cmdstanr",
cores = 4)
pp_mixture(fit1)
pp_mixture(fit1, z_mix = TRUE)
pp_mixture(fit2, z_mix = FALSE)
pp_mixture(fit2, z_mix = TRUE)
out1 <- pp_mixture(fit1, summary = FALSE)
out2 <- pp_mixture(fit1, summary = FALSE, z_mix = TRUE)
out3 <- pp_mixture(fit2, summary = FALSE, z_mix = TRUE)
all(round(apply(out1, c(1,2), sum), 10) == 1)
all(round(apply(out2, c(1,2), sum), 10) == 1)
all(round(apply(out3, c(1,2), sum), 10) == 1)
Just a note, there's some kind of numerical instability when calculating the zero inflation mixture components, but it's only cropped up for numerically extreme inputs. For example:
remotes::install_github("jsocolar/brms", ref = "zi-mixture-probs")
library(brms)
set.seed(2)
dat <- data.frame(
y = c(0, 0:10),
x = rnorm(12)
)
mix <- mixture(poisson, zero_inflated_negbinomial)
fit1 <- brm(y ~ x, dat, family = mix,
backend = "cmdstanr",
cores = 4)
fit1
yields a lot of divergences and a numerically absurd fit:
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
mu1_Intercept -1.84 36.68 -100.76 104.57 1.53 303 39
mu2_Intercept 20423274.18 25792409.61 -7167867.25 68382835.00 2.43 5 25
mu1_x 11.29 125.03 -352.45 346.55 1.50 312 38
mu2_x -69804860.36 88156073.74 -233726450.00 24499132.50 2.43 5 25
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape2 2.28 8.93 0.00 17.84 1.48 8 67
zi2 0.42 0.30 0.01 0.97 1.17 18 645
theta1 0.63 0.32 0.01 0.96 1.53 7 34
theta2 0.37 0.32 0.04 0.99 1.53 7 34
Then, pp_mixture(fit1) doesn't yield any problems, but pp_mixture(fit1, z_mix = TRUE, summary = FALSE) yields some NaNs.
Thank you for your effort! The structure of the log_lik_mixture function looks a bit convoluted now. I don't really want this function to be touched, since it is meant purely for mixture models set up via mixture().
I think I would prefer a different internal approach only affecting pp_mixture, but I cannot fully explain is now because I need to think more about it. Could you provide me with the isolated code how to do pp_mixture for both zi and hu models, if you just implemented it for them in isolation? Without considering the current pp_mixture code? On that basis, I may add the feature myself to pp_mixture. I am sorry to cause you extra effort.
Hi Paul, no worries. I agree this implementation is convoluted, and if you have a vision for how to simplify that's great! The implementation over on the feature request already handles the discrete zero-inflated cases, so I'll just add some logic to recapitulate the data for the hurdle cases and then hand over to you to figure out how to properly expose it through pp_mixture. Might not come for another week or so because I'm currently traveling.