
double-checking `check_zeroinflation`

Open bbolker opened this issue 1 year ago • 4 comments

`insight::model_info()` appears to have a fairly permissive definition of `is_count`; in particular, it includes the generalized Poisson, `genpois` (although not `compois`). `is_negbin` also includes the generalized Poisson.

Within `check_zeroinflation()`, though, it seems that we assume that a model that is `is_count && !is_negbin` is Poisson, and a model that is `is_negbin` is negative binomial (i.e., we compute `dpois()` or `dnbinom()`). This seems problematic:

  • an nbinom1 or genpois model won't have the same distribution as dnbinom()
  • if someone tests a model that is already zero-inflated (this is a silly thing to do but someone could try it?) it won't give the right distribution ...
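
To make the first point concrete, here is a minimal sketch (not the package's code) of how the probability of a zero differs between the two negative binomial parameterizations. It assumes the usual conventions: "nbinom2" has variance mu + mu^2/theta, which is what R's `dnbinom(0, size = theta, mu = mu)` evaluates, and "nbinom1" has variance mu * (1 + phi), which corresponds to a negative binomial whose size is mu/phi and so changes with the mean. The function names, the fitted means, and the dispersion values are all hypothetical. Python is used only to keep the arithmetic self-contained.

```python
import math


def poisson_p0(mu):
    """P(Y = 0) for a Poisson with mean mu; the value dpois(0, mu) gives in R."""
    return math.exp(-mu)


def nbinom2_p0(mu, theta):
    """P(Y = 0) when Var = mu + mu^2 / theta (quadratic parameterization)."""
    return (theta / (theta + mu)) ** theta


def nbinom1_p0(mu, phi):
    """P(Y = 0) when Var = mu * (1 + phi) (linear parameterization).

    Assumption: this is a negative binomial with size = mu / phi, so
    (size / (size + mu)) ** size simplifies to (1 + phi) ** (-mu / phi).
    """
    return (1.0 + phi) ** (-mu / phi)


def expected_zeros(p0, fitted_means, dispersion):
    """Sum the per-observation zero probabilities over the fitted means."""
    return sum(p0(mu, dispersion) for mu in fitted_means)


# Hypothetical fitted means and dispersion values, for illustration only.
fitted = [0.5, 2.0, 8.0]
print("nbinom2:", expected_zeros(nbinom2_p0, fitted, 2.0))
print("nbinom1:", expected_zeros(nbinom1_p0, fitted, 1.0))
```

With these made-up numbers the two parameterizations agree at mu = 2 but diverge at the smaller and larger means, so the expected number of zeros depends on which one is applied to the fitted values.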

bbolker avatar Oct 01 '23 21:10 bbolker

Two questions:

  • is check_zeroinflation() useful / meaningful for nbinom1 or genpois? If yes, we should find a package that computes such distributions (maybe extraDistr?)
  • should the function error for models that include a ZI component?

strengejacke avatar Oct 02 '23 11:10 strengejacke

bump

strengejacke avatar Oct 26 '23 08:10 strengejacke

@bwiernik any suggestions on which model families to include/exclude when we speak of "count" models? When should `insight::model_info()$is_count` return TRUE or FALSE?

strengejacke avatar Oct 26 '23 08:10 strengejacke

I'm not familiar enough with what `check_zeroinflation()` does under the hood to comment on the broader issues raised here ATM.

For count models: Poisson, negative binomial, geometric, and binomial (not Bernoulli) and their variations would all be "count" models. Multinomial as well, though that has a somewhat different structure. Really, most discrete families are count models.

bwiernik avatar Oct 26 '23 08:10 bwiernik

fixed in #643

strengejacke avatar Mar 16 '24 11:03 strengejacke