performance
double-checking `check_zeroinflation`
`insight::model_info()` appears to have a fairly permissive definition of `is_count`; in particular, it includes the generalized Poisson, `genpois` (although not `compois`); `is_negbin` also includes the generalized Poisson. Within `check_zeroinflation()`, though, it seems that we assume that a model that is `is_count && !is_negbin` is Poisson, and a model that is `is_negbin` is negative binomial (i.e., we compute `dpois()` or `dnbinom()`). This seems problematic:
- an `nbinom1` or `genpois` model won't have the same distribution as `dnbinom()`
- if someone tests a model that is already zero-inflated (this is a silly thing to do, but someone could try it?) it won't give the right distribution ...
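To make the two points above concrete, here is a minimal base-R sketch. All values (`mu`, `phi`, `zi`) are hypothetical, and it assumes the usual variance conventions: NB2 with Var = mu + mu^2/phi and NB1 with Var = mu * (1 + phi). It is not the code used in `check_zeroinflation()`; it only shows that the expected proportion of zeros differs depending on which distribution is assumed.

```r
mu  <- 2     # hypothetical fitted mean
phi <- 1.5   # hypothetical dispersion parameter
zi  <- 0.2   # hypothetical zero-inflation probability

# Poisson: the case assumed for is_count && !is_negbin
p0_pois <- dpois(0, lambda = mu)

# NB2 (quadratic variance, Var = mu + mu^2 / phi): size = phi
p0_nb2 <- dnbinom(0, mu = mu, size = phi)

# NB1 (linear variance, Var = mu * (1 + phi)): same density with size = mu / phi
p0_nb1 <- dnbinom(0, mu = mu, size = mu / phi)

# Zero-inflated Poisson: structural zeros plus Poisson zeros
p0_zip <- zi + (1 - zi) * dpois(0, lambda = mu)

round(c(pois = p0_pois, nb2 = p0_nb2, nb1 = p0_nb1, zip = p0_zip), 4)
```

With the same mean and dispersion parameter, plugging an NB1 dispersion into `dnbinom(size = phi)` gives a different zero probability than the NB1 model implies, and ignoring a ZI component understates the expected zeros.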
Two questions:
- is `check_zeroinflation()` useful / meaningful for `nbinom1` or `genpois`? If yes, we should find a package that computes such distributions (maybe extraDistr?)
- should the function error for models that include a ZI component?
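If no suitable package turns up, the generalized Poisson zero probability could also be computed by hand. The sketch below uses Consul's generalized Poisson density and assumes a mean/dispersion parameterisation with Var = mu * phi and phi >= 1; that parameterisation, and the helper name `dgenpois_sketch()`, are assumptions for illustration and may not match what a given fitting package reports.

```r
# Hypothetical helper (not part of performance, insight, or any fitting package):
# Consul's generalized Poisson density, parameterised by mean `mu` and a
# dispersion `phi >= 1` such that Var = mu * phi (assumed parameterisation).
dgenpois_sketch <- function(x, mu, phi) {
  lambda <- 1 - 1 / sqrt(phi)   # Consul's lambda, in [0, 1)
  theta  <- mu * (1 - lambda)   # Consul's theta; mean = theta / (1 - lambda)
  # work on the log scale to avoid overflow for large counts
  log_p <- log(theta) + (x - 1) * log(theta + lambda * x) -
    theta - lambda * x - lgamma(x + 1)
  exp(log_p)
}

mu  <- 3   # hypothetical fitted mean
phi <- 2   # hypothetical dispersion

x <- 0:500
p <- dgenpois_sketch(x, mu, phi)

# expected proportion of zeros under each assumption
p0_genpois <- dgenpois_sketch(0, mu, phi)
p0_pois    <- dpois(0, lambda = mu)
c(genpois = p0_genpois, pois = p0_pois)
```

With `phi = 1` the helper reduces to `dpois()`, which gives a quick sanity check on the parameterisation before comparing observed and expected zeros.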
bump
@bwiernik any suggestions which model families to include/exclude when we speak of "count" models? When should `insight::model_info()$is_count` return `TRUE` or `FALSE`?
I'm not familiar enough with what `check_zeroinflation()` does under the hood to comment on the broader issues raised here ATM.
For count models, Poisson, negative binomial, geometric, and binomial (not Bernoulli) and their variations would all be "count" models. Multinomial as well, though that has a somewhat different structure. Really, most discrete families are count models.
fixed in #643