Distributions.jl

Add ZeroInflatedPoisson distribution

Open emfeltham opened this issue 3 years ago • 6 comments

In the wake of a discussion over on StatisticalRethinkingTuring, I figured that it would be a good idea to put this together. It is used fairly extensively in the social sciences, and is probably not something that researchers should have to construct themselves. Thanks again.

emfeltham avatar Sep 01 '21 16:09 emfeltham

Hi again, I ran and added tests, and changed the code to meet the test criteria. It should pass now; at least it does locally. Apologies, and thanks again.

emfeltham avatar Sep 03 '21 23:09 emfeltham

Can’t we have the entire thing as an actual MixtureModel of a Dirac at 0 and a Poisson, or make that work?

mschauer avatar Sep 04 '21 06:09 mschauer

Codecov Report

Merging #1393 (2619337) into master (39f9899) will decrease coverage by 0.91%. The diff coverage is 5.12%.

Current head 2619337 differs from pull request most recent head 424b3d4. Consider uploading reports for the commit 424b3d4 to get more accurate results.

@@            Coverage Diff             @@
##           master    #1393      +/-   ##
==========================================
- Coverage   82.54%   81.63%   -0.92%     
==========================================
  Files         116      117       +1     
  Lines        6950     7001      +51     
==========================================
- Hits         5737     5715      -22     
- Misses       1213     1286      +73     
Impacted Files                                      Coverage Δ
src/univariates.jl                                  72.82% <ø> (ø)
src/univariate/discrete/zeroinflatedpoisson.jl      5.12% <5.12%> (ø)
src/univariate/discrete/discretenonparametric.jl    98.84% <0.00%> (-0.20%) ↓
src/quantilealgs.jl                                 82.41% <0.00%> (ø)
src/mixtures/mixturemodel.jl                        69.60% <0.00%> (+1.56%) ↑

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 39f9899...424b3d4.

codecov-commenter avatar Sep 05 '21 07:09 codecov-commenter

So let’s make it ZeroInflated{Poisson}? For the same amount of code we get a couple of related zero-inflated distributions, e.g. ZeroInflated{NegativeBinomial}.

mschauer avatar Sep 05 '21 10:09 mschauer
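To make the ZeroInflated{Poisson} idea concrete, here is a minimal sketch of what such a parametric wrapper could look like. The struct, its field names (p0, base), and the choice of methods are assumptions for illustration only; this is not the code in this PR and not an existing Distributions.jl type, and a full implementation would also need cdf, quantile, support bounds, and so on.

```julia
using Distributions
using Random: AbstractRNG

# Hypothetical wrapper type for illustration; not part of Distributions.jl.
struct ZeroInflated{D<:DiscreteUnivariateDistribution} <: DiscreteUnivariateDistribution
    p0::Float64  # extra probability mass placed at zero
    base::D      # underlying count distribution, e.g. Poisson or NegativeBinomial
end

# Log-pmf: a point mass at zero mixed with the base distribution.
function Distributions.logpdf(d::ZeroInflated, x::Real)
    p = (1 - d.p0) * pdf(d.base, x)
    return log(iszero(x) ? d.p0 + p : p)
end

# Sampling: a structural zero with probability p0, otherwise a draw from the base.
Base.rand(rng::AbstractRNG, d::ZeroInflated) = rand(rng) < d.p0 ? 0 : rand(rng, d.base)

# Mean of the mixture: the point mass at zero contributes nothing.
Distributions.mean(d::ZeroInflated) = (1 - d.p0) * mean(d.base)

# The same code covers several base distributions.
zipois = ZeroInflated(0.4, Poisson(2.0))
zinb = ZeroInflated(0.4, NegativeBinomial(3, 0.5))

exp(logpdf(zipois, 0))  # probability of observing a zero under the ZIP
mean(zinb)              # mean of the zero-inflated negative binomial
```

The point of the type parameter is that ZeroInflated{Poisson} and ZeroInflated{NegativeBinomial} share every method above, which is the "same amount of code" argument made in the comment.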

Building on @mschauer's suggestion, we can generically create all kinds of transformations (zero-inflated/truncated/modified), as in my Discourse post:

using Distributions
λ = 2.0   # Poisson rate parameter
zp = 0.4  # weight on the point mass at zero
dpoi = Poisson(λ)

dzip = MixtureModel([Dirac(0.0), dpoi], [zp, 1.0 - zp])  # zero-inflated Poisson (ZIP)
dztp = truncated(dpoi, 1, Inf)                           # zero-truncated Poisson (ZTP)
dzmp = MixtureModel([Dirac(0.0), dztp], [zp, 1.0 - zp])  # zero-modified Poisson (ZMP)

# Generic helper functions might look something like this
ZeroInflated(d, zp) = MixtureModel([Dirac(0.0), d], [zp, 1.0 - zp])
ZeroTruncated(d) = truncated(d, 1, Inf)
ZeroModified(d, zp) = MixtureModel([Dirac(0.0), ZeroTruncated(d)], [zp, 1.0 - zp])

Btw, there are also one-inflated Binomial/Poisson/Beta etc., which can be handled similarly. This really shows the amazing power of Julia & Distributions.jl!

# x-inflated/x-truncated/x-modified versions might look something like this
XInflated(d, xp, x) = MixtureModel([Dirac(x), d], [xp, 1.0 - xp])
XTruncated(d, x) = truncated(d, x + 1, Inf)  # note cts/discrete issue: the +1 only makes sense for integer-valued d
XModified(d, xp, x) = MixtureModel([Dirac(x), XTruncated(d, x)], [xp, 1.0 - xp])

# The zero-specific versions are then special cases with x = 0
ZeroInflated(d, zp) = XInflated(d, zp, 0.0)
ZeroTruncated(d) = XTruncated(d, 0.0)
ZeroModified(d, zp) = XModified(d, zp, 0.0)

We can look at ZeroInflatedDistributions.jl & LRMoE.jl. @jkbest2 & @sparktseung, do you have any feedback on how to create zero-inflated random variables?

azev77 avatar Oct 05 '21 19:10 azev77
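As a quick sanity check of the mixture-based approach, the sketch below builds a zero-inflated and a zero-modified Poisson and inspects the mass at zero and the mean. The lowercase helper names are made up for this example (they are not a Distributions.jl API), and Dirac(0) is used with an integer value on the assumption that the base distribution is a count distribution.

```julia
using Distributions

# Illustrative helpers (hypothetical names), following the mixture construction above.
zero_inflated(d, zp) = MixtureModel([Dirac(0), d], [zp, 1 - zp])
zero_truncated(d) = truncated(d, 1, Inf)
zero_modified(d, zp) = MixtureModel([Dirac(0), zero_truncated(d)], [zp, 1 - zp])

λ, zp = 2.0, 0.4
dzip = zero_inflated(Poisson(λ), zp)
dzmp = zero_modified(Poisson(λ), zp)

pdf(dzip, 0)   # zp plus the Poisson's own mass at zero: zp + (1 - zp) * exp(-λ)
pdf(dzmp, 0)   # the truncated component has no mass at zero, so this is zp
mean(dzip)     # (1 - zp) * λ
rand(dzip, 5)  # five draws from the zero-inflated Poisson
```

Because the result is an ordinary MixtureModel, the generic pdf, mean, and rand methods come for free; the trade-off is that the zero-inflation structure is not visible in the type, so dispatching on "zero-inflated Poisson" specifically is not possible this way.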

@azev77

  • Currently in LRMoE.jl v0.2.0, zero-inflated distributions are implemented as separate objects, e.g. there is PoissonExpert(λ) and there is ZIPoissonExpert(p0, λ). I don't think this is a good approach, and I may need to change it in the future.

  • I think your example using the MixtureModel in Distributions.jl is much better and more maintainable.

  • For discrete distributions with zero inflation & modification, I'd suggest adding some notes in the documentation that explicitly state the actual zero probability. For zero-inflated Poisson, it should be p0+(1-p0)*exp(-λ). A lot of people tend to forget about the second term.

  • For continuous distributions, some may have infinite density at zero, e.g. Gamma with shape<1. One should be careful about writing and interpreting pdf/logpdf in such cases.

sparktseung avatar Oct 05 '21 20:10 sparktseung
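For reference, the zero probability mentioned above can be written out explicitly. The following is a sketch using the thread's notation (p0 for the inflation weight, λ for the Poisson rate); the zero-modified case assumes the construction discussed earlier, a point mass at zero mixed with a zero-truncated Poisson.

```latex
% Zero-inflated Poisson: the Poisson component also contributes mass at zero.
\begin{align*}
P(X = 0) &= p_0 + (1 - p_0)\, e^{-\lambda}, \\
P(X = k) &= (1 - p_0)\, \frac{\lambda^{k} e^{-\lambda}}{k!}, \qquad k = 1, 2, \dots
\end{align*}

% Zero-modified Poisson: the truncated component has no mass at zero.
\begin{align*}
P(X = 0) &= p_0, \\
P(X = k) &= (1 - p_0)\, \frac{\lambda^{k} e^{-\lambda}}{k!\,\bigl(1 - e^{-\lambda}\bigr)}, \qquad k = 1, 2, \dots
\end{align*}
```

So p0 equals the observed zero probability only in the zero-modified case; in the zero-inflated case it is a lower bound on it.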