Distributions.jl icon indicating copy to clipboard operation
Distributions.jl copied to clipboard

ajr/negativebinomialvariants

Open andrewjradcliffe opened this issue 3 years ago • 4 comments

This introduces 3 new distribution types, corresponding to re-parameterizations of the NegativeBinomial distribution as

  1. location-scale Negative binomial distribution (alternative paramerization), Stan
  2. loglocation-scale Negative binomial distribution (log alternative parameterization), Stan
  3. shape-scale Negative binomial distribution, Bayesian Data Analysis (3rd edition), Appendix A

These have a number of uses. In particular, 1. and 2. emphasize the that the negative binomial distribution is a robust form of the Poisson, which has noted utility for negative binomial regression involving offsets (a model frequently employed in epidemiology, failure modeling). The shape-scale parameterization confers superior numerical stability, which can be useful when working with rare events (particularly when one wishes to draw samples).

In particular, for rare events, the alternative parameterizations are robust in the presence of values which might otherwise suffer from floating point roundoff using the NegativeBinomial. This can be demonstrated by considering the conversion of a location-scale NegativeBinomial2 to NegativeBinomial

using Plots
gr(size=(600,400))
f(μ, ϕ) = ϕ / (μ + ϕ)
p1 = contour(1e-16:5e-17:1e-15, 1.0:0.01:10.0, (x, y) -> log(f(x, y)), ylabel="ϕ", xlabel="μ", right_margin=40*Plots.px);
p2 = heatmap(1e-16:5e-17:1e-15, 1.0:0.01:10.0, (x, y) -> log(f(x, y)), ylabel="ϕ", xlabel="μ", right_margin=40*Plots.px);

I write the above not to malign the choice of the NegativeBinomial in terms of (r, p), but to emphasize that there are real advantages to using the alternative parameterizations. Naturally, conversions are provided between the parameterizations, so that the experience is seamless.

Notes:

  • ] test Distributions revealed no issues.
  • Documentation additions seem to work without issue.

andrewjradcliffe avatar Apr 25 '22 20:04 andrewjradcliffe

Codecov Report

Merging #1536 (c98862a) into master (bb11df8) will decrease coverage by 0.08%. The diff coverage is 77.90%.

@@            Coverage Diff             @@
##           master    #1536      +/-   ##
==========================================
- Coverage   85.45%   85.37%   -0.09%     
==========================================
  Files         128      131       +3     
  Lines        7819     8035     +216     
==========================================
+ Hits         6682     6860     +178     
- Misses       1137     1175      +38     
Impacted Files Coverage Δ
src/Distributions.jl 100.00% <ø> (ø)
src/univariates.jl 74.07% <ø> (ø)
...nivariate/discrete/negativebinomialpoissongamma.jl 72.00% <72.00%> (ø)
...rc/univariate/discrete/negativebinomiallocation.jl 75.00% <75.00%> (ø)
...univariate/discrete/negativebinomialloglocation.jl 76.00% <76.00%> (ø)
src/conversion.jl 100.00% <100.00%> (ø)
src/univariate/discrete/bernoulli.jl 89.39% <0.00%> (-1.92%) :arrow_down:
src/univariate/continuous/uniform.jl 92.20% <0.00%> (-0.74%) :arrow_down:
src/univariate/discrete/binomial.jl 94.28% <0.00%> (-0.28%) :arrow_down:
src/univariate/discrete/geometric.jl 89.47% <0.00%> (-0.24%) :arrow_down:
... and 8 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update bb11df8...c98862a. Read the comment docs.

codecov-commenter avatar Apr 25 '22 21:04 codecov-commenter

Additionally, I think the tests should be extended such that most/all newly added lines are covered.

With respect to tests, a quick inspection of the NegativeBinomial reveals that it mostly relies on R for the tests. The readme.md file in /test/ref/ seems to indicate that extending the R tests to these additional distributions should proceed analogously to NormalCanon?

Thoughts:

  • What sorts of additional tests should I add in Julia?
  • The R tests are nicely set up, but for this example, will merely verify the math specifying the conversions between parameterizations. Any other suggestions for tests?

andrewjradcliffe avatar Apr 25 '22 23:04 andrewjradcliffe

Yeah, unfortunately the test setup is a bit of a mess, mostly due to historical reasons.

Since NegativeBinomial is already checked with the values from R, I think a good start would be to check that the alternative parameterizations are consistent with NegativeBinomial in Julia. Currently, in this PR only the logpdf values are compared it seems but actually ideally we would check all implemented functions. Sampling should be checked as well, e.g., by computing summary statistics (not sure how it's done for NegativeBinomial).

Regarding R, I guess one could copy the existing references for NegativeBinomial and replace the Julia type and its parameters with the ones from the alternative parameterizations (ie. update the Julia side without changing the R part).

devmotion avatar Apr 26 '22 21:04 devmotion

Just a friendly ping of this PR since I indeed would like to use these alternative parameterizations of the NegativeBinomial. Is there anything major holding a merge back?

DoktorMike avatar Feb 11 '23 12:02 DoktorMike