ajr/negativebinomialvariants
This introduces 3 new distribution types, corresponding to re-parameterizations of the NegativeBinomial distribution as
- location-scale Negative binomial distribution (alternative paramerization), Stan
- loglocation-scale Negative binomial distribution (log alternative parameterization), Stan
- shape-scale Negative binomial distribution, Bayesian Data Analysis (3rd edition), Appendix A
These have a number of uses. In particular, 1. and 2. emphasize the that the negative binomial distribution is a robust form of the Poisson, which has noted utility for negative binomial regression involving offsets (a model frequently employed in epidemiology, failure modeling). The shape-scale parameterization confers superior numerical stability, which can be useful when working with rare events (particularly when one wishes to draw samples).
In particular, for rare events, the alternative parameterizations
are robust in the presence of values which might otherwise suffer
from floating point roundoff using the NegativeBinomial.
This can be demonstrated by considering the conversion of a
location-scale NegativeBinomial2 to NegativeBinomial
using Plots
gr(size=(600,400))
f(μ, ϕ) = ϕ / (μ + ϕ)
p1 = contour(1e-16:5e-17:1e-15, 1.0:0.01:10.0, (x, y) -> log(f(x, y)), ylabel="ϕ", xlabel="μ", right_margin=40*Plots.px);
p2 = heatmap(1e-16:5e-17:1e-15, 1.0:0.01:10.0, (x, y) -> log(f(x, y)), ylabel="ϕ", xlabel="μ", right_margin=40*Plots.px);
I write the above not to malign the choice of the NegativeBinomial
in terms of (r, p), but to emphasize that there are real
advantages to using the alternative parameterizations. Naturally,
conversions are provided between the parameterizations, so that
the experience is seamless.
Notes:
-
] test Distributionsrevealed no issues. - Documentation additions seem to work without issue.
Codecov Report
Merging #1536 (c98862a) into master (bb11df8) will decrease coverage by
0.08%. The diff coverage is77.90%.
@@ Coverage Diff @@
## master #1536 +/- ##
==========================================
- Coverage 85.45% 85.37% -0.09%
==========================================
Files 128 131 +3
Lines 7819 8035 +216
==========================================
+ Hits 6682 6860 +178
- Misses 1137 1175 +38
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/Distributions.jl | 100.00% <ø> (ø) |
|
| src/univariates.jl | 74.07% <ø> (ø) |
|
| ...nivariate/discrete/negativebinomialpoissongamma.jl | 72.00% <72.00%> (ø) |
|
| ...rc/univariate/discrete/negativebinomiallocation.jl | 75.00% <75.00%> (ø) |
|
| ...univariate/discrete/negativebinomialloglocation.jl | 76.00% <76.00%> (ø) |
|
| src/conversion.jl | 100.00% <100.00%> (ø) |
|
| src/univariate/discrete/bernoulli.jl | 89.39% <0.00%> (-1.92%) |
:arrow_down: |
| src/univariate/continuous/uniform.jl | 92.20% <0.00%> (-0.74%) |
:arrow_down: |
| src/univariate/discrete/binomial.jl | 94.28% <0.00%> (-0.28%) |
:arrow_down: |
| src/univariate/discrete/geometric.jl | 89.47% <0.00%> (-0.24%) |
:arrow_down: |
| ... and 8 more |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact),ø = not affected,? = missing dataPowered by Codecov. Last update bb11df8...c98862a. Read the comment docs.
Additionally, I think the tests should be extended such that most/all newly added lines are covered.
With respect to tests, a quick inspection of the NegativeBinomial reveals that it mostly relies on R for the tests. The readme.md file in /test/ref/ seems to indicate that extending the R tests to these additional distributions should proceed analogously to NormalCanon?
Thoughts:
- What sorts of additional tests should I add in Julia?
- The R tests are nicely set up, but for this example, will merely verify the math specifying the conversions between parameterizations. Any other suggestions for tests?
Yeah, unfortunately the test setup is a bit of a mess, mostly due to historical reasons.
Since NegativeBinomial is already checked with the values from R, I think a good start would be to check that the alternative parameterizations are consistent with NegativeBinomial in Julia. Currently, in this PR only the logpdf values are compared it seems but actually ideally we would check all implemented functions. Sampling should be checked as well, e.g., by computing summary statistics (not sure how it's done for NegativeBinomial).
Regarding R, I guess one could copy the existing references for NegativeBinomial and replace the Julia type and its parameters with the ones from the alternative parameterizations (ie. update the Julia side without changing the R part).
Just a friendly ping of this PR since I indeed would like to use these alternative parameterizations of the NegativeBinomial. Is there anything major holding a merge back?