Distributions.jl
Missing distributions list
Since we're making pretty great progress at including probability distributions, I thought I'd add a list of distributions we're missing that we might want to add to rival SciPy's list:
Continuous Distributions
- [ ] alpha - An alpha continuous random variable
- [ ] anglit - An anglit continuous random variable
- [x] arcsine - An arcsine continuous random variable
- [x] betaprime - A beta prime continuous random variable
- [ ] bradford - A Bradford continuous random variable
- [ ] burr - A Burr continuous random variable
- [x] cosine - A cosine continuous random variable
- [ ] dgamma - A double gamma continuous random variable
- [ ] dweibull - A double Weibull continuous random variable
- [x] erlang - An Erlang continuous random variable
- [x] expon - An exponential continuous random variable
- [ ] exponweib - An exponentiated Weibull continuous random variable
- [x] exponpow - An exponential power continuous random variable
- [ ] fatiguelife - A fatigue-life (Birnbaum-Saunders) continuous random variable
- [ ] fisk - A Fisk continuous random variable
- [ ] foldcauchy - A folded Cauchy continuous random variable
- [ ] foldnorm - A folded normal continuous random variable
- [ ] frechet_r - A Frechet right (or Weibull minimum) continuous random variable
- [ ] frechet_l - A Frechet left (or Weibull maximum) continuous random variable
- [ ] genlogistic - A generalized logistic continuous random variable
- [x] genpareto - A generalized Pareto continuous random variable
- [ ] genexpon - A generalized exponential continuous random variable
- [x] genextreme - A generalized extreme value continuous random variable
- [ ] gausshyper - A Gauss hypergeometric continuous random variable
- [ ] gengamma - A generalized gamma continuous random variable
- [ ] genhalflogistic - A generalized half-logistic continuous random variable
- [ ] gilbrat - A Gibrat continuous random variable
- [ ] gompertz - A Gompertz (or truncated Gumbel) continuous random variable
- [x] gumbel_r - A right-skewed Gumbel continuous random variable
- [ ] gumbel_l - A left-skewed Gumbel continuous random variable
- [ ] halfcauchy - A Half-Cauchy continuous random variable
- [ ] halflogistic - A half-logistic continuous random variable
- [ ] halfnorm - A half-normal continuous random variable
- [ ] hypsecant - A hyperbolic secant continuous random variable
- [ ] invweibull - An inverted Weibull continuous random variable
- [ ] johnsonsb - A Johnson SB continuous random variable
- [x] johnsonsu - A Johnson SU continuous random variable
- [ ] loggamma - A log gamma continuous random variable
- [ ] loglaplace - A log-Laplace continuous random variable
- [ ] lomax - A Lomax (Pareto of the second kind) continuous random variable
- [ ] maxwell - A Maxwell continuous random variable
- [ ] mielke - A Mielke’s Beta-Kappa continuous random variable
- [ ] nakagami - A Nakagami continuous random variable
- [x] pareto - A Pareto continuous random variable
- [ ] pearson3 - A Pearson type III continuous random variable
- [ ] powerlaw - A power-function continuous random variable
- [ ] powerlognorm - A power log-normal continuous random variable
- [ ] powernorm - A power normal continuous random variable
- [ ] rdist - An R-distributed continuous random variable
- [ ] reciprocal - A reciprocal continuous random variable
- [x] rayleigh - A Rayleigh continuous random variable
- [ ] rice - A Rice continuous random variable
- [ ] semicircular - A semicircular continuous random variable
- [x] triang - A triangular continuous random variable
- [ ] truncexpon - A truncated exponential continuous random variable
- [ ] tukeylambda - A Tukey-Lambda continuous random variable
- [x] uniform - A uniform continuous random variable
- [x] vonmises - A Von Mises continuous random variable
- [x] wald - A Wald continuous random variable
- [ ] weibull_min - A Frechet right (or Weibull minimum) continuous random variable
- [ ] weibull_max - A Frechet left (or Weibull maximum) continuous random variable
- [ ] wrapcauchy - A wrapped Cauchy continuous random variable
Discrete Distributions
- [ ] boltzmann - A Boltzmann (Truncated Discrete Exponential) random variable
- [ ] dlaplace - A Laplacian discrete random variable
- [ ] logser - A Logarithmic (Log-Series, Series) discrete random variable
- [ ] planck - A Planck discrete exponential random variable
- [x] skellam - A Skellam discrete random variable
- [ ] zipf - A Zipf discrete random variable
Some of these already exist, but the code needs some review.
Also, we might want to make generic `Inverted`, `Folded`, and `Half` types that we implement specialized methods for when dealing with distributions like the Inverse Gaussian.
I have a bunch of code for the alpha-stable distributions, but it needs some work.
Wald = Inverse Gaussian
Woops. I let a few slip through. Sorry about that.
Actually, we have to be careful with nomenclature here:
- log gamma and log normal refer to opposite transformations (the first is a log of a gamma variate, the second is an exp of a normal variate)
- inverse Gaussian is not the inverse of a Gaussian: perhaps this is an argument for calling it a "Wald distribution" (although I think that name is less prevalent).
- inverse distributions are often parametrised differently, e.g. inverse Wishart, and the scale parameter of the inverse Gamma is usually inverted (although not in ours).
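The first point can be checked numerically. The following is a minimal sketch, assuming a recent Distributions.jl; `loggamma_cdf` is a hypothetical helper written for illustration (it is not a package function), and the parameter values are arbitrary.

```julia
using Distributions

μ, σ = 0.5, 1.2     # arbitrary illustration values
x = 2.0

# Log-normal: X = exp(Z) with Z ~ Normal(μ, σ), so P(X <= x) = P(Z <= log(x)).
lognormal_matches = cdf(LogNormal(μ, σ), x) ≈ cdf(Normal(μ, σ), log(x))

# Log gamma (in the SciPy `loggamma` sense): Y = log(G) with G ~ Gamma(c, 1),
# so P(Y <= y) = P(G <= exp(y)). Note the transformation runs the other way.
# Hypothetical helper, not part of Distributions.jl.
loggamma_cdf(y, c) = cdf(Gamma(c, 1), exp(y))

p = loggamma_cdf(0.0, 2.0)   # P(log(G) <= 0) = P(G <= 1) for shape c = 2
```

So "log-X" means "exp of an X variate" in one case and "log of an X variate" in the other, which is why a generic naming rule is hard to apply mechanically.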
Sigh. The naming traditions in probability theory are so nutty. Well, let's avoid any hasty generalizations and just try to implement the rest of these distributions.
Add to list: I guess also the noncentral-hypergeometric distributions are missing (http://en.wikipedia.org/wiki/Noncentral_hypergeometric_distribution).
Is there a reason why piece-wise uniform is missing?
@mykelk Well, no one has asked for it ...
You could create one using a mixture of uniforms: would that be sufficient for your purpose?
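As a rough sketch of the mixture approach (the breakpoints and weights below are made up for illustration), a piece-wise uniform density can be assembled from `Uniform` components with `MixtureModel`:

```julia
using Distributions

# Hypothetical example: 30% of the mass spread evenly over [0, 1],
# 70% spread evenly over [1, 3].
pieces  = [Uniform(0.0, 1.0), Uniform(1.0, 3.0)]
weights = [0.3, 0.7]                 # mixture weights, must sum to 1
pw = MixtureModel(pieces, weights)

p_left  = pdf(pw, 0.5)   # density on the first piece:  0.3 / 1
p_right = pdf(pw, 2.0)   # density on the second piece: 0.7 / 2
c_break = cdf(pw, 1.0)   # mass to the left of the breakpoint
x = rand(pw)             # draw one sample
```

A dedicated type could presumably store the breakpoints once and evaluate the density with a single lookup, but the mixture gets the same distribution with no new code.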
@mykelk feel free to make a PR if you want a piece-wise uniform distribution in this package.
We may do it at some point in the future, but it is not a top priority, as no one else has asked for it.
I have code for the Zipf discrete random variable (more or less read off Wikipedia). I'll make a PR in a few weeks when I am less busy.
Am I right when I say that `Truncated(Normal(0, sigma), 0, Inf)` equals the half-normal distribution with parameter sigma? If that's the case, halfnorm can be checked, unless a specialized implementation is desired.
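One way to check this claim numerically is sketched below. It assumes a recent Distributions.jl, where the lowercase `truncated` function is the usual constructor; `halfnormal_pdf` is a hypothetical hand-written density used only for comparison, and the value of `σ` is arbitrary.

```julia
using Distributions

σ = 2.0                                  # arbitrary scale for illustration
d = truncated(Normal(0, σ), 0, Inf)      # Normal(0, σ) restricted to [0, Inf)

# Half-normal density written out by hand (hypothetical helper):
# twice the Normal(0, σ) density on x >= 0, zero elsewhere.
halfnormal_pdf(x, σ) = x < 0 ? 0.0 : sqrt(2 / π) / σ * exp(-x^2 / (2σ^2))

xs = 0.0:0.5:6.0
max_err = maximum(abs.(pdf.(d, xs) .- halfnormal_pdf.(xs, σ)))

# The half-normal mean is σ * sqrt(2 / π); the truncated normal should agree.
mean_truncated = mean(d)
mean_expected  = σ * sqrt(2 / π)
```

If `max_err` is at rounding level and the two means agree, the truncated normal is behaving as a half-normal, at least for the density and the mean.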
Please check the generalized extreme value and generalized Pareto, they seem to be in master already.
And Chernoff. I added code in a PR for this distribution that compiles for 0.6.
I've updated the list
I would say that Gompertz is one that really should be implemented. It's a right-skewed distribution (where most are left-skewed) that is used a lot in vital statistics.
shifted-lognormal?
The recent Field Guide to Continuous Probability Distributions finds that over 100 common univariate-continuous-unimodal distributions are all special cases of a single Grand Unified Distribution. Github repo: https://github.com/gecrooks/fieldguide The nice thing about using larger more flexible families of distribution is it can save a lot of code.
I have Beta-PERT implemented here https://github.com/oxinabox/ProjectManagement.jl/blob/da3de128ebc031b695bcb1795b53bcfeba617d87/src/timing_distributions.jl I could move it into Distributions.jl
@oxinabox that'd be awesome!
Btw, in the note, PERT (3 param) is a special case of Beta (4 param), which is a special case of GeneralizedBeta (5 param).
not sure if @ some point it'll be cheaper to code the most general parametric families when possible & then list known special cases
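To illustrate the nesting, here is a minimal sketch of PERT written as a shifted and scaled `Beta`. It uses the standard PERT shape parameters (with the usual weight of 4 on the mode); `pert_sketch` is a hypothetical helper, not the linked ProjectManagement.jl code and not a Distributions.jl function.

```julia
using Distributions

# Hypothetical helper: PERT(a, b, c) with minimum a, mode b, maximum c,
# expressed as a + (c - a) * Beta(α, β).
function pert_sketch(a, b, c)
    (a < c && a <= b <= c) || throw(ArgumentError("need a <= b <= c with a < c"))
    α = 1 + 4 * (b - a) / (c - a)
    β = 1 + 4 * (c - b) / (c - a)
    return a + (c - a) * Beta(α, β)    # affine transform of a Beta variate
end

d = pert_sketch(1.0, 3.0, 8.0)
m  = mean(d)                 # PERT mean is (a + 4b + c) / 6
lo = minimum(d)              # support should be [a, c]
hi = maximum(d)
```

Written this way, the specialized distribution needs only a constructor; everything else is inherited from the more general family.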
I updated a PR (https://github.com/JuliaStats/Distributions.jl/pull/1104) for the SkewNormal distribution w/ tests comparing it w/ Azzalini's `sn`.
@andreasnoack previously mentioned that less code is better than more. I agree, that way it's easier to maintain & there is much less code that needs to be checked for bugs etc.
In this spirit, in the future it'd be great to add the 5-param Skewed Generalized T distribution, which nests: skewed generalized error distribution/generalized error distribution, generalized t distribution/skewed t/student t distribution, skewed Laplace distribution/Laplace distribution, skewed normal distribution/normal distribution, skewed Cauchy distribution/Cauchy distribution, uniform distribution.
Even if that was implemented, I'm not sure the punchline would be that we should dispense with separate implementations of Student's t, normal, Laplace, Cauchy, uniform. For one, implementing them as special cases of the skew generalized t isn't necessarily an efficient implementation for any one of them (though that's an empirical question). Furthermore, if you did that, I think most people would find the source code pretty opaque.
To be clear, when I say add Skewed Generalized T distribution which nests x, I don't necessarily mean get rid of separate scripts for the seminal `t/normal/Laplace/Cauchy/uniform`. There prob are more efficient ways to implement properties of these.
I'm not sure we necessarily need separate scripts for less seminal distributions: skewed generalized error distribution/generalized error distribution, generalized t distribution/skewed t, skewed Laplace distribution, skewed normal distribution, skewed Cauchy distribution.
I'd imagine that more eyes looking @ fewer lines of code means things are easier to maintain. Maybe not, that's why we can discuss here.
@johnmyleswhite some of the distributions on your list have been merged, it may be worth updating: BetaPrime, Cosine, exponential power, Pareto, Rayleigh, Triangular, VonMises.
The following from your list have PRs: Burr (PR open), generalized gamma (PR open), maxwell (New PR, Old PR), powerlaw (PR closed bc dep on Optim.jl), (PowerLaw.jl), (PowerLaws.jl)
The following from your list exist online: alpha (AlphaStable.jl), pearson3 (PearsonDistribution.jl), Tukey-Lambda (GeneralizedLambdaDistribution.jl dep on Roots.jl & NLopt.jl), G and K (not on your list but useful, depends on Roots.jl & Optim.jl)
log-hyperbolic?
The Bates distribution: https://en.wikipedia.org/wiki/Bates_distribution
Found myself looking for a multivariate logistic distribution but couldn't find one. I think the best way to implement this would be using some kind of method for symmetric location-scale distributions in general, where you can pass a location vector and a scale or inverse-scale matrix to get a new elliptical distribution.
Noncentral Wishart https://github.com/JuliaStats/Distributions.jl/issues/1330
Would like to give adding alpha distribution a shot.