math
math copied to clipboard
Generalized Pareto distribution
Overview
Provide an implementation of the generalized Pareto distribution as described here:
- Wikipedia: generalized Pareto distribution
This distribution is often used in extreme value theory to model rare events identified as peak over threshold.
Description
Definitions in Stan of the density in the parameterization we want, as well as the cdf, lcdf, lccdf, and rng functions are available in
- @avehtari's generalized Pareto case study
Implement the following functions with full autodiff and tests.
- [ ]
gen_pareto_lpdf
- [ ]
gen_pareto_lupdf
- [ ]
gen_pareto_cdf
- [ ]
gen_pareto_lcdf
- [ ]
gen_pareto_lccdf
- [ ]
gen_pareto_rng
Additional resources
-
Stan forums: generalized Pareto distribution.
-
Aki Vehtari case study generalized Pareto functions, with implementations as Stan functions
-
Stan forums technical discussion of types of reals in doc and user-defined functions
-
brms issue tracker: request generalized Pareto
Current Version:
v4.3.0
Hi, I was wondering if it is worth adding the Generalized Pareto distribution to the math library at all. Could you share if there is any plans of implementing this feature? I am of course very interested to have it implemented!
Thanks for posting an issue, @JackCaster. It looks like there are four distributions called "generalized Pareto":
The type 2 version is implemented in Stan already:
- Language reference: https://mc-stan.org/docs/2_29/functions-reference/pareto-type-2-distribution.html
If you'd like another version, please specify.
It also looks like something else went into our C++:
- C++ code: https://github.com/stan-dev/math/pull/2093
For addition to brms, I suggest starting with a feature request in the brms repo.
Interesting. The Wikipedia article describes the Pareto distribution Type I-IV as "Generalized", but I was talking about this formulation, which is the one commonly referred as GPD in the extreme value theory field. I noticed that the article you linked has also a section called Relation to the generalized Pareto distribution a bit below, which increases confusion.
I opened an issue in the brms repo (https://github.com/paul-buerkner/brms/issues/1113). Alternatively, I could supply the distribution from the Stan math library as an external function to brms.
I'm definitely confused on nomenclature. Any of those densities would be easy to implement in Stan as they all have simple pdfs. I'll edit the issue to reflect the specific density being requested.
I am also quite confused. Perhaps, the Matlab description is a bit clearer than the Wikipedia page? Also, here there is a list of R packages that implement GPD.
@JackCaster: the most useful thing for us would be a definitive pointer to the function that you want to be implemented.
For instance, is it the definition given in the MATLAB link. It appears to match the definition on the Wikipedia page.
The constraints on the three parameters plus variate are non-trivial, so any parameterization of a model using this density in Stan is going to have to be careful with constraints to stay within support everywhere. The constraints are even asymmetric based on xi so will wind up involving conditionals in the expression of the constraint.
For example, if we just have generalized_pareto_lpdf(y | mu, sigma, xi)
, for scalar parameters and vector y
, then we need something like the following constraint on mu
(I didn't test this, so there may be a typo in there somewhere).
parameters {
real xi;
real<lower=0> sigma;
real<lower=(xi >= 0 ? positive_inifinity() : max(y) + sigma / xi), upper=min(y)> mu;
...
If mu
is some kind of regression, then I'm not clear on how to formulate the required constraint on support.
@bob-carpenter I totally get your confusion! Aren't the functions by Aki (https://mc-stan.org/users/documentation/case-studies/gpareto_functions.html) a good starting point? I was thinking those functions could be included in the Stan library to make their use a bit easier (and robust). I understand, however, that their implementation may require a bit more thinking to go around some details...
@JackCaster: Thanks. If @avehtari's definitions are for the parameterization you want, could you please indicate that in the issue itself? Then the question is what to call the distribution. I would prefer gen_pareto
to go along with our use of inv
and neg
. That'll be enough specificity that someone can implement, which shouldn't be too hard.
Presumably we'd want to keep @avehtari's approximation for small abs(k)
.
[edit] I went and updated the issue for you. So please check and change if it's not what you were after. Thanks.
Different R packages use acronym gpd
.
Python SciPy uses genpareto
I'm used to gpd
, but if you feel gen_pareto
is more clear, then I'm fine with that.
Presumably we'd want to keep @avehtari's approximation for small abs(k)
That is very important. Without that, the results are unstable when k is close to 0. R package extraDistr did not have that, which I noticed when I was testing my Stan implementation (the current extraDistr has small abs(k) thing as I made issue about that)
Thanks, @avehtari.
For naming, it's more for consistency---our other distribution names are inv_gamma
, neg_binomial
, etc. But I also think it's clearer if someone doesn't know the gpd TLA.
Hi! I was wondering if this issue is part of any future milestones or such---looking forward to playing around with it!
Sorry, @JackCaster, but we gave up on writing roadmaps or milestones because they were too contentious among the developers. So now we just try to triage features into ones that we'd like to see implemented like this one and then wait for someone to want it enough to code it.