math icon indicating copy to clipboard operation
math copied to clipboard

Generalized Pareto distribution

Open JackCaster opened this issue 2 years ago • 12 comments

Overview

Provide an implementation of the generalized Pareto distribution as described here:

This distribution is often used in extreme value theory to model rare events identified as peak over threshold.

Description

Definitions in Stan of the density in the parameterization we want, as well as the cdf, lcdf, lccdf, and rng functions are available in

Implement the following functions with full autodiff and tests.

  • [ ] gen_pareto_lpdf
  • [ ] gen_pareto_lupdf
  • [ ] gen_pareto_cdf
  • [ ] gen_pareto_lcdf
  • [ ] gen_pareto_lccdf
  • [ ] gen_pareto_rng

Additional resources

Current Version:

v4.3.0

JackCaster avatar Feb 01 '22 14:02 JackCaster

Hi, I was wondering if it is worth adding the Generalized Pareto distribution to the math library at all. Could you share if there is any plans of implementing this feature? I am of course very interested to have it implemented!

JackCaster avatar May 04 '22 12:05 JackCaster

Thanks for posting an issue, @JackCaster. It looks like there are four distributions called "generalized Pareto":

The type 2 version is implemented in Stan already:

  • Language reference: https://mc-stan.org/docs/2_29/functions-reference/pareto-type-2-distribution.html

If you'd like another version, please specify.

It also looks like something else went into our C++:

  • C++ code: https://github.com/stan-dev/math/pull/2093

For addition to brms, I suggest starting with a feature request in the brms repo.

bob-carpenter avatar May 04 '22 16:05 bob-carpenter

Interesting. The Wikipedia article describes the Pareto distribution Type I-IV as "Generalized", but I was talking about this formulation, which is the one commonly referred as GPD in the extreme value theory field. I noticed that the article you linked has also a section called Relation to the generalized Pareto distribution a bit below, which increases confusion.

I opened an issue in the brms repo (https://github.com/paul-buerkner/brms/issues/1113). Alternatively, I could supply the distribution from the Stan math library as an external function to brms.

JackCaster avatar May 04 '22 20:05 JackCaster

I'm definitely confused on nomenclature. Any of those densities would be easy to implement in Stan as they all have simple pdfs. I'll edit the issue to reflect the specific density being requested.

bob-carpenter avatar May 04 '22 20:05 bob-carpenter

I am also quite confused. Perhaps, the Matlab description is a bit clearer than the Wikipedia page? Also, here there is a list of R packages that implement GPD.

JackCaster avatar May 05 '22 09:05 JackCaster

@JackCaster: the most useful thing for us would be a definitive pointer to the function that you want to be implemented.

For instance, is it the definition given in the MATLAB link. It appears to match the definition on the Wikipedia page.

The constraints on the three parameters plus variate are non-trivial, so any parameterization of a model using this density in Stan is going to have to be careful with constraints to stay within support everywhere. The constraints are even asymmetric based on xi so will wind up involving conditionals in the expression of the constraint.

For example, if we just have generalized_pareto_lpdf(y | mu, sigma, xi), for scalar parameters and vector y, then we need something like the following constraint on mu (I didn't test this, so there may be a typo in there somewhere).

parameters {
  real xi;
  real<lower=0> sigma;
  real<lower=(xi >= 0 ? positive_inifinity() : max(y) + sigma / xi), upper=min(y)> mu;
  ...

If mu is some kind of regression, then I'm not clear on how to formulate the required constraint on support.

bob-carpenter avatar May 10 '22 14:05 bob-carpenter

@bob-carpenter I totally get your confusion! Aren't the functions by Aki (https://mc-stan.org/users/documentation/case-studies/gpareto_functions.html) a good starting point? I was thinking those functions could be included in the Stan library to make their use a bit easier (and robust). I understand, however, that their implementation may require a bit more thinking to go around some details...

JackCaster avatar May 11 '22 12:05 JackCaster

@JackCaster: Thanks. If @avehtari's definitions are for the parameterization you want, could you please indicate that in the issue itself? Then the question is what to call the distribution. I would prefer gen_pareto to go along with our use of inv and neg. That'll be enough specificity that someone can implement, which shouldn't be too hard.

Presumably we'd want to keep @avehtari's approximation for small abs(k).

[edit] I went and updated the issue for you. So please check and change if it's not what you were after. Thanks.

bob-carpenter avatar May 11 '22 16:05 bob-carpenter

Different R packages use acronym gpd. Python SciPy uses genpareto I'm used to gpd, but if you feel gen_pareto is more clear, then I'm fine with that.

Presumably we'd want to keep @avehtari's approximation for small abs(k)

That is very important. Without that, the results are unstable when k is close to 0. R package extraDistr did not have that, which I noticed when I was testing my Stan implementation (the current extraDistr has small abs(k) thing as I made issue about that)

avehtari avatar May 11 '22 16:05 avehtari

Thanks, @avehtari.

For naming, it's more for consistency---our other distribution names are inv_gamma, neg_binomial, etc. But I also think it's clearer if someone doesn't know the gpd TLA.

bob-carpenter avatar May 11 '22 21:05 bob-carpenter

Hi! I was wondering if this issue is part of any future milestones or such---looking forward to playing around with it!

JackCaster avatar Oct 04 '22 13:10 JackCaster

Sorry, @JackCaster, but we gave up on writing roadmaps or milestones because they were too contentious among the developers. So now we just try to triage features into ones that we'd like to see implemented like this one and then wait for someone to want it enough to code it.

bob-carpenter avatar Oct 04 '22 17:10 bob-carpenter