Representing a Generalized Pareto with supported factors

I am trying to model the tail of a Gumbel, i.e. a Generalized Pareto.

I understand that the philosophy of Infer.NET is to provide basic distributions and let the user combine them, as discussed in the old forum and in some issues here. Using this approach, how should I represent a Generalized Pareto?

Assuming my Generalized Pareto has a positive shape parameter, I can encode it as an Exponential-Gamma mixture as described in: https://en.wikipedia.org/wiki/Generalized_Pareto_distribution#GPD_as_an_Exponential-Gamma_Mixture

Gamma factors are supported natively by Infer.NET, so I can use this directly as the parameter of an Exponential.

How should I encode the Exponential? Should I simply exponentiate some positive real number drawn from a Uniform, using the parameter drawn from a Gamma?

Dec 30 '21 10:12 solna86

An exponential distribution is equivalent to a Gamma distribution with shape parameter equal to 1. So you can write this as a Gamma variable whose rate is another Gamma variable. Another way to see it is that the Generalized Pareto with mu=0 and positive shape is a special case of the F distribution.
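
To spell out why this works, here is a short derivation. It is a sketch that assumes the shape/rate convention for the Gamma; the symbols lambda (the latent rate), alpha and beta are notation chosen here, following the Wikipedia section linked above:

```latex
\begin{align*}
\lambda &\sim \mathrm{Gamma}(\alpha, \beta), \qquad
X \mid \lambda \sim \mathrm{Exponential}(\lambda) = \mathrm{Gamma}(1, \lambda) \\
P(X > x) &= \int_0^\infty e^{-\lambda x}\,
  \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, \lambda^{\alpha - 1} e^{-\beta \lambda}\, d\lambda
  = \left(\frac{\beta}{\beta + x}\right)^{\alpha} \\
 &= \left(1 + \frac{\xi x}{\sigma}\right)^{-1/\xi},
 \qquad \xi = \frac{1}{\alpha}, \quad \sigma = \frac{\beta}{\alpha}
\end{align*}
```

The last line is the survival function of a Generalized Pareto with mu=0 and shape xi > 0.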

Dec 30 '21 12:12 tminka

Many thanks, I had overlooked that equivalence.

What are some recommended weakly-informative priors for alpha and beta in the Gamma distribution, taking into account that GeneralizedPareto(xi=1/alpha, sigma=beta/alpha) where alpha and beta are the shape and rate parameters of Gamma?
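
The mapping quoted above can be checked numerically. The following is a minimal Python sketch (not Infer.NET code): the function names and the values of alpha and beta are arbitrary choices for illustration, and the mixture is integrated with a plain midpoint rule.

```python
import math

def gamma_to_gpd(alpha, beta):
    """Map the Gamma(shape=alpha, rate=beta) mixing parameters to GPD (xi, sigma)."""
    return 1.0 / alpha, beta / alpha

def gpd_survival(x, xi, sigma):
    """P(X > x) for a Generalized Pareto with mu = 0 and xi > 0."""
    return (1.0 + xi * x / sigma) ** (-1.0 / xi)

def mixture_survival(x, alpha, beta, steps=100_000, upper=60.0):
    """P(X > x) when X | lam ~ Exponential(lam) and lam ~ Gamma(alpha, rate=beta),
    computed by midpoint-rule integration over lam on (0, upper)."""
    log_norm = alpha * math.log(beta) - math.lgamma(alpha)
    h = upper / steps
    total = 0.0
    for k in range(steps):
        lam = (k + 0.5) * h
        log_pdf = log_norm + (alpha - 1.0) * math.log(lam) - beta * lam
        total += math.exp(log_pdf - lam * x)  # Gamma density times exp(-lam * x)
    return total * h

alpha, beta = 3.0, 2.0  # arbitrary illustrative values
xi, sigma = gamma_to_gpd(alpha, beta)
for x in (0.5, 2.0, 10.0):
    print(x, gpd_survival(x, xi, sigma), mixture_survival(x, alpha, beta))
```

The two printed columns should agree closely, which is what makes a prior on (alpha, beta) equivalent to a prior on (xi, sigma).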

Dec 30 '21 14:12 solna86

We don't recommend priors here. Try asking on Cross Validated.

Dec 30 '21 20:12 tminka

Thanks. I apologize if my question sounded like an off-topic query about priors.

I am quite familiar with that topic in more general probabilistic systems, and I understand this is not the place to ask.

However, I am having some trouble connecting distributions in Infer.NET.

For example, consider a simple Beta-Uniform mixture model where the mixing rate and one parameter of Beta observations are unknown:

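var p = Observed(double_array);   // observed p-values
var i = p.Range;

var m = Beta(1, 1);   // unknown mixing rate
var a = Beta(1, 1);   // unknown parameter of the Beta component

using(ForEach(i))
{
    var c = Bernoulli(m);   // component indicator for observation i

    using(If(c))
    {
        p[i] = Beta(a, 1);   // Beta component
    }

    using(IfNot(c))
    {
        p[i] = Beta(1, 1);   // Uniform component: Beta(1, 1) is Uniform(0, 1)
    }
}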

Infer.NET did not support the above model with any algorithm and quality band. The part that causes problems is the Beta prior for a.
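
To make the model concrete, here is a small Python sketch (not Infer.NET code) of the per-observation density that the snippet above describes. The function names and the sample p-values are made up for illustration.

```python
import math

def bum_density(p, m, a):
    """Density of one p-value under the Beta-Uniform mixture:
    with probability m it comes from Beta(a, 1), whose density is a * p**(a - 1),
    otherwise from Beta(1, 1), i.e. Uniform(0, 1), whose density is 1."""
    return m * a * p ** (a - 1.0) + (1.0 - m)

def bum_log_likelihood(pvalues, m, a):
    """Log-likelihood of independent p-values for fixed m and a."""
    return sum(math.log(bum_density(p, m, a)) for p in pvalues)

pvalues = [0.001, 0.02, 0.3, 0.8]  # made-up data
print(bum_log_likelihood(pvalues, m=0.5, a=0.3))
```

With a = 1 the Beta component is itself uniform, so the density is 1 everywhere and m is not identifiable.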

The only parametrization that I have been able to compile replaces Beta-Beta with Gaussian/Gamma-Gaussian. But this is quite unnatural, since the observations are p-values and thus constrained to [0, 1], and it is also very slow.

So my questions are:

- Is there a more natural alternative that is supported by Infer.NET?
- Can I learn more about these limitations and how to approach them somewhere?

Jan 06 '22 23:01 solna86

To model values constrained to [0,1] in a flexible way, you can use:
- a logistic transformation of a Gaussian
- Max(0, Min(1, Gaussian))
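
As a plain-Python illustration of how these two constructions behave (a sketch only, not Infer.NET code; the Gaussian mean and standard deviation are arbitrary and the helper names are made up):

```python
import math
import random

def logistic(z):
    """Logistic (sigmoid) transformation of a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def clamp01(z):
    """Max(0, Min(1, z)): clip a real number to [0, 1]."""
    return max(0.0, min(1.0, z))

rng = random.Random(0)
draws = [rng.gauss(0.5, 1.0) for _ in range(10_000)]  # arbitrary Gaussian

squashed = [logistic(z) for z in draws]
clamped = [clamp01(z) for z in draws]

# The logistic transform stays strictly inside (0, 1) for these draws.
print(min(squashed), max(squashed))
# Clamping puts point masses at exactly 0 and exactly 1.
print(sum(v == 0.0 for v in clamped), sum(v == 1.0 for v in clamped))
```

The practical difference is that the clamped version can produce exact zeros and ones, while the logistic version cannot.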
The limitations are documented at the List of factors and constraints. You can see there that stochastic parameters of a Beta distribution are not supported.
For Beta(a,1), a Gamma prior on a is conjugate so this would be fairly easy to support.
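
The conjugacy can be seen from the Beta(a,1) density, a * p^(a-1): multiplying n such terms by a Gamma(shape, rate) prior on a gives another Gamma. A minimal Python sketch of that update follows. It assumes every observation comes from Beta(a,1), so it does not cover the mixture case, and the function name and example values are made up.

```python
import math

def gamma_posterior_for_beta_a1(pvalues, prior_shape, prior_rate):
    """Posterior of a when p_i ~ Beta(a, 1) and a ~ Gamma(prior_shape, rate=prior_rate).

    The likelihood is prod_i a * p_i**(a - 1), which is proportional to
    a**n * exp(a * sum_i log p_i), so the posterior is
    Gamma(prior_shape + n, rate = prior_rate - sum_i log p_i).
    """
    n = len(pvalues)
    sum_log = sum(math.log(p) for p in pvalues)  # non-positive for p in (0, 1]
    return prior_shape + n, prior_rate - sum_log

pvalues = [0.01, 0.2, 0.05, 0.6]  # made-up data
shape, rate = gamma_posterior_for_beta_a1(pvalues, prior_shape=1.0, prior_rate=1.0)
print(shape, rate, shape / rate)  # posterior shape, rate, and mean of a
```

Because log p_i is never positive on (0, 1], the posterior rate can only grow, so the posterior stays proper for any proper Gamma prior.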

Jan 07 '22 10:01 tminka

PR #386 adds support for Beta(a,1) with Gamma a.

Jan 07 '22 16:01 tminka

Many thanks for taking the time to support this @tminka!

I have pulled the latest master and built Infer.NET. A mixture model like the one I posted previously, with Beta(a, 1) or Beta(1, a), and a = Gamma(...) in one of the discrete mixture branches now compiles on that Infer.NET build, which is great.

Typically, in a Beta-Uniform mixture model of p-values, the free parameter in Beta is alpha [1], and alpha is usually constrained to [0, 1] in maximum-likelihood estimation. However, this parametrization crashes at runtime:

Unhandled exception. System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation.
 ---> Microsoft.ML.Probabilistic.Factors.ImproperMessageException: Improper distribution during inference (Beta(NaN,8130)).  Cannot perform inference on this model.

I assume this is a numerical issue (underflow?). Changing the parameters of the Gamma prior didn't help.

Switching to a Beta(1, a) mixing component, instead of Beta(a, 1), works well for medium-sized datasets. I presume this is because the posterior distribution of a is then concentrated on values much larger than 1.

I've encountered the same issue for large datasets of ~1e7 p-values, i.e. again the same runtime error with a NaN in Beta. Is there anything I can do to scale Infer.NET to these large datasets?

[1] https://academic.oup.com/bioinformatics/article/19/10/1236/184434

Jan 11 '22 04:01 solna86

How can I reproduce that problem?

Jan 11 '22 09:01 tminka