
Implement the generalized normal distribution

Open jachymb opened this issue 1 year ago • 4 comments

Description

Implement the generalized normal distribution: https://en.wikipedia.org/wiki/Generalized_normal_distribution

Why is this useful?

For shape parameter p, it generalizes the normal distribution (p = 2) and the double exponential (Laplace) distribution (p = 1), and in the limit p → ∞ it also recovers the uniform distribution.

These special cases correspond to the $L_p$ norms used for regularization, among other applications.

For example:

y ~ normal(X*beta, sigma);

produces the criterion of minimizing the $L_2$ norm of (X*beta - y), a.k.a. least squares. But then

y ~ double_exponential(X*beta, sigma);

produces the criterion of minimizing the $L_1$ norm of (X*beta - y), a.k.a. least absolute deviations.
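To make the correspondence explicit, a one-line derivation (with $\sigma$ held fixed): maximizing each log-likelihood minimizes the corresponding norm of the residuals, since

$$\sum_i \log \operatorname{normal}(y_i \mid (X\beta)_i, \sigma) = -\frac{1}{2\sigma^2}\lVert X\beta - y \rVert_2^2 + \text{const}, \qquad \sum_i \log \operatorname{double\_exponential}(y_i \mid (X\beta)_i, \sigma) = -\frac{1}{\sigma}\lVert X\beta - y \rVert_1 + \text{const}.$$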

Similarly, in the Bayesian interpretation of ridge and LASSO,

beta ~ normal(0, lambda);
y ~ normal(X*beta, sigma);

produces the $L_2$ regularized ridge and

beta ~ double_exponential(0, lambda);
y ~ normal(X*beta, sigma);

produces the $L_1$ regularized LASSO.

Implementing the generalized normal distribution would make it convenient to use an arbitrary $L_p$ norm as the optimization criterion, or even to find a suitable value of p by treating it as a parameter.
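As a quick illustration of this claim, here is a minimal numerical sketch (Python, not Stan; the helper name gen_normal_neg_log_lik is hypothetical) showing that, up to an additive constant, the generalized-normal negative log-likelihood of the residuals equals the $L_p$ norm of the residual vector raised to the p-th power:

```python
import numpy as np
from scipy.special import gammaln

def gen_normal_neg_log_lik(resid, alpha, beta):
    """Negative log-likelihood of i.i.d. generalized-normal residuals."""
    n = resid.size
    const = n * (np.log(2 * alpha) + gammaln(1.0 / beta) - np.log(beta))
    return const + np.sum((np.abs(resid) / alpha) ** beta)

rng = np.random.default_rng(0)
resid = rng.normal(size=10)

for p in (1.0, 2.0, 4.0):
    nll = gen_normal_neg_log_lik(resid, alpha=1.0, beta=p)
    penalty = np.linalg.norm(resid, ord=p) ** p  # sum(|r_i|^p)
    # The two quantities differ only by a term that is constant in resid.
    print(p, nll, penalty)
```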

jachymb avatar Dec 07 '24 21:12 jachymb

Just some notes on the corresponding formulas:

We have the density:

$f(y|\mu,\alpha, \beta) = \frac{\beta}{2\alpha \Gamma(\beta^{-1})} \exp\left(-\left(\frac{|y-\mu|}{\alpha}\right)^{\beta}\right)$

The log-density:

$\log f(y|\mu,\alpha, \beta) = \log(\beta) - \log\Gamma(\beta^{-1}) - \log(2) - \log(\alpha) -\left(\frac{|y-\mu|}{\alpha}\right)^{\beta}$
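As a sanity check (a sketch, not library code), the log-density above can be compared against scipy.stats.gennorm, which parameterizes the same family with shape $\beta$, loc $= \mu$, and scale $= \alpha$:

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import gennorm

def gen_normal_lpdf(y, mu, alpha, beta):
    # log f(y | mu, alpha, beta) as written above
    return (np.log(beta) - gammaln(1.0 / beta) - np.log(2.0) - np.log(alpha)
            - (np.abs(y - mu) / alpha) ** beta)

y, mu, alpha, beta = 1.3, 0.5, 2.0, 1.7
print(gen_normal_lpdf(y, mu, alpha, beta))
print(gennorm.logpdf(y, beta, loc=mu, scale=alpha))  # should agree
```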

And the partial derivatives:

$\frac{\partial \log f(y|\mu,\alpha, \beta)}{\partial y} = \beta \alpha^{-\beta}|\mu-y|^{\beta}(\mu-y)^{-1}$

$\frac{\partial \log f(y|\mu,\alpha, \beta)}{\partial \mu} = \beta \alpha^{-\beta}|\mu-y|^{\beta}(y-\mu)^{-1} = -\frac{\partial \log f(y|\mu,\alpha, \beta)}{\partial y}$

$\frac{\partial \log f(y|\mu,\alpha, \beta)}{\partial \alpha} = \alpha^{-1}\left(\beta \alpha^{-\beta}|\mu-y|^{\beta} - 1\right)$

$\frac{\partial \log f(y|\mu,\alpha, \beta)}{\partial \beta} = \alpha^{-\beta}|\mu-y|^{\beta} \log(|\mu-y|^{-1}\alpha) + \beta^{-1} + \mathrm{digamma}(\beta^{-1})$

We may factor out the following subexpressions to simplify the computation:

$t_1 = \alpha^{-1},\quad t_2 = \beta^{-1},\quad t_3 = y - \mu,\quad t_4 = |t_3|,\quad t_5 = (t_1 t_4)^{\beta},\quad t_6 = \beta t_5$

We can then express the partials as:

$\frac{\partial \log f(y|\mu,\alpha, \beta)}{\partial y} = -\frac{t_6}{t_3}$

$\frac{\partial \log f(y|\mu,\alpha, \beta)}{\partial \mu} = \frac{t_6}{t_3}$

$\frac{\partial \log f(y|\mu,\alpha, \beta)}{\partial \alpha} = t_1(t_6 -1) = t_1 t_6 - t_1$

$\frac{\partial \log f(y|\mu,\alpha, \beta)}{\partial \beta} = t_5 \log\left(t_1 t_4\right) + t_2 + \mathrm{digamma}(t_2)$
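For reference, a finite-difference check of these factored partials against numerical gradients of the log-density (a Python sketch with hypothetical helper names; any disagreement for a given argument points to a mistake in the corresponding formula):

```python
import numpy as np
from scipy.special import gammaln, digamma

def lpdf(y, mu, alpha, beta):
    return (np.log(beta) - gammaln(1.0 / beta) - np.log(2.0) - np.log(alpha)
            - (np.abs(y - mu) / alpha) ** beta)

def analytic_partials(y, mu, alpha, beta):
    # The factored expressions exactly as written above
    t1 = 1.0 / alpha
    t2 = 1.0 / beta
    t3 = y - mu
    t4 = abs(t3)
    t5 = (t1 * t4) ** beta
    t6 = beta * t5
    return {
        "y":     -t6 / t3,
        "mu":     t6 / t3,
        "alpha":  t1 * t6 - t1,
        "beta":   t5 * np.log(t1 * t4) + t2 + digamma(t2),
    }

def numeric_partials(y, mu, alpha, beta, h=1e-6):
    # Central finite differences of the log-density in each argument
    args = {"y": y, "mu": mu, "alpha": alpha, "beta": beta}
    out = {}
    for name in args:
        hi = dict(args); hi[name] += h
        lo = dict(args); lo[name] -= h
        out[name] = (lpdf(**hi) - lpdf(**lo)) / (2 * h)
    return out

print(analytic_partials(1.3, 0.5, 2.0, 1.7))
print(numeric_partials(1.3, 0.5, 2.0, 1.7))
```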

jachymb avatar Mar 01 '25 13:03 jachymb

I'm putting this into Wolfram Alpha and getting different gradient results. Can you double-check these results?

wolfram link: https://www.wolframalpha.com/input?i2d=true&i=Assuming%5C%2891%29%CE%B1+%3E+0+%26%26+%CE%B2+%3E+0%5C%2844%29+D%5Blog%5C%2840%29Divide%5B%CE%B2%2C2*%CE%B1*gamma%5C%2840%29Divide%5B1%2C%CE%B2%5D%5C%2841%29%5D*exp%5C%2840%29-Power%5B%5C%2840%29Divide%5B%7Cy-%CE%BC%7C%2C%CE%B1%5D%5C%2841%29%2C%CE%B2%5D%5C%2841%29%5C%2841%29%2Cy%5D%5C%2893%29

SteveBronder avatar Mar 05 '25 19:03 SteveBronder

The formulas above have some mistakes that I noticed later, also using Wolfram. I believe the partials in the source code are correct, though. Should I elaborate more here in the comments?


jachymb avatar Mar 05 '25 20:03 jachymb

I posted the corrected formulas in the comments on PR #3157.

jachymb avatar Mar 10 '25 10:03 jachymb