Implement the generalized normal distribution
Description
Implement the generalized normal distribution: https://en.wikipedia.org/wiki/Generalized_normal_distribution
Why is this useful?
It generalizes the normal distribution (p = 2), the double exponential (Laplace) distribution (p = 1), and, in the limit p → ∞, also the uniform distribution, via its shape parameter p.
These cases correspond to the $L_p$ norms when the distribution is used for regularization and in similar settings.
For example:
y ~ normal(X*beta, sigma);
produces the criterion of minimizing the $L_2$ norm of (X*beta - y), a.k.a. least squares, while
y ~ double_exponential(X*beta, sigma);
produces the criterion of minimizing the $L_1$ norm of (X*beta - y), a.k.a. least absolute deviations.
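In one line (a restatement of the above, with the scale and the shape $p$ held fixed): maximizing the generalized normal likelihood defined below is equivalent to minimizing the $p$-th power of the $L_p$ norm of the residuals:
$\hat{\beta} = \arg\min_\beta \sum_i |y_i - x_i^\top \beta|^p = \arg\min_\beta \lVert y - X\beta \rVert_p^p$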
Similarly, in the Bayesian interpretation of ridge regression and the LASSO,
beta ~ normal(0, lambda);
y ~ normal(X*beta, sigma);
produces the $L_2$-regularized ridge estimate, and
beta ~ double_exponential(0, lambda);
y ~ normal(X*beta, sigma);
produces the $L_1$-regularized LASSO estimate.
Using the generalized normal distribution would make it convenient to use an arbitrary $L_p$ norm as the optimization criterion, or even to find a suitable value of p by treating it as a parameter, as in the sketch below.
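A sketch of the intended usage (the name generalized_normal and the argument order are placeholders, not a settled signature) for $L_p$-regularized regression:
beta ~ generalized_normal(0, lambda, p);
y ~ normal(X*beta, sigma);
Here p could be declared either as data or as a parameter to be estimated.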
Just some notes on the corresponding formulas:
We have the density:
$f(y|\mu,\alpha, \beta) = \frac{\beta}{2\alpha \Gamma(\beta^{-1})} \exp\left(-\left(\frac{|y-\mu|}{\alpha}\right)^{\beta}\right)$
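As a quick check of the special cases mentioned above: $\beta = 1$ gives the Laplace density (using $\Gamma(1) = 1$), and $\beta = 2$ gives the normal density with $\sigma = \alpha/\sqrt{2}$ (using $\Gamma(1/2) = \sqrt{\pi}$):
$f(y|\mu,\alpha,1) = \frac{1}{2\alpha} \exp\left(-\frac{|y-\mu|}{\alpha}\right)$
$f(y|\mu,\alpha,2) = \frac{1}{\alpha\sqrt{\pi}} \exp\left(-\frac{(y-\mu)^2}{\alpha^2}\right)$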
The log-density:
$\log f(y|\mu,\alpha, \beta) = \log(\beta) - \log\Gamma(\beta^{-1}) - \log(2) - \log(\alpha) -\left(\frac{|y-\mu|}{\alpha}\right)^{\beta}$
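This log-density translates directly into a user-defined Stan function. A minimal sketch (argument validation omitted; the eventual stan::math version would be C++ and vectorized):

```stan
functions {
  // Log-density of the generalized normal distribution,
  // transcribing the formula above; alpha > 0 and beta > 0
  // are assumed rather than checked.
  real generalized_normal_lpdf(real y, real mu, real alpha, real beta) {
    return log(beta) - lgamma(inv(beta)) - log(2) - log(alpha)
           - pow(abs(y - mu) / alpha, beta);
  }
}
```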
And the partial derivatives:
$\frac{\partial \log f(y|\mu,\alpha, \beta)}{\partial y} = \beta \alpha^{-\beta}|\mu-y|^{\beta}(\mu-y)^{-1}$
$\frac{\partial \log f(y|\mu,\alpha, \beta)}{\partial \mu} = \beta \alpha^{-\beta}|\mu-y|^{\beta}(y-\mu)^{-1} = -\frac{\partial \log f(y|\mu,\alpha, \beta)}{\partial y}$
$\frac{\partial \log f(y|\mu,\alpha, \beta)}{\partial \alpha} = \alpha^{-1}\left(\beta \alpha^{-\beta}|\mu-y|^{\beta} - 1\right)$
$\frac{\partial \log f(y|\mu,\alpha, \beta)}{\partial \beta} = \alpha^{-\beta}|\mu-y|^{\beta} \log(|\mu-y|^{-1}\alpha) + \beta^{-1} + \beta^{-2}\,\mathrm{digamma}(\beta^{-1})$
We may factor the following subexpressions to simplify computations:
$t_1 = \alpha^{-1}$ $t_2 = \beta^{-1}$ $t_3 = y - \mu$ $t_4 = |t_3|$ $t_5 = (t_1 t_4)^{\beta}$ $t_6 = \beta t_5$
We can then express the partials as:
$\frac{\partial \log f(y|\mu,\alpha, \beta)}{\partial y} = -\frac{t_6}{t_3}$
$\frac{\partial \log f(y|\mu,\alpha, \beta)}{\partial \mu} = \frac{t_6}{t_3}$
$\frac{\partial \log f(y|\mu,\alpha, \beta)}{\partial \alpha} = t_1(t_6 -1) = t_1 t_6 - t_1$
$\frac{\partial \log f(y|\mu,\alpha, \beta)}{\partial \beta} = -t_5 \log\left(t_1 t_4\right) + t_2 + t_2^2\,\mathrm{digamma}(t_2)$
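Written out as code, the factored computation could look like the following Stan function (a sketch to check the algebra, not the actual C++ implementation; the function name and output ordering are arbitrary):

```stan
functions {
  // Partials of the generalized normal log-density with respect to
  // y, mu, alpha, and beta, sharing the subexpressions t1..t6.
  // Note that y == mu (t3 == 0) would need special handling.
  vector generalized_normal_lpdf_grad(real y, real mu, real alpha,
                                      real beta) {
    real t1 = inv(alpha);
    real t2 = inv(beta);
    real t3 = y - mu;
    real t4 = abs(t3);
    real t5 = pow(t1 * t4, beta);
    real t6 = beta * t5;
    vector[4] g;
    g[1] = -t6 / t3;                                 // w.r.t. y
    g[2] = t6 / t3;                                  // w.r.t. mu
    g[3] = t1 * t6 - t1;                             // w.r.t. alpha
    g[4] = -t5 * log(t1 * t4) + t2
           + square(t2) * digamma(t2);               // w.r.t. beta
    return g;
  }
}
```

Comparing these partials against finite differences of the lpdf above is a quick way to catch the kind of sign errors discussed below.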
Steve Bronder: I'm putting this into Wolfram Alpha and getting different gradient results. Can you double check these results?
The formulas as originally posted had some mistakes that I noticed later, also using Wolfram; the versions above are corrected. I believe the partials in the source code to be correct, though. Should I elaborate more here in the comments?
I posted the corrected formulas in the #3157 PR commentary