math
math copied to clipboard
more glm functions
Feature request: glms for positive continuous distributions
It would be very useful to include _glm_lpdf
functions positive-only data with positively-skewed errors. (See my post here
https://discourse.mc-stan.org/t/trying-to-understand-glm-lp-f-functions-in-stan/16400/5).
In cognitive psychology and psycholinguistics, reaction (and reading) times are usually fit with a lognormal likelihood. So I think that lognormal_glm_lpdf
would be for sure useful for wide audience.
A gamma distribution is also not uncommon (also for ecology, https://seananderson.ca/2014/04/08/gamma-glms/), but here there are several posibilities for a link function, inverse link, no link, log link.
Addition by @avehtari:
We're missing also binomial_logit_glm (requested in Stan discourse. So the current wish list would be
- [ ] lognormal
- [ ] gamma
- [ ] weibull
- [ ] binomial
Is this the intended definition, or if not, could you clarify.
lognormal_glm_lpdf(y, x, alpha, beta, sigma)
= normal_glm_lpdf(log(y), x, alpha, beta, sigma)
= lognormal_lpdf(y | exp(alpha + x * beta), sigma)
Edit: The first one would also need a Jacobian if y is a parameter vector.
I mean this:
lognormal_glm_lpdf(y, x, alpha, beta, sigma)
= normal_glm_lpdf(log(y), x, alpha, beta, sigma) + -log(y) // this is the Jacobian, right?
ha, well, now that I write it I see that it's so simple that maybe it doesn't need its own function... In any case, most reaction times are fitted this way.
The gamma is another story, so my request is less dumb :)
If it's a common glm then I don't see any reason to not have lognormal_glm_lpdf
tho'
Anything that requires a Jacobian is going to be tricky for a lot of our users. Also, I strongly prefer not to have transforms everywhere in code. So I think lognormal_glm_lpdf
would make sense.
I edited the issue title and suggest adding also weibull_log_glm_*
which is very common in survival analysis.
In survival analysis with censored observations, it would be very useful to have in addition of *_glm_lpdf
have corresponding *_glm_lcdf
and *_glm_lccdf
.
Of course, it would be nice to have glm's for almost all distributions, but @rok-cesnovar asked to prioritize.
2.29 release notes say
An example of a simple but powerful optimization with the --O1 flag. The call
target += bernoulli_logit_lpmf(y | mx * beta);
will automatically be replaced with a call to the bernoulli-logit GLM function:target += bernoulli_logit_glm_lpmf(y, mx, 0, beta);
and thus, I'm asking just in case, if that could provide an easier way to get compound speedup without explicitly defining all possible glm functions?
I'm asking just in case, if that could provide an easier way to get compound speedup without explicitly defining all possible glm functions?
Unfortunately not, this just replaces calls to used the GLM C++ function instead of the user having to know there is a GLM function.
Of course, it would be nice to have glm's for almost all distributions, but @rok-cesnovar asked to prioritize.
I agree it would be nice to have more of them or all of them that make sense, especially with the stanc3 optimization this becomes even more useful as the user doesn't have to know which functions have GLM equivalents.
Maybe a good project for google summer of code or something like that? Given that we have 6 GLM functions that could serve as templates, this should not be a very demanding task, mostly replicating code and writing tests.