math icon indicating copy to clipboard operation
math copied to clipboard

more glm functions

Open bnicenboim opened this issue 4 years ago • 7 comments

Feature request: glms for positive continuous distributions

It would be very useful to include _glm_lpdf functions positive-only data with positively-skewed errors. (See my post here https://discourse.mc-stan.org/t/trying-to-understand-glm-lp-f-functions-in-stan/16400/5).

In cognitive psychology and psycholinguistics, reaction (and reading) times are usually fit with a lognormal likelihood. So I think that lognormal_glm_lpdf would be for sure useful for wide audience. A gamma distribution is also not uncommon (also for ecology, https://seananderson.ca/2014/04/08/gamma-glms/), but here there are several posibilities for a link function, inverse link, no link, log link.

Addition by @avehtari:

We're missing also binomial_logit_glm (requested in Stan discourse. So the current wish list would be

  • [ ] lognormal
  • [ ] gamma
  • [ ] weibull
  • [ ] binomial

bnicenboim avatar Jul 06 '20 09:07 bnicenboim

Is this the intended definition, or if not, could you clarify.

lognormal_glm_lpdf(y, x, alpha, beta, sigma)
  = normal_glm_lpdf(log(y), x, alpha, beta, sigma)
  = lognormal_lpdf(y | exp(alpha + x * beta), sigma)

Edit: The first one would also need a Jacobian if y is a parameter vector.

bob-carpenter avatar Jul 06 '20 16:07 bob-carpenter

I mean this:

lognormal_glm_lpdf(y, x, alpha, beta, sigma)
  = normal_glm_lpdf(log(y), x, alpha, beta, sigma) + -log(y) // this is the Jacobian, right?

ha, well, now that I write it I see that it's so simple that maybe it doesn't need its own function... In any case, most reaction times are fitted this way.

The gamma is another story, so my request is less dumb :)

bnicenboim avatar Jul 06 '20 19:07 bnicenboim

If it's a common glm then I don't see any reason to not have lognormal_glm_lpdf tho'

SteveBronder avatar Jul 06 '20 21:07 SteveBronder

Anything that requires a Jacobian is going to be tricky for a lot of our users. Also, I strongly prefer not to have transforms everywhere in code. So I think lognormal_glm_lpdf would make sense.

bob-carpenter avatar Jul 07 '20 18:07 bob-carpenter

I edited the issue title and suggest adding also weibull_log_glm_* which is very common in survival analysis. In survival analysis with censored observations, it would be very useful to have in addition of *_glm_lpdf have corresponding *_glm_lcdf and *_glm_lccdf.

Of course, it would be nice to have glm's for almost all distributions, but @rok-cesnovar asked to prioritize.

2.29 release notes say

An example of a simple but powerful optimization with the --O1 flag. The call target += bernoulli_logit_lpmf(y | mx * beta); will automatically be replaced with a call to the bernoulli-logit GLM function: target += bernoulli_logit_glm_lpmf(y, mx, 0, beta); and thus, I'm asking just in case, if that could provide an easier way to get compound speedup without explicitly defining all possible glm functions?

avehtari avatar Mar 16 '22 09:03 avehtari

I'm asking just in case, if that could provide an easier way to get compound speedup without explicitly defining all possible glm functions?

Unfortunately not, this just replaces calls to used the GLM C++ function instead of the user having to know there is a GLM function.

Of course, it would be nice to have glm's for almost all distributions, but @rok-cesnovar asked to prioritize.

I agree it would be nice to have more of them or all of them that make sense, especially with the stanc3 optimization this becomes even more useful as the user doesn't have to know which functions have GLM equivalents.

Maybe a good project for google summer of code or something like that? Given that we have 6 GLM functions that could serve as templates, this should not be a very demanding task, mostly replicating code and writing tests.

rok-cesnovar avatar Mar 16 '22 09:03 rok-cesnovar