glum
Improve documentation for calculations with Jørgensen (1992) book
This book is a really useful resource that I think inspired the setup of the math in sklearn-fork: https://impa.br/wp-content/uploads/2017/04/Mon_51.pdf. It's a pile of derivations without much narrative, so it's not a great first introduction to GLMs, but it's really helpful if you need to know where an equation comes from.
- In the terminology of this book (and our software), we are working with exponential dispersion models with location and scale (mu and phi). The book covers all the distributions we're using, including the unusual ones like generalized hyperbolic secant, and I think the author did a lot of original work on the Tweedie distribution (Section 2.7). We should incorporate more of this information into the repo.
- Section 3.3.1 is especially useful, as it gives an approximation to the likelihood that holds in the limit of low variance (small dispersion). I believe this is what we use to get the gradient here: https://github.com/Quantco/glm_benchmarks/blob/master/src/quantcore/glm/sklearn_fork/_distribution.py#L384
- For the "Hessian", see the discussion of the Fisher information matrix (FIM) as the expected Hessian in Section 3.4. I think the benefit of setting it up this way is that if you're working with the expected Hessian rather than the true Hessian, and you're optimizing both the mean and the variance (dispersion), the two parameters are orthogonal (meaning that the cross term between them in the expected Hessian is zero). The expected Hessian of the log-likelihood is also negative semidefinite (equivalently, the FIM is positive semidefinite), which is really helpful and is not in general true of the true Hessian.
- Section 3.5.2 covers estimation of the dispersion parameter. The corresponding code is here: https://github.com/Quantco/glm_benchmarks/blob/master/src/quantcore/glm/sklearn_fork/_glm.py#L975
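To make the dispersion bullet concrete, here is a minimal sketch of the textbook Pearson chi-squared estimator of phi. It is an assumption that this matches what the linked code does; the function name `pearson_dispersion` and its arguments are hypothetical and are not part of the library's API.

```python
import numpy as np


def pearson_dispersion(y, mu, variance_fn, n_params):
    """Pearson chi-squared estimate of the dispersion phi (hypothetical helper).

    y           : observed responses
    mu          : fitted means
    variance_fn : unit variance function V(mu) of the distribution
    n_params    : number of estimated mean parameters (p)
    """
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    # Pearson statistic: sum_i (y_i - mu_i)^2 / V(mu_i)
    chi2 = np.sum((y - mu) ** 2 / variance_fn(mu))
    # Divide by the residual degrees of freedom n - p.
    return chi2 / (y.size - n_params)


# Toy example with the gamma variance function V(mu) = mu^2.
y = np.array([1.0, 2.0, 4.0, 3.0])
mu = np.array([1.0, 2.0, 2.0, 3.0])
phi_hat = pearson_dispersion(y, mu, lambda m: m**2, n_params=2)
print(phi_hat)
```

Because the mean and dispersion are orthogonal, an estimator like this can be applied after the mean parameters have been fitted, without iterating between the two.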
More questions whose answers should be documented:
- What gradient and Hessian are we using in the "general" case where we allow an arbitrary link function?
- What about our customized functions for special cases where the link is the default?
- For our special cases, can we safely use the true Hessian since it is positive definite?
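As a starting point for answering these questions, here is a sketch of the standard textbook score and Fisher information for a GLM with an arbitrary link. It is not taken from the repo: `score_and_fisher` and `numerical_jacobian` are hypothetical names, and whether the code uses exactly these expressions is what still needs to be documented. The sketch compares the Fisher information with a finite-difference Hessian for a canonical link (Poisson, log) and a non-canonical one (gamma, log).

```python
import numpy as np


def score_and_fisher(X, y, beta, inv_link, inv_link_deriv, variance_fn, phi=1.0):
    """Score and Fisher information for beta (hypothetical helper)."""
    eta = X @ beta               # linear predictor
    mu = inv_link(eta)           # mean
    dmu = inv_link_deriv(eta)    # d mu / d eta
    v = variance_fn(mu)          # unit variance function V(mu)
    # Score: X^T [(y - mu) * (d mu / d eta) / (phi * V(mu))]
    score = X.T @ ((y - mu) * dmu / (phi * v))
    # Fisher information: X^T diag[(d mu / d eta)^2 / (phi * V(mu))] X
    fisher = X.T @ ((dmu**2 / (phi * v))[:, None] * X)
    return score, fisher


def numerical_jacobian(f, beta, eps=1e-6):
    """Central-difference Jacobian of f at beta (the true Hessian when f is the score)."""
    p = beta.size
    jac = np.empty((p, p))
    for j in range(p):
        step = np.zeros(p)
        step[j] = eps
        jac[:, j] = (f(beta + step) - f(beta - step)) / (2 * eps)
    return jac


rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
beta = np.array([0.1, 0.2])
y = rng.poisson(np.exp(X @ beta)).astype(float)

# Poisson with log link (canonical): V(mu) = mu.
pois = dict(inv_link=np.exp, inv_link_deriv=np.exp, variance_fn=lambda m: m)
_, fisher_pois = score_and_fisher(X, y, beta, **pois)
hess_pois = numerical_jacobian(lambda b: score_and_fisher(X, y, b, **pois)[0], beta)

# Gamma with log link (non-canonical): V(mu) = mu^2. Shift y so it is positive.
gam = dict(inv_link=np.exp, inv_link_deriv=np.exp, variance_fn=lambda m: m**2)
y_pos = y + 1.0
_, fisher_gam = score_and_fisher(X, y_pos, beta, **gam)
hess_gam = numerical_jacobian(lambda b: score_and_fisher(X, y_pos, b, **gam)[0], beta)

# Does the negated true Hessian match the Fisher information?
print(np.allclose(-hess_pois, fisher_pois, rtol=1e-5, atol=1e-5))
print(np.allclose(-hess_gam, fisher_gam, rtol=1e-5, atol=1e-5))
```

With a canonical link the true and expected Hessians coincide, so using the true Hessian loses nothing there. With a non-canonical link the true Hessian depends on the observed `y` and can differ from the Fisher information, which is where the definiteness question matters.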
Not a priority, but Liz should do this because she read the book.