[ENH] Tweedie distribution support in GAM

Open nickcorona opened this issue 1 year ago • 6 comments

Description

This pull request implements the Tweedie distribution in the GAM package. The Tweedie distribution is widely used for modeling responses that mix a discrete and a continuous component, for example a point mass at exactly zero plus a continuous positive part, as in insurance claims or other zero-inflated data. This addition makes the package more flexible for real-world datasets with such mixed-type responses.

Key Changes

  1. Added Tweedie distribution support.
  2. Included associated tests to ensure robustness.
  3. Updated documentation and examples for ease of use.

nickcorona avatar Dec 02 '24 22:12 nickcorona

Could you cite the literature you used to implement the Tweedie distribution? Seems very interesting.

ouslan avatar Dec 09 '24 11:12 ouslan

Could you cite the literature you used to implement the Tweedie distribution? Seems very interesting.

  1. Jørgensen, B. (1997). The Theory of Dispersion Models. Chapman & Hall.
    • This book provides an in-depth exploration of dispersion models, including the Tweedie family, discussing their theoretical foundations and practical applications.
  2. Gilchrist, R., & Drinkwater, D. (2000). The use of the Tweedie distribution in statistical modelling. In COMPSTAT (pp. 313–318). Physica, Heidelberg.
    • This paper focuses on parameter estimation for Tweedie distributions, particularly the compound Poisson (1 < p < 2) and stable form (p > 2) cases, and demonstrates their application in modeling data with zero observations and large dispersion.
  3. Dunn, P. K., & Smyth, G. K. (2005). Series evaluation of Tweedie exponential dispersion model densities. Statistics and Computing, 15(4), 267–280.
    • This article presents methods for evaluating the densities of Tweedie exponential dispersion models, which is crucial for implementing these distributions in statistical software.
  4. Smyth, G. K., & Jørgensen, B. (2002). Fitting Tweedie’s compound Poisson model to insurance claims data: dispersion modelling. ASTIN Bulletin: The Journal of the IAA, 32(1), 143–157.
    • This paper discusses fitting Tweedie models to insurance claims data, highlighting the practical implementation of these models in actuarial science.
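
For readers without the references to hand, here is a minimal sketch (not the code in this PR) of the two quantities a GLM/GAM fit typically needs from a Tweedie family in the compound Poisson range 1 < p < 2: the variance function V(mu) = mu^p and the unit deviance. The function names are illustrative assumptions and are not part of pyGAM.

```python
def tweedie_variance(mu: float, power: float) -> float:
    """Unit variance function V(mu) = mu ** power."""
    return mu ** power


def tweedie_unit_deviance(y: float, mu: float, power: float) -> float:
    """Unit deviance for the compound Poisson case, 1 < power < 2.

    y may be exactly zero (the point mass); mu must be strictly positive.
    """
    if not 1.0 < power < 2.0:
        raise ValueError("this sketch only covers 1 < power < 2")
    if y < 0 or mu <= 0:
        raise ValueError("need y >= 0 and mu > 0")
    p = power
    return 2.0 * (
        y ** (2.0 - p) / ((1.0 - p) * (2.0 - p))
        - y * mu ** (1.0 - p) / (1.0 - p)
        + mu ** (2.0 - p) / (2.0 - p)
    )
```

The deviance is zero when y equals mu and stays finite at y = 0, which is what makes this range suitable for data with exact zeros.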

nickcorona avatar Dec 11 '24 10:12 nickcorona

Can I get a review for this PR?

nickcorona avatar Dec 13 '24 14:12 nickcorona

FYI @nickcorona, sorry for the long delay (there was a handover/maintenance period, which is now over).

fkiraly avatar Nov 18 '25 08:11 fkiraly

@fkiraly

instead, I would suggest to allow for distribution objects to be passed in addition to strings, and "tweedie" to translate to TweedieDist

This should already be fine, since the GAM class accepts both distribution strings and instantiated distribution objects: https://github.com/dswah/pyGAM/blob/main/pygam/pygam.py#L108

with some sensible default for power.

This is the critical aspect, since we don't have a method for estimating the power parameter.
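
To make the string-or-object idea concrete, a rough sketch of how "tweedie" could resolve to a distribution object with a fixed default power, given that the power is not estimated. Everything here is an assumption for illustration: the `TweedieDist` stand-in, the `resolve_distribution` helper, the registry, and the default of 1.5 are not taken from pyGAM or from this PR.

```python
from dataclasses import dataclass
from typing import Union

# Assumed default; the thread does not settle on a value.
DEFAULT_TWEEDIE_POWER = 1.5


@dataclass(frozen=True)
class TweedieDist:
    """Stand-in for a Tweedie distribution object with a fixed power."""

    power: float = DEFAULT_TWEEDIE_POWER

    def __post_init__(self) -> None:
        # Restrict the sketch to the compound Poisson range.
        if not 1.0 < self.power < 2.0:
            raise ValueError("power must satisfy 1 < power < 2 in this sketch")


# Hypothetical registry mapping strings to distribution classes.
_REGISTRY = {"tweedie": TweedieDist}


def resolve_distribution(distribution: Union[str, TweedieDist]) -> TweedieDist:
    """Return a distribution object for a string name or pass an object through."""
    if isinstance(distribution, str):
        try:
            return _REGISTRY[distribution]()  # string form uses the default power
        except KeyError:
            raise ValueError(f"unknown distribution {distribution!r}") from None
    return distribution  # already an instance, so the user's power is kept
```

With this shape, `resolve_distribution("tweedie")` uses the default power, while a user who knows their power passes `TweedieDist(power=1.2)` directly.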

that distribution objects are also allowed (e.g., Tweedie)

Our docstrings already document that: https://github.com/dswah/pyGAM/blob/main/pygam/pygam.py#L108. Is that sufficient?

dswah avatar Nov 20 '25 09:11 dswah

This should already be fine, since the GAM class accepts both distribution strings and instantiated distribution objects: https://github.com/dswah/pyGAM/blob/main/pygam/pygam.py#L108

Thanks for the pointer!

Our docstrings already document that: https://github.com/dswah/pyGAM/blob/main/pygam/pygam.py#L108. Is that sufficient?

I would say: no. The docstring should either list the possible distribution strings that can be passed, and the classes that can be passed, or link to a list thereof. Otherwise, the user has to start searching if they want to understand how they can use the distribution parameter, if they start at the docstring.

I would say, string options should be listed, and a link to a page with the distributions should also be provided.
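
One way to keep such a list from drifting out of date is to generate the parameter entry from whatever mapping resolves the strings. The sketch below is only an illustration of that idea: the registry contents and the `distribution_param_doc` helper are hypothetical, and the keys shown are not claimed to be pyGAM's actual list.

```python
# Hypothetical registry; the keys are illustrative, not pyGAM's actual list.
DISTRIBUTION_NAMES = {
    "normal": "NormalDist",
    "poisson": "PoissonDist",
    "tweedie": "TweedieDist",
}


def distribution_param_doc(names: dict) -> str:
    """Build a numpydoc-style entry listing accepted strings and classes."""
    options = ", ".join(f"'{key}'" for key in sorted(names))
    classes = ", ".join(sorted(set(names.values())))
    return (
        "distribution : str or Distribution object\n"
        f"    If str, one of: {options}.\n"
        f"    If object, an instance of one of: {classes}.\n"
        "    See the distributions page of the API reference for details.\n"
    )


print(distribution_param_doc(DISTRIBUTION_NAMES))
```

The last line of the generated entry is where the link to the distributions page would go.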

fkiraly avatar Nov 21 '25 07:11 fkiraly