[ENH] Tweedie distribution support in GAM
Description
This pull request implements the Tweedie distribution in the GAM package. The Tweedie distribution is critical for modeling data that exhibit characteristics of both continuous and discrete components, such as insurance claims or zero-inflated data. This addition enhances the package's flexibility in handling real-world datasets with mixed-type responses.
Key Changes
- Added Tweedie distribution support.
- Included associated tests to ensure robustness.
- Updated documentation and examples for ease of use.
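To make the "continuous and discrete" point concrete, here is a minimal standard-library sketch (not the code in this PR) of the compound Poisson-gamma representation of a Tweedie variate for 1 < p < 2. The function name `sample_tweedie` and the parameter values are illustrative assumptions; the parameterization is the standard one, in which the mean is `mu` and the variance is `phi * mu**p`.

```python
import math
import random


def sample_tweedie(mu, p, phi, rng):
    """Draw one compound Poisson-gamma (Tweedie, 1 < p < 2) variate.

    Hypothetical helper for illustration only; not part of the package.
    """
    if not 1.0 < p < 2.0:
        raise ValueError("this sketch only covers 1 < p < 2")
    lam = mu ** (2.0 - p) / (phi * (2.0 - p))   # Poisson rate (number of jumps)
    shape = (2.0 - p) / (p - 1.0)               # gamma shape of each jump
    scale = phi * (p - 1.0) * mu ** (p - 1.0)   # gamma scale of each jump

    # Knuth's multiplication method for a Poisson draw (fine for small rates).
    n, prod, limit = 0, rng.random(), math.exp(-lam)
    while prod > limit:
        n += 1
        prod *= rng.random()

    # Zero jumps gives an exact zero; otherwise a positive continuous value.
    return sum(rng.gammavariate(shape, scale) for _ in range(n))


rng = random.Random(0)
mu, p, phi = 2.0, 1.5, 1.0
draws = [sample_tweedie(mu, p, phi, rng) for _ in range(20000)]

zero_fraction = sum(y == 0 for y in draws) / len(draws)
sample_mean = sum(draws) / len(draws)
# Theoretical point mass at zero: exp(-lambda).
expected_zero_fraction = math.exp(-mu ** (2.0 - p) / (phi * (2.0 - p)))
```

The sample should show a point mass at exactly zero (close to `expected_zero_fraction`) alongside strictly positive continuous values whose mean is close to `mu`.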
Could you cite the literature you used to implement the Tweedie distribution? Seems very interesting.
- Jørgensen, B. (1997). The Theory of Dispersion Models. Chapman & Hall.
  - This book provides an in-depth exploration of dispersion models, including the Tweedie family, discussing their theoretical foundations and practical applications.
- Gilchrist, R., & Drinkwater, D. (2000). The use of the Tweedie distribution in statistical modelling. In COMPSTAT (pp. 313–318). Physica, Heidelberg.
  - This paper focuses on parameter estimation for Tweedie distributions, particularly the compound Poisson (1 < p < 2) and stable form (p > 2) cases, and demonstrates their application in modeling data with zero observations and large dispersion.
- Dunn, P. K., & Smyth, G. K. (2005). Series evaluation of Tweedie exponential dispersion model densities. Statistics and Computing, 15(4), 267–280.
  - This article presents methods for evaluating the densities of Tweedie exponential dispersion models, which is crucial for implementing these distributions in statistical software.
- Smyth, G. K., & Jørgensen, B. (2002). Fitting Tweedie’s compound Poisson model to insurance claims data: dispersion modelling. ASTIN Bulletin: The Journal of the IAA, 32(1), 143–157.
  - This paper discusses fitting Tweedie models to insurance claims data, highlighting the practical implementation of these models in actuarial science.
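For readers who want the core quantities from these references in code, the following is a small sketch of the Tweedie variance function V(mu) = mu^p and the unit deviance, restricted to p >= 1. The function names are hypothetical and this is not necessarily how the PR implements them; the formulas are the standard ones, with the Poisson (p = 1) and gamma (p = 2) limits handled separately.

```python
import math


def tweedie_variance(mu, p):
    """Variance function V(mu) = mu**p of the Tweedie family."""
    return mu ** p


def tweedie_unit_deviance(y, mu, p):
    """Unit deviance d(y, mu) for power p >= 1 (sketch, hypothetical name)."""
    if p < 1.0:
        raise ValueError("this sketch only covers p >= 1")
    if mu <= 0.0:
        raise ValueError("mu must be positive")
    if y < 0.0 or (p >= 2.0 and y == 0.0):
        raise ValueError("y must be >= 0 for 1 <= p < 2 and > 0 for p >= 2")

    if p == 1.0:
        # Poisson limit, using the convention 0 * log(0) = 0.
        log_term = y * math.log(y / mu) if y > 0.0 else 0.0
        return 2.0 * (log_term - (y - mu))
    if p == 2.0:
        # Gamma limit.
        return 2.0 * ((y - mu) / mu - math.log(y / mu))

    # General case: p not in {1, 2}.
    return 2.0 * (
        y ** (2.0 - p) / ((1.0 - p) * (2.0 - p))
        - y * mu ** (1.0 - p) / (1.0 - p)
        + mu ** (2.0 - p) / (2.0 - p)
    )


# The deviance vanishes at y == mu and stays finite at y == 0 when 1 < p < 2.
d_at_mean = tweedie_unit_deviance(3.0, 3.0, 1.5)
d_at_zero = tweedie_unit_deviance(0.0, 4.0, 1.5)
```

The finite deviance at `y == 0` for 1 < p < 2 is what lets the compound Poisson case accommodate exact zeros in the response.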
can I get a review for this PR?
FYI @nickcorona, sorry for the long delay (handover/maintenance period which is now over)
@fkiraly
> instead, I would suggest to allow for distribution objects to be passed in addition to strings, and "tweedie" to translate to TweedieDist

This should already be fine, since the GAM class accepts either distribution strings or instantiated distribution objects: https://github.com/dswah/pyGAM/blob/main/pygam/pygam.py#L108

> with some sensible default for power.

This is the critical aspect, since we don't have a method for estimating the power parameter.

> that distribution objects are also allowed (e.g., Tweedie)

Our docstrings already document that: https://github.com/dswah/pyGAM/blob/main/pygam/pygam.py#L108. Is that sufficient?
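To illustrate the behavior under discussion, here is a self-contained sketch of accepting either a string or an instantiated distribution object, with "tweedie" mapping to a `TweedieDist` that has a default power. The classes, the `resolve_distribution` helper, and the default `power=1.5` are stand-ins chosen for illustration; they are assumptions, not the package's actual code or an agreed default.

```python
class Distribution:
    """Stand-in base class for this sketch (not the package's class)."""


class PoissonDist(Distribution):
    pass


class GammaDist(Distribution):
    pass


class TweedieDist(Distribution):
    def __init__(self, power=1.5):
        # power=1.5 is an assumed default; the thread has not settled on one.
        self.power = power


# Hypothetical registry mapping string names to distribution classes.
DISTRIBUTIONS = {
    "poisson": PoissonDist,
    "gamma": GammaDist,
    "tweedie": TweedieDist,
}


def resolve_distribution(distribution):
    """Return a distribution object from a string name or an instance."""
    if isinstance(distribution, Distribution):
        return distribution  # already instantiated, e.g. TweedieDist(power=1.2)
    if isinstance(distribution, str):
        try:
            return DISTRIBUTIONS[distribution]()  # class defaults apply
        except KeyError:
            options = ", ".join(sorted(DISTRIBUTIONS))
            raise ValueError(
                f"unknown distribution {distribution!r}; expected one of: {options}"
            ) from None
    raise TypeError("distribution must be a string or a Distribution instance")


by_name = resolve_distribution("tweedie")                    # default power
by_object = resolve_distribution(TweedieDist(power=1.2))     # user-chosen power
```

With this shape, users who need a specific power pass an object, while the string form stays a convenient shortcut that falls back to the default.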
> This should already be fine, since the GAM class accepts either distribution strings or instantiated distribution objects: https://github.com/dswah/pyGAM/blob/main/pygam/pygam.py#L108

Thanks for the pointer!

> Our docstrings already document that: https://github.com/dswah/pyGAM/blob/main/pygam/pygam.py#L108. Is that sufficient?
I would say: no. The docstring should either list the possible distribution strings that can be passed, and the classes that can be passed, or link to a list thereof. Otherwise, the user has to start searching if they want to understand how they can use the distribution parameter, if they start at the docstring.
I would say, string options should be listed, and a link to a page with the distributions should also be provided.
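One way to keep such a list from drifting out of date is to generate the docstring entry from the same registry that resolves the strings. The sketch below is only an illustration of that idea: the registry contents, the class names, and the `distribution_doc` helper are assumptions and should be checked against the package's actual mapping.

```python
# Hypothetical registry: string option -> class name (illustrative values).
DISTRIBUTION_CLASSES = {
    "normal": "NormalDist",
    "binomial": "BinomialDist",
    "poisson": "PoissonDist",
    "gamma": "GammaDist",
    "inv_gauss": "InvGaussDist",
    "tweedie": "TweedieDist",
}


def distribution_doc(registry):
    """Build a numpydoc-style parameter entry listing every accepted option."""
    lines = [
        "distribution : str or distribution object",
        "    Distribution of the response. Accepted strings:",
        "",
    ]
    for name in sorted(registry):
        lines.append(f"    - '{name}' (equivalent to passing {registry[name]}())")
    lines += [
        "",
        "    An instantiated distribution object can be passed instead, for",
        "    example to set parameters that the string form leaves at defaults.",
    ]
    return "\n".join(lines)


doc_entry = distribution_doc(DISTRIBUTION_CLASSES)
print(doc_entry)
```

A link to the rendered distributions page could be appended to the generated entry once that page exists.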