[ENH] further work on `GLMRegressor` interfacing `statsmodels` `GLM`
From #222, remaining work items to interface statsmodels GLM:
- currently, only gaussian `family` is implemented. Further families should be interfaced from `statsmodels`, in particular Gamma and Tweedie.
- we should try to cover as many parameters as we can in `get_test_params`, currently coverage is low.
- the docstring should say, for every parameter, what possible values are. E.g., what are possible values for `cov_type`, `method`, what are expected sizes for `start_params`, etc.
- some of the parameters of `statsmodels` `GLM` are not exposed to the user, as they require array-like input which is unavailable in `predict`, e.g., `offset`, `exposure`. It should be investigated how these could be interfaced, with a sensible treatment in `predict` where applicable.
FYI @julian-fong, I've moved the work items to here, just to keep track.
Thank you Franz, I'll look into adding these features when I get a spare moment (working on my GSoC proposal!). I'm not too familiar with the tweedie distribution, does it require to have its own probability distribution on skpro as well?
> I'm not too familiar with the tweedie distribution, does it require to have its own probability distribution on skpro as well?
Yes, although it does not seem too easy - it is not available in scipy or tensorflow_proba, and it is quite tedious to implement.
`sklearn` has a `TweedieRegressor`, which is a GLM with a Tweedie family, but it does not return distribution parameters or a distribution, so it's unclear why `sklearn` has it in the first place.
Might be too much work for too little benefit. Anyway, Gamma is a special case of Tweedie, which is more straightforward to implement.
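
For reference, the sense in which Gamma is a special case: the Tweedie family has variance `dispersion * mu ** power`, and `power = 2` gives the Gamma variance function (with `power = 0` Gaussian and `power = 1` Poisson). A small sketch, with hypothetical helper names that are not from `skpro` or `statsmodels`:

```python
def tweedie_variance(mu: float, power: float, dispersion: float = 1.0) -> float:
    """Tweedie variance function: Var[Y] = dispersion * mu ** power."""
    return dispersion * mu ** power


def gamma_variance(mu: float, shape: float) -> float:
    """Variance of a Gamma distribution with mean mu and shape parameter shape."""
    scale = mu / shape  # mean = shape * scale
    return shape * scale ** 2


mu, shape = 3.0, 2.0
# power = 2 with dispersion = 1 / shape matches the Gamma variance.
v_tweedie = tweedie_variance(mu, power=2.0, dispersion=1.0 / shape)
v_gamma = gamma_variance(mu, shape)
```

So a Gamma family only needs the existing Gamma distribution, with shape and scale recovered from the fitted mean and dispersion, rather than a full Tweedie implementation.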