[ENH] further work on `GLMRegressor` interfacing `statsmodels` `GLM`

Open fkiraly opened this issue 1 year ago • 3 comments

From #222, remaining work items to interface statsmodels GLM:

  • Currently, only the Gaussian family is implemented. Further families should be interfaced from `statsmodels`, in particular Gamma and Tweedie.
  • We should cover as many parameters as we can in `get_test_params`; current coverage is low.
  • The docstring should state the possible values for every parameter, e.g., the possible values for `cov_type` and `method`, the expected size of `start_params`, etc.
  • Some parameters of `statsmodels` `GLM`, e.g., `offset` and `exposure`, are not exposed to the user because they require array-like input that is unavailable in `predict`. We should investigate how these could be interfaced, with a sensible treatment in `predict` where applicable.

fkiraly, Mar 31 '24 20:03

FYI @julian-fong, I've moved the work items here, just to keep track.

fkiraly, Mar 31 '24 20:03

Thank you Franz, I'll look into adding these features when I get a spare moment (working on my GSoC proposal!). I'm not too familiar with the Tweedie distribution; does it need its own probability distribution in skpro as well?

julian-fong, Apr 01 '24 01:04

> I'm not too familiar with the Tweedie distribution; does it need its own probability distribution in skpro as well?

Yes, although it does not look easy: it is not available in `scipy` or `tensorflow_probability`, and it is quite tedious to implement.

`sklearn` has a `TweedieRegressor`, which is a GLM with a Tweedie family, but it returns neither distribution parameters nor a distribution, so it is unclear why `sklearn` has it in the first place.

It might be too much work for too little benefit. In any case, Gamma is a special case of Tweedie and is more straightforward.
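
As a rough sketch of why the Gamma case is straightforward: a Gamma GLM has variance `dispersion * mu**2` (Tweedie variance power 2), so a predicted mean and a dispersion estimate map directly to Gamma shape and scale. The helper name `gamma_params_from_glm` is hypothetical and not part of `skpro` or `statsmodels`.

```python
def gamma_params_from_glm(mu, dispersion):
    """Convert a GLM mean and dispersion into Gamma (shape, scale).

    Assumes Var(Y) = dispersion * mu**2, the Gamma variance function.
    Hypothetical helper, for illustration only.
    """
    if mu <= 0 or dispersion <= 0:
        raise ValueError("mu and dispersion must be positive")
    shape = 1.0 / dispersion   # Gamma shape parameter k
    scale = mu * dispersion    # Gamma scale parameter theta
    return shape, scale


shape, scale = gamma_params_from_glm(mu=3.0, dispersion=0.5)

# Sanity check: the Gamma mean and variance reproduce the GLM quantities.
mean = shape * scale           # k * theta
var = shape * scale ** 2       # k * theta**2
```

The resulting pair could be passed to, e.g., `scipy.stats.gamma(a=shape, scale=scale)`; how it should map onto a `skpro` distribution object is left open here.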

fkiraly, Apr 01 '24 11:04