
Add support for HistGradientBoostingRegressor

Open samuelefiorini opened this issue 1 year ago • 2 comments

I use Greykite to forecast hourly time series with years of historical data, and fit_algorithm=gradient_boosting is very slow.

According to the sklearn.ensemble.HistGradientBoostingRegressor documentation:

This estimator is much faster than GradientBoostingRegressor for big datasets (n_samples >= 10 000).

Have you considered adding support for this estimator? It looks straightforward from here, but I may be wrong.
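
For reference, here is a minimal standalone sketch of the kind of speed comparison the docs refer to. This is plain sklearn, not Greykite code, and the synthetic dataset size and settings are just illustrative assumptions:

```python
# Standalone sketch (not Greykite code): compare fit time of the two sklearn
# estimators on a synthetic dataset roughly the size of a few years of hourly data.
import time

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, HistGradientBoostingRegressor

rng = np.random.default_rng(0)
n_samples, n_features = 30_000, 20  # ~3.5 years of hourly points, arbitrary feature count
X = rng.normal(size=(n_samples, n_features))
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=n_samples)

for Estimator in (GradientBoostingRegressor, HistGradientBoostingRegressor):
    model = Estimator(random_state=0)
    start = time.perf_counter()
    model.fit(X, y)
    print(f"{Estimator.__name__}: {time.perf_counter() - start:.1f}s")
```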

samuelefiorini avatar Apr 06 '23 11:04 samuelefiorini

Thanks for the suggestion! We haven't planned for this yet, but we have taken note of it. We will update you if this feature gets implemented. In the meantime, please feel free to submit a pull request for this change if you need it. Thanks!

amyfei2015 avatar Apr 28 '23 21:04 amyfei2015

Thanks, I did some experiments (here) and I've been able to make it run (it's far from being a PR, though). In my case (hourly forecasts with 2+ years of historical data), HistGradientBoostingRegressor is much faster than GradientBoostingRegressor (around 4x) while having roughly the same performance in backtest.

However, there are also some points of discussion. For instance, due to its histogram-based implementation, HistGradientBoostingRegressor does not offer a native feature importance measure, while both GradientBoostingRegressor and RandomForestRegressor do.

A possible approach would be to rely on something like sklearn.inspection.permutation_importance, but this of course comes at a higher computational cost and is probably not ideal. Alternatively, a dummy empty array could be returned, possibly with a warning to inform the user.
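
For illustration, a minimal sketch of that permutation-importance fallback (the synthetic data and settings are placeholders, not Greykite internals):

```python
# Sketch: fill the missing feature_importances_ information for a fitted
# HistGradientBoostingRegressor using permutation importance.
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.normal(size=(5_000, 10))
y = 3.0 * X[:, 0] - X[:, 3] + rng.normal(scale=0.1, size=5_000)

model = HistGradientBoostingRegressor(random_state=0).fit(X, y)

# permutation_importance shuffles each column and measures the drop in score;
# n_repeats controls the cost/variance trade-off.
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
importances = result.importances_mean  # same shape as a native feature_importances_ array
print(importances.round(3))
```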

samuelefiorini avatar May 02 '23 13:05 samuelefiorini