# mlforecast
Scalable machine 🤖 learning for time series forecasting.
## Install
### PyPI

```sh
pip install mlforecast
```
If you want to perform distributed training, you can instead use `pip install mlforecast[distributed]`, which will also install dask. Note that you'll also need to install either LightGBM or XGBoost.
### conda-forge

```sh
conda install -c conda-forge mlforecast
```
Note that this installation comes with the required dependencies for the local interface. If you want to perform distributed training, you must install dask (`conda install -c conda-forge dask`) and either LightGBM or XGBoost.
## How to use
The following provides a very basic overview; for a more detailed description, see the documentation.
Store your time series in a pandas dataframe with an index named `unique_id` that identifies each time series, a column `ds` that contains the datestamps and a column `y` with the values.
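For instance, a minimal dataframe in this layout could be built by hand (the ids and values here are made up for illustration):

```python
import pandas as pd

# Hypothetical hand-built example of the expected layout:
# index named unique_id, a ds column with datestamps and a y column with values.
df = pd.DataFrame({
    'unique_id': ['id_00'] * 3 + ['id_01'] * 3,
    'ds': list(pd.date_range('2000-01-01', periods=3)) * 2,
    'y': [0.5, 1.2, 2.3, 10.1, 11.4, 12.9],
}).set_index('unique_id')
print(df)
```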
```python
from mlforecast.utils import generate_daily_series

series = generate_daily_series(20)
series.head()
```
| unique_id | ds | y |
|---|---|---|
| id_00 | 2000-01-01 | 0.264447 |
| id_00 | 2000-01-02 | 1.284022 |
| id_00 | 2000-01-03 | 2.462798 |
| id_00 | 2000-01-04 | 3.035518 |
| id_00 | 2000-01-05 | 4.043565 |
Then create a `TimeSeries` object with the features that you want to use. These include lags, transformations on the lags and date features. The lag transformations are defined as numba-jitted functions that transform an array; if they take additional arguments, you supply a tuple (`transform_func`, `arg1`, `arg2`, …).
```python
from mlforecast import Forecast, TimeSeries
from window_ops.expanding import expanding_mean
from window_ops.rolling import rolling_mean

ts = TimeSeries(
    lags=[7, 14],
    lag_transforms={
        1: [expanding_mean],
        7: [(rolling_mean, 7), (rolling_mean, 14)]
    },
    date_features=['dayofweek', 'month']
)
ts
```
```
TimeSeries(freq=<Day>, transforms=['lag-7', 'lag-14', 'expanding_mean_lag-1', 'rolling_mean...window_size-7', 'rolling_mean...window_size-14'], date_features=['dayofweek', 'month'], num_threads=1)
```
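To illustrate the tuple form for transforms that take extra arguments, here is a hypothetical custom transform (`diff_n` is invented for this sketch and is not part of mlforecast; in real usage the function should be numba-jitted, e.g. with `numba.njit`, like the transforms provided by window_ops):

```python
import numpy as np

# Hypothetical custom lag transform: difference between each value and the
# value `offset` positions before it. In real usage this would be decorated
# with numba's @njit so it can run efficiently inside mlforecast.
def diff_n(x, offset):
    out = np.full(x.size, np.nan)
    out[offset:] = x[offset:] - x[:-offset]
    return out

# Extra arguments go in a tuple after the function, e.g.:
# lag_transforms={1: [(diff_n, 2)]}   # applies diff_n(lagged_values, 2)
print(diff_n(np.arange(5, dtype=np.float64), 2))  # first two entries are NaN, rest are 2.0
```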
Next define your models. If you want to use the local interface, this can be any regressor that follows the scikit-learn API. For distributed training there are `LGBMForecast` and `XGBForecast`.
```python
import lightgbm as lgb
import xgboost as xgb
from sklearn.ensemble import RandomForestRegressor

models = [
    lgb.LGBMRegressor(),
    xgb.XGBRegressor(),
    RandomForestRegressor(random_state=0),
]
```
Now instantiate your forecast object with the models and the time series. There are two types of forecasters: `Forecast`, which is local, and `DistributedForecast`, which performs the whole process in a distributed way.
```python
fcst = Forecast(models, ts)
```
To compute the features and train the models using them, call `.fit` on your `Forecast` object.
```python
fcst.fit(series)
```
```
Forecast(models=[LGBMRegressor(), XGBRegressor(...lambda=1, ...), RandomForestR...andom_state=0)], ts=TimeSeries(fr...num_threads=1))
```
To get the forecasts for the next 14 days, call `.predict(14)` on the forecaster. This will automatically handle the updates required by the features.
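These updates amount to recursive forecasting: each predicted value is appended to the series so that the lag-based features can be recomputed for the next step. A toy sketch of that idea (not mlforecast's actual implementation), using a stand-in model with a single lag-1 feature:

```python
# Stand-in for a trained regressor that consumes a lag-1 feature.
def toy_model(lag1):
    return lag1 + 1.0

history = [1.0, 2.0, 3.0]
horizon = 4
predictions = []
for _ in range(horizon):
    # The lag-1 feature is recomputed from the updated history at each step.
    next_val = toy_model(history[-1])
    predictions.append(next_val)
    history.append(next_val)

print(predictions)  # [4.0, 5.0, 6.0, 7.0]
```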
```python
predictions = fcst.predict(14)
predictions.head()
```
| unique_id | ds | LGBMRegressor | XGBRegressor | RandomForestRegressor |
|---|---|---|---|---|
| id_00 | 2000-08-10 | 5.226933 | 5.165335 | 5.244840 |
| id_00 | 2000-08-11 | 6.222637 | 6.181697 | 6.258609 |
| id_00 | 2000-08-12 | 0.212516 | 0.231710 | 0.225484 |
| id_00 | 2000-08-13 | 1.236251 | 1.244750 | 1.228957 |
| id_00 | 2000-08-14 | 2.241766 | 2.291263 | 2.302455 |