darts
[INFO] Can I use a pipeline from sklearn that makes use of a model and feature selection?
What I want
Hi there!
I want to implement recursive feature elimination (RFE) for my sklearn models, using the scikit-learn API.
Can this be built as a scikit-learn pipeline and passed into Darts?
Hi @guilhermeparreira,
This should be possible with the following considerations:
- The regression model must be created with output_chunk_length=1 (then obtain the underlying model stored in the model attribute of the Darts regression model to pass as estimator).
- The tabularized data created inside RegressionModel._fit_model() using the self._create_lagged_data() method must be used as the X and y arrays.
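To make the idea of tabularized data concrete, here is a minimal NumPy sketch of what a lagged feature table looks like. It is not Darts' implementation: the helper name make_lagged_table is hypothetical and the toy series is made up for illustration. Each row corresponds to one time step t, each column to one lag, and y holds the value at t.

```python
import numpy as np


def make_lagged_table(values, lags):
    """Hypothetical helper (not part of Darts): build X and y from a 1-D series.

    Each row of X holds the values at t + lag for every lag in `lags`
    (lags are negative integers); y holds the value at t.
    """
    values = np.asarray(values, dtype=float)
    max_back = -min(lags)  # how far back the oldest lag reaches
    # time steps for which every requested lag is available
    targets = np.arange(max_back, len(values))
    X = np.column_stack([values[targets + lag] for lag in lags])
    y = values[targets]
    return X, y


series = np.arange(10.0)  # toy series 0, 1, ..., 9 (illustrative only)
lags = [-3, -2, -1]
X, y = make_lagged_table(series, lags)
print(X.shape, y.shape)
print(X[0], y[0])
```

Because column i of X corresponds to lags[i], any per-feature output of a selector fitted on X can be read back as a lag by position.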
The output of rfe will be a bit difficult to interpret because the features are lags, so make sure to link them back to the lags used to create the Darts model.
Thank you for the answer!
So, I can only use it with output_chunk_length=1, right?
Do you have an example of the steps you mentioned in the second bullet?
Actually, you could probably also use output_chunk_length > 1 in combination with multi_models=False in order to have only one model, but keep in mind that the lags will be shifted (with respect to the corresponding position in the forecasted horizon, see the regression model example notebook). I am going to cover the simplest scenario (no covariates) in my example:
import numpy as np
from sklearn.feature_selection import RFE
from darts.models import LinearRegressionModel
from darts.datasets import AirPassengersDataset

ts = AirPassengersDataset().load()
model = LinearRegressionModel(lags=12, output_chunk_length=1)
# tabularize the series: X holds the lagged values, y the values to predict
# (private method, so its signature may differ between Darts versions)
X, y = model._create_lagged_data(target_series=ts, past_covariates=None, future_covariates=None, max_samples_per_ts=None)
# run RFE on the underlying scikit-learn estimator
rfe = RFE(estimator=model.model, n_features_to_select=3, step=1)
rfe.fit(X, y)
# lags corresponding to the columns of X
model_lags = model._get_lags('target')
# the best lags are -12, -2 and -1, matching expectations since there is a strong yearly seasonality
best_lags = [model_lags[idx] for idx in np.where(rfe.ranking_ == 1)[0]]
Note that if you use covariates, you would need to concatenate the lags when creating the model_lags variable. I will try to add this example and others to the RegressionModel example notebook, as it might be useful for other users.
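As a rough sketch of what concatenating the lags could look like with covariates, the snippet below builds one label per feature column. Everything here is an assumption for illustration: the helper label_lagged_features is hypothetical, each series is assumed to have a single component, the column order is assumed to be target lags, then past covariate lags, then future covariate lags, and selected_mask is a made-up stand-in for rfe.support_. Check the actual column order against your Darts version before relying on it.

```python
def label_lagged_features(target_lags, past_lags=(), future_lags=()):
    """Hypothetical helper: one (source, lag) label per feature column.

    Assumes single-component series and the column order
    target -> past covariates -> future covariates.
    """
    labels = [("target", lag) for lag in target_lags]
    labels += [("past_cov", lag) for lag in past_lags]
    labels += [("future_cov", lag) for lag in future_lags]
    return labels


labels = label_lagged_features([-2, -1], past_lags=[-1], future_lags=[0, 1])

# Illustrative placeholder for rfe.support_ (not a real result)
selected_mask = [True, False, True, False, True]
selected = [label for label, keep in zip(labels, selected_mask) if keep]
print(selected)
```

Keeping the source name next to each lag avoids ambiguity when, for example, lag -1 appears for both the target and a past covariate.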