mlr3pipelines

PipeOp Timeseries Feature Extractor

Open: mb706 opened this issue 3 years ago • 1 comment

Input: a table with an id column (not the row id), a time column t, feature columns f_i with entries f_i,id,t, and a target value t_id,t. For each (id, t) for which it is possible, the PipeOp builds a window over the preceding W time steps and turns the features of that id into wide format: we get columns f'_i,deltat, with deltat ranging from -W to 0 (so W + 1 columns per feature), containing for each id and t the values f_i,id,(t-W) .. f_i,id,t.
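A minimal sketch of the long-to-wide step described above, in plain Python rather than as an actual PipeOp. It assumes integer, regularly spaced time steps; the function name `extract_windows`, the `target` key, and the `<feature>_dt<delta>` column naming are made up for illustration and are not part of mlr3pipelines.

```python
def extract_windows(rows, features, W):
    """Turn long-format rows into one wide row per (id, t) that has a full window."""
    # Index the long table by (id, t) for direct lookup.
    by_key = {(r["id"], r["t"]): r for r in rows}
    wide = []
    for (id_, t) in sorted(by_key):
        # Skip (id, t) pairs whose window t-W .. t is not fully observed.
        if any((id_, t + d) not in by_key for d in range(-W, 1)):
            continue
        out = {"id": id_, "t": t, "target": by_key[(id_, t)]["target"]}
        for f in features:
            for d in range(-W, 1):
                # Hypothetical column name: feature name plus time offset.
                out[f"{f}_dt{d}"] = by_key[(id_, t + d)][f]
        wide.append(out)
    return wide


# Toy input: one id, four time steps, one feature.
rows = [{"id": "a", "t": t, "f1": 10 * t, "target": t % 2} for t in range(4)]
wide = extract_windows(rows, ["f1"], W=2)
```

With W = 2 only t = 2 and t = 3 have a complete window, so the first two time steps of each id produce no output row.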

We could also think about adding an offset, but shifting observations relative to the target should probably be another PO.

We could also think about doing something special with the target, but including that would probably also be another PO.

Resampling should probably always use an increasing window, and there should be a PO after the feature extractor that limits the window size (if desired), so the extractor has access to the whole history it may need. (Maybe window limiting should happen in the feature extractor to avoid computational overhead?) Note that "window" in this paragraph does not refer to the feature extraction window, but to the resampling window / training set size.
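To make the resampling idea concrete, here is a rough sketch of increasing-window splits with an optional cap on the training set size. It is plain Python, not an mlr3 resampling class; `expanding_window_splits` and its parameters `n_initial`, `horizon`, and `max_train` are hypothetical names. The cap is applied to the training time points only, which corresponds to limiting after feature extraction so that the extractor could still see the full history.

```python
def expanding_window_splits(times, n_initial, horizon=1, max_train=None):
    """Return (train_times, test_times) pairs with a growing training set."""
    times = sorted(set(times))
    splits = []
    for end in range(n_initial, len(times) - horizon + 1, horizon):
        train = times[:end]
        if max_train is not None:
            # Optional limit on the resampling window / training set size.
            train = train[-max_train:]
        test = times[end:end + horizon]
        splits.append((train, test))
    return splits


splits = expanding_window_splits(range(6), n_initial=3)
limited = expanding_window_splits(range(6), n_initial=3, max_train=2)
```

In the uncapped case every split trains on all earlier time points; with `max_train=2` only the two most recent ones are kept.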

This PO also needs to save some of the training-time input so that prediction windows at the beginning of the predict set can be built.
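One way this state could look, as a sketch only: keep the last W training rows per id and prepend them to the predict rows before building windows. The class `WindowState` and its methods are hypothetical and only mimic the train/predict split of a PipeOp; time steps are again assumed to be integers.

```python
from collections import defaultdict


class WindowState:
    """Keeps the last W training rows per id for use at predict time."""

    def __init__(self, W):
        self.W = W
        self.tail = {}

    @staticmethod
    def _group(rows):
        # Group rows by id, ordered by time within each id.
        per_id = defaultdict(list)
        for r in sorted(rows, key=lambda r: r["t"]):
            per_id[r["id"]].append(r)
        return per_id

    def train(self, rows):
        # Only the last W rows per id can be needed for later windows.
        self.tail = {
            i: (rs[-self.W:] if self.W > 0 else [])
            for i, rs in self._group(rows).items()
        }

    def predict_input(self, rows):
        # Prepend the stored tail so early predict rows get a full window.
        return {i: self.tail.get(i, []) + rs for i, rs in self._group(rows).items()}


state = WindowState(W=2)
state.train([{"id": "a", "t": t, "f1": t} for t in range(5)])
padded = state.predict_input([{"id": "a", "t": t, "f1": t} for t in (5, 6)])
```

Here the predict rows at t = 5 and t = 6 are preceded by the stored training rows at t = 3 and t = 4, so a window of W = 2 exists for both.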

mb706, Mar 17 '21 19:03

If the past target is used as a feature, then it needs to be present even during prediction for resampling, otherwise it becomes NA. We should probably think about how to effectively avoid information leakage (probably the target-copier has a minimum lag of 1 or so).
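A small sketch of what such a target-copier might do, assuming integer time steps: copy the target from `lag` steps back into a new feature column and refuse any lag below 1, since lag 0 would hand the current target to the learner. The name `add_lagged_target` and the `target_lag<k>` column are illustrative assumptions; a missing past target is represented as `None` (the NA case mentioned above).

```python
def add_lagged_target(rows, lag=1):
    """Add the target from `lag` time steps earlier as a feature column."""
    if lag < 1:
        # lag 0 would leak the value that is supposed to be predicted.
        raise ValueError("lag must be >= 1 to avoid leaking the current target")
    target_at = {(r["id"], r["t"]): r.get("target") for r in rows}
    col = f"target_lag{lag}"
    # Rows without an observed past target get None (NA).
    return [dict(r, **{col: target_at.get((r["id"], r["t"] - lag))}) for r in rows]


rows = [{"id": "a", "t": t, "target": t * t} for t in range(3)]
lagged = add_lagged_target(rows, lag=1)
```

The first row of each id has no earlier target and therefore gets `None`, which is the same situation that arises at prediction time when past targets are not supplied.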

mb706, Mar 17 '21 19:03