[BUG] IndexError when transforming static covariates with OneHotEncoder and parameter drop set
Describe the bug
When StaticCovariatesTransformer is used with sklearn's OneHotEncoder as transformer_cat and the encoder's drop parameter is set to "if_binary" or "first", the StaticCovariatesTransformer is not aware of the dropped column: it appears to still expect one output column per category, while the encoder returns fewer. This results in an IndexError when calling StaticCovariatesTransformer.transform().
To Reproduce
import numpy as np
import pandas as pd
from darts import TimeSeries
from darts.dataprocessing.transformers import StaticCovariatesTransformer
from sklearn.preprocessing import OneHotEncoder
covs = ["a", "c", "b"]
drop = "first"
static_covs = pd.DataFrame(data={"cat": covs})
series = TimeSeries.from_values(
    values=np.random.random((10, 3)),
    columns=["comp1", "comp2", "comp3"],
    static_covariates=static_covs,
)
transformer = StaticCovariatesTransformer(transformer_cat=OneHotEncoder(drop=drop))
series_transformed = transformer.fit_transform(series)
Results in the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\LocalData\Anaconda3-2024-02\envs\tsf_vgl_dev\Lib\site-packages\darts\dataprocessing\transformers\fittable_data_transformer.py", line 338, in fit_transform
return self.fit(
^^^^^^^^^
File "C:\LocalData\Anaconda3-2024-02\envs\tsf_vgl_dev\Lib\site-packages\darts\dataprocessing\transformers\fittable_data_transformer.py", line 304, in transform
return super().transform(
^^^^^^^^^^^^^^^^^^
File "C:\LocalData\Anaconda3-2024-02\envs\tsf_vgl_dev\Lib\site-packages\darts\dataprocessing\transformers\base_data_transformer.py", line 397, in transform
transformed_data = _parallel_apply(
^^^^^^^^^^^^^^^^
File "C:\LocalData\Anaconda3-2024-02\envs\tsf_vgl_dev\Lib\site-packages\darts\utils\utils.py", line 259, in _parallel_apply
returned_data = Parallel(n_jobs=n_jobs)(
^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\LocalData\Anaconda3-2024-02\envs\tsf_vgl_dev\Lib\site-packages\joblib\parallel.py", line 1986, in __call__
return output if self.return_generator else list(output)
^^^^^^^^^^^^
File "C:\LocalData\Anaconda3-2024-02\envs\tsf_vgl_dev\Lib\site-packages\joblib\parallel.py", line 1914, in _get_sequential_output
res = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\LocalData\Anaconda3-2024-02\envs\tsf_vgl_dev\Lib\site-packages\darts\dataprocessing\transformers\base_data_transformer.py", line 47, in transform_wrapper
out = transformer_method(cls, series_proc, params, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\LocalData\Anaconda3-2024-02\envs\tsf_vgl_dev\Lib\site-packages\darts\dataprocessing\transformers\base_data_transformer.py", line 232, in _ts_transform
return cls.ts_transform(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\LocalData\Anaconda3-2024-02\envs\tsf_vgl_dev\Lib\site-packages\darts\dataprocessing\transformers\static_covariates_transformer.py", line 357, in ts_transform
return StaticCovariatesTransformer._transform_static_covs(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\LocalData\Anaconda3-2024-02\envs\tsf_vgl_dev\Lib\site-packages\darts\dataprocessing\transformers\static_covariates_transformer.py", line 412, in _transform_static_covs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\LocalData\Anaconda3-2024-02\envs\tsf_vgl_dev\Lib\site-packages\darts\dataprocessing\transformers\static_covariates_transformer.py", line 458, in _add_back_static_covs
data[col_name] = vals_cat[:, idx_cat]
~~~~~~~~^^^^^^^^^^^^
IndexError: index 2 is out of bounds for axis 1 with size 2
Expected behavior
The StaticCovariatesTransformer class should be aware of the number of output features produced by the underlying transformer.
System (please complete the following information):
- Python version: 3.12
- u8darts: 0.37.1
- scikit-learn: 1.7.1
Thanks for raising this issue @konsram, I could reproduce it on my side.
I added it to our backlog. If we can find a smart way to infer (or check beforehand) the mapping between input and output features then we can fix it :)