
[BUG] IndexError when transforming static covariates with OneHotEncoder and parameter drop set

Open konsram opened this issue 4 months ago • 1 comment

Describe the bug

When StaticCovariatesTransformer is used with sklearn's OneHotEncoder as the transformer_cat and the encoder's drop parameter is set to "if_binary" or "first", the StaticCovariatesTransformer does not account for the dropped column. It still expects one output column per category, which results in an IndexError when calling StaticCovariatesTransformer.transform().

To Reproduce

import numpy as np
import pandas as pd
from darts import TimeSeries
from darts.dataprocessing.transformers import StaticCovariatesTransformer
from sklearn.preprocessing import OneHotEncoder

# One categorical static covariate with three distinct values, one row per component.
covs = ["a", "c", "b"]
# "first" drops one category, so the encoder emits 2 columns for 3 categories.
drop = "first"
static_covs = pd.DataFrame(data={"cat": covs})
series = TimeSeries.from_values(
    values=np.random.random((10, 3)),
    columns=["comp1", "comp2", "comp3"],
    static_covariates=static_covs,
)
transformer = StaticCovariatesTransformer(transformer_cat=OneHotEncoder(drop=drop))
# Raises the IndexError shown below.
series_transformed = transformer.fit_transform(series)

Results in the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\LocalData\Anaconda3-2024-02\envs\tsf_vgl_dev\Lib\site-packages\darts\dataprocessing\transformers\fittable_data_transformer.py", line 338, in fit_transform
    return self.fit(
           ^^^^^^^^^
  File "C:\LocalData\Anaconda3-2024-02\envs\tsf_vgl_dev\Lib\site-packages\darts\dataprocessing\transformers\fittable_data_transformer.py", line 304, in transform
    return super().transform(
           ^^^^^^^^^^^^^^^^^^
  File "C:\LocalData\Anaconda3-2024-02\envs\tsf_vgl_dev\Lib\site-packages\darts\dataprocessing\transformers\base_data_transformer.py", line 397, in transform
    transformed_data = _parallel_apply(
                       ^^^^^^^^^^^^^^^^
  File "C:\LocalData\Anaconda3-2024-02\envs\tsf_vgl_dev\Lib\site-packages\darts\utils\utils.py", line 259, in _parallel_apply
    returned_data = Parallel(n_jobs=n_jobs)(
                    ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\LocalData\Anaconda3-2024-02\envs\tsf_vgl_dev\Lib\site-packages\joblib\parallel.py", line 1986, in __call__
    return output if self.return_generator else list(output)
                                                ^^^^^^^^^^^^
  File "C:\LocalData\Anaconda3-2024-02\envs\tsf_vgl_dev\Lib\site-packages\joblib\parallel.py", line 1914, in _get_sequential_output
    res = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "C:\LocalData\Anaconda3-2024-02\envs\tsf_vgl_dev\Lib\site-packages\darts\dataprocessing\transformers\base_data_transformer.py", line 47, in transform_wrapper
    out = transformer_method(cls, series_proc, params, *args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\LocalData\Anaconda3-2024-02\envs\tsf_vgl_dev\Lib\site-packages\darts\dataprocessing\transformers\base_data_transformer.py", line 232, in _ts_transform
    return cls.ts_transform(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\LocalData\Anaconda3-2024-02\envs\tsf_vgl_dev\Lib\site-packages\darts\dataprocessing\transformers\static_covariates_transformer.py", line 357, in ts_transform
    return StaticCovariatesTransformer._transform_static_covs(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\LocalData\Anaconda3-2024-02\envs\tsf_vgl_dev\Lib\site-packages\darts\dataprocessing\transformers\static_covariates_transformer.py", line 412, in _transform_static_covs
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\LocalData\Anaconda3-2024-02\envs\tsf_vgl_dev\Lib\site-packages\darts\dataprocessing\transformers\static_covariates_transformer.py", line 458, in _add_back_static_covs
    data[col_name] = vals_cat[:, idx_cat]
                     ~~~~~~~~^^^^^^^^^^^^
IndexError: index 2 is out of bounds for axis 1 with size 2

Expected behavior

The StaticCovariatesTransformer class should be aware of the number of output features produced by the underlying transformer.
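To make the mismatch concrete, here is a minimal sketch of how many columns a one-hot encoder emits for a single feature under the string values of drop. The helper n_output_columns is hypothetical (it is not part of darts or scikit-learn) and ignores other encoder options such as infrequent-category grouping.

```python
def n_output_columns(n_categories, drop=None):
    """Number of one-hot columns emitted for one feature (hypothetical helper).

    Mirrors the documented behaviour of the `drop` string options:
    - None: keep one column per category
    - "first": drop the first category of every feature
    - "if_binary": drop the first category only for two-category features
    """
    if drop is None:
        return n_categories
    if drop == "first":
        return n_categories - 1
    if drop == "if_binary":
        return n_categories - 1 if n_categories == 2 else n_categories
    raise ValueError(f"unsupported drop value: {drop!r}")


# The reproduction above has three categories: "a", "b", "c".
expected_by_caller = n_output_columns(3)        # one column per category
actually_emitted = n_output_columns(3, "first")  # one category dropped

# Valid column indices of the encoded array are 0 .. actually_emitted - 1,
# so the largest index the caller tries to read is out of bounds.
largest_index_read = expected_by_caller - 1
print(expected_by_caller, actually_emitted, largest_index_read)
```

With three categories and drop="first", the encoded array has two columns, so reading column index 2 fails, which matches the IndexError in the traceback.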

System (please complete the following information):

  • Python version: 3.12
  • u8darts: 0.37.1
  • scikit-learn: 1.7.1

konsram, Aug 26 '25 09:08

Thanks for raising this issue @konsram, I could reproduce it on my side.

I added it to our backlog. If we can find a smart way to infer (or check beforehand) the mapping between input and output features then we can fix it :)
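One possible way to build such a mapping is sketched below in plain Python. It is an illustration, not the darts implementation: the function map_input_to_output is hypothetical, and its categories and drop_idx arguments are modeled on the categories_ and drop_idx_ attributes of a fitted OneHotEncoder. Whether darts would rely on those attributes is an assumption, and the "feature_category" column names only imitate the encoder's usual naming pattern.

```python
def map_input_to_output(feature_names, categories, drop_idx=None):
    """Map each input feature to the encoded columns it produces.

    feature_names: names of the categorical input features
    categories:    per-feature list of categories, in encoder order
    drop_idx:      per-feature index of the dropped category (None = none
                   dropped for that feature), or None if nothing is dropped
    """
    mapping = {}
    next_col = 0
    for i, (name, cats) in enumerate(zip(feature_names, categories)):
        dropped = None if drop_idx is None else drop_idx[i]
        # Keep every category except the dropped one.
        kept = [c for j, c in enumerate(cats) if j != dropped]
        mapping[name] = {
            "columns": list(range(next_col, next_col + len(kept))),
            "names": [f"{name}_{c}" for c in kept],
        }
        next_col += len(kept)
    return mapping


# Case from this issue: one feature, three categories, first one dropped.
single = map_input_to_output(["cat"], [["a", "b", "c"]], drop_idx=[0])
print(single)

# Mixed case resembling drop="if_binary": only the binary feature loses a column.
mixed = map_input_to_output(
    ["cat", "flag"], [["a", "b", "c"], ["no", "yes"]], drop_idx=[None, 0]
)
print(mixed)
```

Iterating over the "columns" entry of each feature, rather than over the number of categories, would keep the lookup into the encoded array within bounds.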

dennisbader, Sep 01 '25 08:09