pytorch-forecasting
Using ``int`` target values instead of ``float`` caused unexpected error: KeyError: "Unknown category '38' encountered. Set `add_nan=True` to allow unknown categories"
- PyTorch-Forecasting version: 0.10.1
- PyTorch version: 1.11.0
- Python version: 3.9.12
- Operating System: Ubuntu 20.04 (WSL 2 on Windows 10)
Expected behaviour
I tried to create a TimeSeriesDataSet and a DataLoader from a simple DataFrame filled with dummy data, as presented in the tutorial. I expected this to work with int target values just as it does with float target values.
Actual behaviour
However, the result was an unexpected error:
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/geronimos/pytorch-forecasting/pytorch_forecasting/data/encoders.py:132, in NaNLabelEncoder.transform(self, y, return_norm, target_scale, ignore_na)
    131 try:
--> 132     encoded = [self.classes_[v] for v in y]
    133 except KeyError as e:
File ~/geronimos/pytorch-forecasting/pytorch_forecasting/data/encoders.py:132, in <listcomp>(.0)
    131 try:
--> 132     encoded = [self.classes_[v] for v in y]
    133 except KeyError as e:
KeyError: 38
During handling of the above exception, another exception occurred:
KeyError                                  Traceback (most recent call last)
/home/bergk/geronimos/pytorch-forecasting/docs/source/tutorials/bug-report.ipynb Cell 10' in <cell line: 22>()
      8 prediction_length = max_prediction_length
     10 training = TimeSeriesDataSet(
     11     data[lambda x: x.time_idx <= training_cutoff],
     12     time_idx="time_idx",
   (...)
     19     max_prediction_length=prediction_length,
     20 )
---> 22 validation = TimeSeriesDataSet.from_dataset(training, data, min_prediction_idx=training_cutoff + 1)
     23 batch_size = 128
     24 train_dataloader = training.to_dataloader(train=True, batch_size=batch_size, num_workers=0)
File ~/geronimos/pytorch-forecasting/pytorch_forecasting/data/timeseries.py:1112, in TimeSeriesDataSet.from_dataset(cls, dataset, data, stop_randomization, predict, **update_kwargs)
   1091 @classmethod
   1092 def from_dataset(
   1093     cls, dataset, data: pd.DataFrame, stop_randomization: bool = False, predict: bool = False, **update_kwargs
   1094 ):
   1095     """
   1096     Generate dataset with different underlying data but same variable encoders and scalers, etc.
   1097
   (...)
   1110         TimeSeriesDataSet: new dataset
   1111     """
-> 1112     return cls.from_parameters(
   1113         dataset.get_parameters(), data, stop_randomization=stop_randomization, predict=predict, **update_kwargs
   1114     )
File ~/geronimos/pytorch-forecasting/pytorch_forecasting/data/timeseries.py:1158, in TimeSeriesDataSet.from_parameters(cls, parameters, data, stop_randomization, predict, **update_kwargs)
   1155 parameters["randomize_length"] = None
   1156 parameters.update(update_kwargs)
-> 1158 new = cls(data, **parameters)
   1159 return new
File ~/geronimos/pytorch-forecasting/pytorch_forecasting/data/timeseries.py:434, in TimeSeriesDataSet.__init__(self, data, time_idx, target, group_ids, weight, max_encoder_length, min_encoder_length, min_prediction_idx, min_prediction_length, max_prediction_length, static_categoricals, static_reals, time_varying_known_categoricals, time_varying_known_reals, time_varying_unknown_categoricals, time_varying_unknown_reals, variable_groups, constant_fill_strategy, allow_missing_timesteps, lags, add_relative_time_idx, add_target_scales, add_encoder_length, target_normalizer, categorical_encoders, scalers, randomize_length, predict_mode)
    431 data = data.sort_values(self.group_ids + [self.time_idx])
    433 # preprocess data
--> 434 data = self._preprocess_data(data)
    435 for target in self.target_names:
    436     assert target not in self.scalers, "Target normalizer is separate and not in scalers."
File ~/geronimos/pytorch-forecasting/pytorch_forecasting/data/timeseries.py:747, in TimeSeriesDataSet._preprocess_data(self, data)
    744     data[f"__target__{target}"] = data[target]
    746 elif isinstance(self.target_normalizer, NaNLabelEncoder):
--> 747     data[self.target] = self.target_normalizer.transform(data[self.target])
    748     # overwrite target because it requires encoding (continuous targets should not be normalized)
    749     data[f"__target__{self.target}"] = data[self.target]
File ~/geronimos/pytorch-forecasting/pytorch_forecasting/data/encoders.py:134, in NaNLabelEncoder.transform(self, y, return_norm, target_scale, ignore_na)
    132     encoded = [self.classes_[v] for v in y]
    133 except KeyError as e:
--> 134     raise KeyError(
    135         f"Unknown category '{e.args[0]}' encountered. Set `add_nan=True` to allow unknown categories"
    136     )
    138 if isinstance(y, torch.Tensor):
    139     encoded = torch.tensor(encoded, dtype=torch.long, device=y.device)
KeyError: "Unknown category '38' encountered. Set `add_nan=True` to allow unknown categories"
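I think it has to do with the data type of the target values, because I was using int instead of float. The traceback shows the target being passed through a NaNLabelEncoder in _preprocess_data, which suggests the int target is treated as categorical rather than continuous.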
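To illustrate the failure mode, here is a minimal sketch in plain Python. It assumes the encoder is fitted only on the rows up to the training cutoff and is then applied to the full data, as `from_dataset` appears to do in the traceback. `TinyLabelEncoder` is a hypothetical, simplified stand-in written for this example; it is not the pytorch-forecasting `NaNLabelEncoder`, and only mirrors the `[self.classes_[v] for v in y]` lookup visible in the traceback.

```python
# Hypothetical stand-in for a label encoder; NOT the pytorch-forecasting implementation.
class TinyLabelEncoder:
    def fit(self, values):
        # Map each distinct value seen during fitting to an integer index.
        self.classes_ = {v: i for i, v in enumerate(sorted(set(values)))}
        return self

    def transform(self, values):
        try:
            # Same dictionary lookup pattern as in the traceback.
            return [self.classes_[v] for v in values]
        except KeyError as e:
            raise KeyError(f"Unknown category '{e.args[0]}' encountered.") from e


train_values = [12, 57, 91]        # integer targets up to the training cutoff
full_values = train_values + [38]  # the full data also contains later time steps

encoder = TinyLabelEncoder().fit(train_values)
print(encoder.transform(train_values))

try:
    encoder.transform(full_values)
except KeyError as err:
    message = err.args[0]
    print(message)
```

With random integer targets, the later time steps will almost always contain a value the encoder never saw during fitting, which would explain why the error appears when building the validation dataset and not the training dataset.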
Code to reproduce the problem
# Compare to: https://pytorch-forecasting.readthedocs.io/en/stable/tutorials/building.html#Passing-data-to-a-model
import numpy as np
import pandas as pd
from pytorch_forecasting import TimeSeriesDataSet
from pytorch_forecasting.data import NaNLabelEncoder

data = pd.DataFrame(
    dict(
        # Create integer values instead of float
        value=[np.random.randint(100) for i in range(30)],  # value=(np.random.rand(30) - 0.5),
        group=np.repeat(np.arange(3), 10),
        time_idx=np.tile(np.arange(10), 3),
    )
)

# Compare to: https://pytorch-forecasting.readthedocs.io/en/latest/tutorials/ar.html
# create dataset and dataloaders
max_encoder_length = 5
max_prediction_length = 2
training_cutoff = data["time_idx"].max() - max_prediction_length
context_length = max_encoder_length
prediction_length = max_prediction_length

training = TimeSeriesDataSet(
    data[lambda x: x.time_idx <= training_cutoff],
    time_idx="time_idx",
    target="value",
    group_ids=["group"],
    categorical_encoders={"group": NaNLabelEncoder(add_nan=True).fit(data.group)},
    # only unknown variable is "value" - and N-Beats can also not take any additional variables
    time_varying_unknown_reals=["value"],
    max_encoder_length=context_length,
    max_prediction_length=prediction_length,
)

# This line raises the KeyError shown above
validation = TimeSeriesDataSet.from_dataset(training, data, min_prediction_idx=training_cutoff + 1)
batch_size = 128
train_dataloader = training.to_dataloader(train=True, batch_size=batch_size, num_workers=0)
val_dataloader = validation.to_dataloader(train=False, batch_size=batch_size, num_workers=0)
A similar issue has already been discussed here: https://stackoverflow.com/questions/71098518/unknown-category-2-encountered-set-add-nan-true-to-allow-unknown-categories
Here is a colab snippet showing the error: https://colab.research.google.com/drive/1uw-W6SGBLHQF3JQYwpeHS8sPoPY6ZaRP?usp=sharing
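A possible workaround, sketched under the assumption that the int dtype of the target is what triggers the categorical handling: cast the target column to float before constructing any TimeSeriesDataSet. This is consistent with the float values from the tutorial not raising the error, but it has not been verified against the library internals. The snippet only prepares the DataFrame; the rest of the reproduction code would stay unchanged.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    dict(
        # Same integer dummy data as in the reproduction
        value=np.random.randint(100, size=30),
        group=np.repeat(np.arange(3), 10),
        time_idx=np.tile(np.arange(10), 3),
    )
)

# Assumption: a float target is treated as continuous, so cast before building the dataset.
data["value"] = data["value"].astype(float)
print(data.dtypes)
```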