Inference Single Item on model trained on Multiple Items
I am using:
- gluonts: latest
- python: 3.11.0
I have a TemporalFusionTransformer that was trained with `PandasDataset.from_long_dataframe(...)`. In this PandasDataset I have multiple item_ids:
| item_id | ... |
|---------|-----|
| cat1    | ... |
| cat2    | ... |
| cat3    | ... |
| ...     | ... |
This dataset includes several past_feat_dynamic_reals and a few static_features.
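For reference, the dataset spec looks roughly like the sketch below. Every name in it is a placeholder (the real column names and freq differ), and the exact keyword arguments may vary between gluonts versions:

```python
from gluonts.dataset.pandas import PandasDataset

# Placeholder spec; all names below stand in for the real columns.
same_dataset_spec_used_for_training = dict(
    item_id="item_id",
    timestamp="timestamp",
    target="target",
    freq="D",
    past_feat_dynamic_real=["past_feat_1", "past_feat_2"],
    static_features=static_df,  # one row per item_id
)
dataset = PandasDataset.from_long_dataframe(
    df, **same_dataset_spec_used_for_training
)
```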
I want to predict on just one category. However, when I do something like

```python
df = df.loc[df['item_id'] == 'cat1']
sample_group = PandasDataset.from_long_dataframe(df, **same_dataset_spec_used_for_training)
forecasts = model.predict(dataset=sample_group)
next(iter(forecasts))
```
I get the following error:

```
IndexError Traceback (most recent call last)
Cell In[124], line 9
7 model = Pred.deserialize(pathlib.Path("./model"))
8 forecasts = model.predict(dataset = sample_group)
----> 9 next(iter(forecasts))
File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/torch/model/predictor.py:90, in PyTorchPredictor.predict(self, dataset, num_samples)
87 self.prediction_net.eval()
89 with torch.no_grad():
---> 90 yield from self.forecast_generator(
91 inference_data_loader=inference_data_loader,
92 prediction_net=self.prediction_net,
93 input_names=self.input_names,
94 output_transform=self.output_transform,
95 num_samples=num_samples,
96 )
File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/model/forecast_generator.py:117, in QuantileForecastGenerator.__call__(self, inference_data_loader, prediction_net, input_names, output_transform, num_samples, **kwargs)
108 def __call__(
109 self,
110 inference_data_loader: DataLoader,
(...)
115 **kwargs
116 ) -> Iterator[Forecast]:
--> 117 for batch in inference_data_loader:
118 inputs = select(input_names, batch, ignore_missing=True)
119 outputs = predict_to_numpy(prediction_net, inputs)
File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/transform/_base.py:111, in TransformedDataset.__iter__(self)
110 def __iter__(self) -> Iterator[DataEntry]:
--> 111 yield from self.transformation(
112 self.base_dataset, is_train=self.is_train
113 )
File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/transform/_base.py:132, in MapTransformation.__call__(self, data_it, is_train)
129 def __call__(
130 self, data_it: Iterable[DataEntry], is_train: bool
131 ) -> Iterator:
--> 132 for data_entry in data_it:
133 try:
134 yield self.map_transform(data_entry.copy(), is_train)
File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/dataset/loader.py:50, in Batch.__call__(self, data, is_train)
49 def __call__(self, data, is_train):
---> 50 yield from batcher(data, self.batch_size)
File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/itertools.py:131, in batcher.<locals>.get_batch()
130 def get_batch():
--> 131 return list(itertools.islice(it, batch_size))
File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/transform/_base.py:132, in MapTransformation.__call__(self, data_it, is_train)
129 def __call__(
130 self, data_it: Iterable[DataEntry], is_train: bool
131 ) -> Iterator:
--> 132 for data_entry in data_it:
133 try:
134 yield self.map_transform(data_entry.copy(), is_train)
File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/transform/_base.py:186, in FlatMapTransformation.__call__(self, data_it, is_train)
182 def __call__(
183 self, data_it: Iterable[DataEntry], is_train: bool
184 ) -> Iterator:
185 num_idle_transforms = 0
--> 186 for data_entry in data_it:
187 num_idle_transforms += 1
188 for result in self.flatmap_transform(data_entry.copy(), is_train):
File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/transform/_base.py:132, in MapTransformation.__call__(self, data_it, is_train)
129 def __call__(
130 self, data_it: Iterable[DataEntry], is_train: bool
131 ) -> Iterator:
--> 132 for data_entry in data_it:
133 try:
134 yield self.map_transform(data_entry.copy(), is_train)
File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/transform/_base.py:132, in MapTransformation.__call__(self, data_it, is_train)
129 def __call__(
130 self, data_it: Iterable[DataEntry], is_train: bool
131 ) -> Iterator:
--> 132 for data_entry in data_it:
133 try:
134 yield self.map_transform(data_entry.copy(), is_train)
[... skipping similar frames: MapTransformation.__call__ at line 132 (5 times)]
File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/transform/_base.py:132, in MapTransformation.__call__(self, data_it, is_train)
129 def __call__(
130 self, data_it: Iterable[DataEntry], is_train: bool
131 ) -> Iterator:
--> 132 for data_entry in data_it:
133 try:
134 yield self.map_transform(data_entry.copy(), is_train)
File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/dataset/pandas.py:217, in PandasDataset.__iter__(self)
216 def __iter__(self):
--> 217 yield from self._data_entries
218 self.unchecked = True
File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/dataset/pandas.py:188, in PandasDataset._pair_to_dataentry(self, item_id, df)
179 if not self.unchecked:
180 assert is_uniform(df.index), (
181 "Dataframe index is not uniformly spaced. "
182 "If your dataframe contains data from multiple series in the "
183 'same column ("long" format), consider constructing the '
184 "dataset with `PandasDataset.from_long_dataframe` instead."
185 )
187 entry = {
--> 188 "start": df.index[0],
189 }
191 target = df[self.target].values
192 target = target[: len(target) - self.future_length]
File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/pandas/core/indexes/base.py:5385, in Index.__getitem__(self, key)
5382 if is_integer(key) or is_float(key):
5383 # GH#44051 exclude bool, which would return a 2d ndarray
5384 key = com.cast_scalar_indexer(key)
-> 5385 return getitem(key)
5387 if isinstance(key, slice):
5388 # This case is separated from the conditional above to avoid
5389 # pessimization com.is_bool_indexer and ndim checks.
5390 return self._getitem_slice(key)
File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/pandas/core/arrays/datetimelike.py:379, in DatetimeLikeArrayMixin.__getitem__(self, key)
372 """
373 This getitem defers to the underlying array, which by-definition can
374 only handle list-likes, slices, and integer scalars
375 """
376 # Use cast as we know we will get back a DatetimeLikeArray or DTScalar,
377 # but skip evaluating the Union at runtime for performance
378 # (see https://github.com/pandas-dev/pandas/pull/44624)
--> 379 result = cast("Union[Self, DTScalarOrNaT]", super().__getitem__(key))
380 if lib.is_scalar(result):
381 return result
File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/pandas/core/arrays/_mixins.py:284, in NDArrayBackedExtensionArray.__getitem__(self, key)
278 def __getitem__(
279 self,
280 key: PositionalIndexer2D,
281 ) -> Self | Any:
282 if lib.is_integer(key):
283 # fast-path
--> 284 result = self._ndarray[key]
285 if self.ndim == 1:
286 return self._box_func(result)
IndexError: index 0 is out of bounds for axis 0 with size 0
```
Does anyone have any ideas on how to run inference on one item at a time, instead of having to pass multiple items in a dataset at once? The subset has exactly the same shape and dtypes as the training data. Thanks!
Originally posted by @Alex-Wenner-FHR in https://github.com/awslabs/gluonts/discussions/3126
It appears that, when using the same dataset spec with my subset, the other categories are still represented, for whatever reason:
```python
for entry in ds_val._data_entries.iterable.iterable:
    print(entry)
```
```
...
[0 rows x 24 columns])
('cat2', Empty DataFrame
Columns: [...]
Index: []
[0 rows x 24 columns])
('cat3', Empty DataFrame
Columns: [...]
Index: []
[0 rows x 24 columns])
```
This is less than ideal, but doing something like the following lets me run inference on a single item_id:

```python
# Keep only the per-item dataframes that actually contain data.
iterable: tuple = ds_val._data_entries.iterable.iterable
iterable = [t for t in iterable if len(t[1]) > 1]
ds_val._data_entries.iterable.iterable = tuple(iterable)
```
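A plausible explanation (my assumption, not confirmed anywhere in this thread): if the `item_id` column is a pandas categorical, filtering the frame does not drop the unused categories, so a groupby on that column still produces one empty group per leftover category. A minimal sketch of that pandas behavior:

```python
import pandas as pd

# Illustrative long-format frame with a categorical item_id column;
# the names and values here are made up for the sketch.
df = pd.DataFrame(
    {
        "item_id": pd.Categorical(["cat1", "cat1", "cat2", "cat3"]),
        "target": [1.0, 2.0, 3.0, 4.0],
    },
    index=pd.to_datetime(
        ["2021-01-01", "2021-01-02", "2021-01-01", "2021-01-01"]
    ),
)

subset = df.loc[df["item_id"] == "cat1"]

# groupby on a categorical keeps the unused categories as empty groups:
print([(k, len(g)) for k, g in subset.groupby("item_id", observed=False)])
# [('cat1', 2), ('cat2', 0), ('cat3', 0)]

# Dropping the unused categories removes the empty entries:
subset = subset.copy()
subset["item_id"] = subset["item_id"].cat.remove_unused_categories()
print([(k, len(g)) for k, g in subset.groupby("item_id", observed=False)])
# [('cat1', 2)]
```

If that is what is happening, calling `remove_unused_categories()` (or casting `item_id` to `str`) before rebuilding the dataset would avoid patching its internals.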
@lostella - has anyone from the team been able to lend an eye to this?
@Alex-Wenner-FHR `predict` gets a dataset just like `train`: if you want to predict only a specific item id, you should be able to construct a `PandasDataset` with only a subset of the data, and pass that to `predict`. Does that work?
It does not! If you check out my comment a few comments above, I posted a workaround that I was able to implement to get it to work, but natively it does not.
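In case it helps others, here is a sketch of the suggested approach combined with the categorical hypothesis above: filter, drop the unused categories, then rebuild the dataset. `same_dataset_spec_used_for_training` is the placeholder from my original snippet, not a verified spec:

```python
import pandas as pd
from gluonts.dataset.pandas import PandasDataset

single = df.loc[df["item_id"] == "cat1"].copy()

# If item_id is categorical, drop the now-empty categories so that
# from_long_dataframe does not emit an empty series per filtered-out item.
if isinstance(single["item_id"].dtype, pd.CategoricalDtype):
    single["item_id"] = single["item_id"].cat.remove_unused_categories()

sample_group = PandasDataset.from_long_dataframe(
    single, **same_dataset_spec_used_for_training
)
forecasts = list(model.predict(sample_group))
```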