pytorch-forecasting
RuntimeError: Sizes of tensors must match except in dimension 1
- PyTorch-Forecasting version: 1.0.0
- PyTorch version: 2.0.1+cpu
- Python version: 3.9
- Operating System: Windows 11
Expected behavior
I ran Baseline().predict(val_dataloader, return_y=True) and did not expect any errors.
Actual behavior
Received the following error:
return torch.cat(sequences, dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 1280 but got size 42 for tensor number 14 in the list.
Code to reproduce the problem
I am running the following code on an internal dataset:
from pytorch_forecasting import Baseline, TimeSeriesDataSet

max_prediction_length = 6
max_encoder_length = 24
training_cutoff = data["time_idx"].max() - max_prediction_length

training = TimeSeriesDataSet(
    data[data["time_idx"] <= training_cutoff],
    group_ids=["product_number", "sku_size", "retail_sales_channel"],
    time_idx="time_idx",
    target="quantity_sold",
    max_encoder_length=max_encoder_length,  # use the lengths defined above
    min_prediction_length=1,
    max_prediction_length=max_prediction_length,
    time_varying_known_reals=["time_idx", "discount_rate"],
    time_varying_unknown_categoricals=[],
    time_varying_unknown_reals=[
        "quantity_physical_closing",
    ],
    add_relative_time_idx=True,
    add_target_scales=True,
    add_encoder_length=True,
)

validation = TimeSeriesDataSet.from_dataset(training, data, predict=True, stop_randomization=True)

batch_size = 128  # set this between 32 and 128
train_dataloader = training.to_dataloader(train=True, batch_size=batch_size, num_workers=0)
val_dataloader = validation.to_dataloader(train=False, batch_size=batch_size * 10, num_workers=0)

baseline_predictions = Baseline().predict(val_dataloader, return_y=True)
I didn't run the code, but I suspect len(training) % 128 == 42 or len(training) % 1280 == 42, i.e. the last batch only has 42 samples. The library code here is a bit odd.
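For reference, the failure is easy to reproduce in isolation with the sizes from the traceback (a minimal sketch, not the library code):
import torch

full_batch = torch.randn(1280, 6)  # 1280 samples, 6 prediction steps
last_batch = torch.randn(42, 6)    # the leftover samples in the final batch
# concat_sequences stitches batch outputs along dim=1 (the time axis), which
# requires every other dimension -- including the sample axis -- to match:
torch.cat([full_batch, last_batch], dim=1)  # RuntimeError: Sizes of tensors must match except in dimension 1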
So do you know what I can change to make it work?
Just make the length of the training set an integer multiple of the batch size. For example, if your batch size is 64 and the training length is 6420, drop the last 20 samples.
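If trimming the data by hand is awkward, the incomplete batch can also be dropped at the dataloader level. to_dataloader passes extra keyword arguments through to torch's DataLoader, so something like this should work (a sketch; note the dropped samples are simply never predicted):
val_dataloader = validation.to_dataloader(
    train=False, batch_size=batch_size * 10, num_workers=0,
    drop_last=True,  # drop the final, smaller batch instead of erroring later
)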
It's the validation data that fails, so I assume I should trim based on the validation set? I tried both, though, and neither worked.
I am currently facing a similar issue when trying to evaluate the performance of the TFT model:
predictions = best_tft.predict(val_dataloader, return_y=True, trainer_kwargs=dict(accelerator="cpu"))
MAE()(predictions.output, predictions.y)
Please, if you find a way around yours, let me know how.
I'm having the same issue with pretty much the same code :/
Yes, the code in question (which produces this error) is in the TFT demand example in the documentation.
I've found a fix: modifying the concat_sequences() function in utils.py. It just pads the last sequence tensor with NaNs so that its size matches the others. I'm not sure how reliable this is, but with this my code runs.
from typing import List, Union

import torch
import torch.nn.functional as F
from torch.nn.utils import rnn


def concat_sequences(
    sequences: Union[List[torch.Tensor], List[rnn.PackedSequence]]
) -> Union[torch.Tensor, rnn.PackedSequence]:
    """
    Concatenate RNN sequences.

    Args:
        sequences (Union[List[torch.Tensor], List[rnn.PackedSequence]]): list of RNN packed sequences or tensors of which
            first index are samples and second are timesteps

    Returns:
        Union[torch.Tensor, rnn.PackedSequence]: concatenated sequence
    """
    if isinstance(sequences[0], rnn.PackedSequence):
        return rnn.pack_sequence(sequences, enforce_sorted=False)
    elif isinstance(sequences[0], torch.Tensor):
        # BEGINNING OF MODIFIED CODE
        # If the last batch is smaller than the others (dataset length not a
        # multiple of the batch size), pad it with NaNs before concatenating.
        # pad=(0, 0, 0, delta) pads the second-to-last dimension, i.e. the
        # sample axis of a (samples, timesteps) tensor.
        if sequences[0].size(0) > sequences[-1].size(0):
            delta = sequences[0].size(0) - sequences[-1].size(0)
            sequences[-1] = F.pad(sequences[-1], pad=(0, 0, 0, delta), mode="constant", value=torch.nan)
        # END OF MODIFIED CODE
        return torch.cat(sequences, dim=1)
    elif isinstance(sequences[0], (tuple, list)):
        return tuple(
            concat_sequences([sequences[ii][i] for ii in range(len(sequences))]) for i in range(len(sequences[0]))
        )
    else:
        raise ValueError("Unsupported sequence type")
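One caveat with the NaN padding: it ends up in the concatenated output (and in the returned y), so metrics computed afterwards, e.g. MAE()(predictions.output, predictions.y), will come out as NaN unless the padded rows are masked or dropped first.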
I've been struggling with a similar problem for a long time now. What worked for me (I don't know if it makes mathematical sense) was to lower the batch size to the number the error reports, in your case 42.
Hope this helps.
Please see my comment here - https://github.com/jdb78/pytorch-forecasting/issues/449#issuecomment-1649288069.
If you don't need the ys (it's easy to format them yourself), then setting return_y=False fixes the issue.
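If you go that route, the targets are easy to assemble yourself from the dataloader (a sketch, assuming the usual (x, (y, weight)) batch structure that to_dataloader yields; torch.cat's default dim=0 handles the smaller final batch fine):
import torch

actuals = torch.cat([y for x, (y, weight) in iter(val_dataloader)])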
@hippotilt thanks! I tracked down the problem to this function. It would be nice if something similar was merged upstream so that we don't need to hack it in our own code.
I encountered the same error and narrowed down the issue, as mentioned by many above, to the concat_sequences function in utils.py. The following fix worked for me:
from typing import List, Union

import torch
from torch.nn.utils import rnn


def concat_sequences(
    sequences: Union[List[torch.Tensor], List[rnn.PackedSequence]]
) -> Union[torch.Tensor, rnn.PackedSequence]:
    """
    Concatenate RNN sequences.

    Args:
        sequences (Union[List[torch.Tensor], List[rnn.PackedSequence]]): list of RNN packed sequences or tensors of which
            first index are samples and second are timesteps

    Returns:
        Union[torch.Tensor, rnn.PackedSequence]: concatenated sequence
    """
    if isinstance(sequences[0], rnn.PackedSequence):
        return rnn.pack_sequence(sequences, enforce_sorted=False)
    elif isinstance(sequences[0], torch.Tensor):
        return torch.cat(sequences, dim=0)  # changed from dim=1 to dim=0
    elif isinstance(sequences[0], (tuple, list)):
        return tuple(
            concat_sequences([sequences[ii][i] for ii in range(len(sequences))]) for i in range(len(sequences[0]))
        )
    else:
        raise ValueError("Unsupported sequence type")
Just changing the concat dimension to 0 (the axis containing the batches) fixes the error. I am not sure how this function is used elsewhere in the package and hope it does not break things in those places.
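For what it's worth, the failing case from the traceback goes through cleanly with dim=0, since the per-batch tensors differ only in the sample axis (a minimal sketch, not library code):
import torch

full_batch = torch.randn(1280, 6)
last_batch = torch.randn(42, 6)
out = torch.cat([full_batch, last_batch], dim=0)  # OK: shapes differ only in dim 0
out.shape  # torch.Size([1322, 6])
Whether dim=0 is the right axis in general presumably depends on whether the listed tensors are chunks of samples (as when stitching dataloader batches) or chunks of timesteps, which may be why the function used dim=1 in the first place.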
Same issue here: I can't predict all my examples because their number isn't a multiple of batch_size. Would be great if we could get a fix for this one.