pytorch-forecasting
val_loss is not available
- PyTorch-Forecasting version: 0.9.0
- PyTorch version: 1.9
- Python version: 3.9.6
- Operating System: 5.13.13.1-Manjaro (64-bit)
Expected behavior
Training runs to completion.
Actual behavior
I get the following error during training; the loss also seems to be NaN.
MisconfigurationException: ReduceLROnPlateau conditioned on metric val_loss which is not available. Available metrics are: []. Condition can be set using `monitor` key in lr scheduler dict
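For context, the scheduler is presumably wired up with Lightning's lr_scheduler dict somewhere in the model, roughly like this (a sketch of the generic Lightning pattern, not pytorch-forecasting's actual code), so the error means nothing was ever logged under val_loss:

import torch
import pytorch_lightning as pl

class Model(pl.LightningModule):
    # Sketch of the generic Lightning pattern behind this error
    # (illustrative only; not pytorch-forecasting's actual implementation).
    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=4)
        return {
            "optimizer": optimizer,
            "lr_scheduler": {
                "scheduler": scheduler,
                "monitor": "val_loss",  # fails if nothing was ever logged as "val_loss"
            },
        }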
Code to reproduce the problem
I have the following dataset: many series, each with around 75 months of data and the same number of steps. The series represent units sold, so there are lots of zeros, but even removing the series with zeros leads to the same problem. The time_idx goes up every 5000 rows or so, since that's the number of individual series per month. Also, if I remove the target normalizer, it complains that there's an unknown category, even though the series in question (2551) is a regular series with as many data points as any other. (A sanity-check sketch for these properties follows the samples below.)
df.head(20)
    sales  group  time_idx
0       0      0         0
1       0      1         0
2       0      2         0
3       0      3         0
4       0      4         0
5       0      5         0
6     126      6         0
7      31      7         0
8      41      8         0
9      42      9         0
10      0     10         0
11     37     11         0
12     37     12         0
13      0     13         0
14      5     14         0
15      0     15         0
16      0     16         0
17     74     17         0
18     32     18         0
19     14     19         0
Here's a sample of the dataframe.
df.sample(20, random_state=11)
        sales  group  time_idx
221686      0   3366        40
423094      5   2828        77
259617      0   3091        47
326938      0   4916        59
232764      0   3528        42
2168        0   2168         0
333763    148    825        61
115707      0   1089        21
167575      0   3835        30
284678     33    862        52
8251        0   2793         1
114737      0    119        21
153361    336    537        28
97675       0   4889        17
319041     60   2477        58
411433     95   2083        75
83671       0   1801        15
401391      0   2957        73
420845     47    579        77
98838       2    594        18
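Here is the sanity check mentioned above (my own diagnostic sketch, using the column names from my dataframe):

import numpy as np

# Sanity-check sketch: verify that every series has the same length
# and that the target is finite, since the loss seems to be NaN.
lengths = df.groupby("group")["time_idx"].agg(["min", "max", "count"])
print(lengths["count"].nunique() == 1)  # True if all series have the same length
print(df["sales"].isna().any())         # True here would explain a NaN loss
print(np.isfinite(df["sales"]).all())   # all targets should be finite
print((df["sales"] < 0).any())          # negatives might interact badly with softplus (my guess)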
The setup is the same as in the demand forecasting tutorial. I can run the tutorial example, but not with my own dataset; it fails during training every time.
import torch
import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping, LearningRateMonitor
from pytorch_lightning.loggers import TensorBoardLogger
from pytorch_forecasting import Baseline, TemporalFusionTransformer, TimeSeriesDataSet
from pytorch_forecasting.data import GroupNormalizer
from pytorch_forecasting.metrics import QuantileLoss

max_prediction_length = 6
max_encoder_length = 36
training_cutoff = dataset["time_idx"].max() - max_prediction_length

training = TimeSeriesDataSet(
    dataset[lambda x: x.time_idx <= training_cutoff],
    time_idx="time_idx",
    target="sales",
    group_ids=["group"],
    min_encoder_length=max_encoder_length // 2,
    max_encoder_length=max_encoder_length,
    min_prediction_length=1,
    max_prediction_length=max_prediction_length,
    time_varying_unknown_reals=["sales"],
    target_normalizer=GroupNormalizer(
        groups=["group"], transformation="softplus", center=False
    ),
)
# create validation set (predict=True), i.e. predict the last
# max_prediction_length points in time for each series
validation = TimeSeriesDataSet.from_dataset(
    training, dataset, predict=True, stop_randomization=True
)

# create dataloaders for model
batch_size = 128  # set this between 32 and 128
train_dataloader = training.to_dataloader(train=True, batch_size=batch_size, num_workers=16)
val_dataloader = validation.to_dataloader(train=False, batch_size=batch_size, num_workers=16)

# calculate baseline mean absolute error, i.e. predict the next value as the
# last available value from the history
actuals = torch.cat([y for x, (y, weight) in iter(val_dataloader)])
baseline_predictions = Baseline().predict(val_dataloader)
(actuals - baseline_predictions).abs().mean().item()
# configure network and trainer
early_stop_callback = EarlyStopping(
    monitor=None, min_delta=1e-4, patience=10, verbose=False, mode="min"
)
lr_logger = LearningRateMonitor()  # log the learning rate
logger = TensorBoardLogger("lightning_logs")  # log results to TensorBoard
# (note: neither callback is passed to the Trainer below)

trainer = pl.Trainer(
    max_epochs=30,
    gpus=0,
    weights_summary="top",
    gradient_clip_val=0.1,
    limit_train_batches=50,  # limit the number of training batches per epoch
    fast_dev_run=False,  # set to True to check that network and dataset have no serious bugs
    logger=logger,
)
tft = TemporalFusionTransformer.from_dataset(
    training,
    # not meaningful for finding the learning rate but otherwise very important
    learning_rate=1e-6,
    hidden_size=16,  # most important hyperparameter apart from learning rate
    attention_head_size=1,  # number of attention heads; up to 4 for large datasets
    dropout=0.1,  # between 0.1 and 0.3 are good values
    hidden_continuous_size=8,  # set to <= hidden_size
    output_size=7,  # 7 quantiles by default
    loss=QuantileLoss(),
    # reduce learning rate if no improvement in validation loss after x epochs
    reduce_on_plateau_patience=4,
)
print(f"Number of parameters in network: {tft.size()/1e3:.1f}k")
# fit network
trainer.fit(
    tft,
    train_dataloaders=train_dataloader,
    val_dataloaders=val_dataloader,
)

# calculate mean absolute error on validation set
actuals = torch.cat([y[0] for x, y in iter(val_dataloader)])
predictions = tft.predict(val_dataloader)  # predict with the model itself
(actuals - predictions).abs().mean()
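Since the error says the available metrics are empty, one thing worth checking is whether the validation loop ever produces a finite batch (my own diagnostic sketch, reusing the dataloader from above):

# Diagnostic sketch: if the validation dataloader yields no batches, or the
# targets contain NaN/inf, val_loss is never logged and ReduceLROnPlateau
# has nothing to monitor.
print(len(val_dataloader))  # 0 batches here would explain the missing val_loss
x, (y, weight) = next(iter(val_dataloader))
print(torch.isnan(y).any(), torch.isinf(y).any())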
Did I format something wrong? What is going on?
Thanks
I have the same problem. Did you find any solution?
No, sorry, I never got it to work.
I have the same problem! Did anyone find a solution in the meantime?
I have the same problem when time_varying_known_categoricals=[].
The problem doesn't appear when I use some time_varying_known_categoricals; see the sketch below.
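In case it helps others, this is roughly what I mean (a sketch; the "month" feature and its derivation are my own example, not something from the tutorial):

# Workaround sketch: derive a simple calendar feature from time_idx and pass
# it as a known categorical ("month" is a made-up example column name).
dataset["month"] = (dataset["time_idx"] % 12).astype(str)

training = TimeSeriesDataSet(
    dataset[lambda x: x.time_idx <= training_cutoff],
    time_idx="time_idx",
    target="sales",
    group_ids=["group"],
    max_encoder_length=36,
    max_prediction_length=6,
    time_varying_unknown_reals=["sales"],
    time_varying_known_categoricals=["month"],  # reportedly avoids the error
    target_normalizer=GroupNormalizer(
        groups=["group"], transformation="softplus", center=False
    ),
)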