
val_loss is not available


  • PyTorch-Forecasting version: 0.9.0
  • PyTorch version: 1.9
  • Python version: 3.9.6
  • Operating System: 5.13.13.1-Manjaro (64-bit)

Expected behavior

Training runs to completion.

Actual behavior

I get this error during training; the loss also seems to be NaN.

MisconfigurationException: ReduceLROnPlateau conditioned on metric val_loss which is not available. Available metrics are: []. Condition can be set using `monitor` key in lr scheduler dict
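For context on the error message: the `monitor` key belongs to PyTorch Lightning's lr scheduler configuration, and the metric it names must actually be logged during validation for the condition to be checked. A minimal, generic Lightning sketch of that wiring is below; it is illustrative only, not pytorch-forecasting's actual code, and the toy model and loss are placeholders.

import torch
import pytorch_lightning as pl


class ToyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(10, 1)  # placeholder network

    def validation_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.l1_loss(self.net(x), y)
        self.log("val_loss", loss)  # "val_loss" must be logged, or the scheduler cannot find it
        return loss

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min", patience=4)
        # The scheduler is conditioned on the logged metric via the "monitor" key.
        return {
            "optimizer": optimizer,
            "lr_scheduler": {"scheduler": scheduler, "monitor": "val_loss"},
        }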

Code to reproduce the problem

I have the following dataset: lots of series with around 75 months of data each, all with the same number of steps. The series represent units sold, so there are lots of zeros in this dataset, but even removing the series with zeros leads to the same problem. The time_idx goes up every 5000 rows or so, since that is the number of individual series per month. Also, if I remove the target normalizer, it complains that there is an unknown category, even though the series in question (2551) is a regular series with as many data points as any other.

df.head(20)

    sales  group  time_idx
0       0      0         0
1       0      1         0
2       0      2         0
3       0      3         0
4       0      4         0
5       0      5         0
6     126      6         0
7      31      7         0
8      41      8         0
9      42      9         0
10      0     10         0
11     37     11         0
12     37     12         0
13      0     13         0
14      5     14         0
15      0     15         0
16      0     16         0
17     74     17         0
18     32     18         0
19     14     19         0

Here's a sample of the dataframe.


df.sample(20, random_state=11)

        sales  group  time_idx
221686      0   3366        40
423094      5   2828        77
259617      0   3091        47
326938      0   4916        59
232764      0   3528        42
2168        0   2168         0
333763    148    825        61
115707      0   1089        21
167575      0   3835        30
284678     33    862        52
8251        0   2793         1
114737      0    119        21
153361    336    537        28
97675       0   4889        17
319041     60   2477        58
411433     95   2083        75
83671       0   1801        15
401391      0   2957        73
420845     47    579        77
98838       2    594        18

The setup is the same as in the tutorial on demand forecasting. I can run that example, but not with my own dataset; it fails during training every time.


import torch
import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping, LearningRateMonitor
from pytorch_lightning.loggers import TensorBoardLogger

from pytorch_forecasting import Baseline, TemporalFusionTransformer, TimeSeriesDataSet
from pytorch_forecasting.data import GroupNormalizer
from pytorch_forecasting.metrics import QuantileLoss

# `dataset` is the full dataframe shown above (printed as `df` in the samples)
max_prediction_length = 6
max_encoder_length = 36
training_cutoff = dataset["time_idx"].max() - max_prediction_length

training = TimeSeriesDataSet(dataset[lambda x: x.time_idx <= training_cutoff],
                             time_idx="time_idx",
                             target="sales",
                             group_ids=["group"],
                             min_encoder_length=max_encoder_length // 2,
                             max_encoder_length=max_encoder_length,
                             min_prediction_length=1,
                             max_prediction_length=max_prediction_length,
                             time_varying_unknown_reals=["sales"],
                             target_normalizer=GroupNormalizer(
                                 groups=["group"],
                                 transformation="softplus",
                                 center=False))

# create validation set (predict=True) which means to predict the last max_prediction_length points in time
# for each series
validation = TimeSeriesDataSet.from_dataset(training,
                                            dataset,
                                            predict=True,
                                            stop_randomization=True)
# create dataloaders for model
batch_size = 128  # set this between 32 and 128
train_dataloader = training.to_dataloader(train=True,
                                          batch_size=batch_size,
                                          num_workers=16)
val_dataloader = validation.to_dataloader(train=False,
                                          batch_size=batch_size,
                                          num_workers=16)

# calculate baseline mean absolute error, i.e. predict next value as the last available value from the history
actuals = torch.cat([y for x, (y, weight) in iter(val_dataloader)])
baseline_predictions = Baseline().predict(val_dataloader)
(actuals - baseline_predictions).abs().mean().item()

# configure network and trainer
early_stop_callback = EarlyStopping(monitor=None,
                                    min_delta=1e-4,
                                    patience=10,
                                    verbose=False,
                                    mode="min")
lr_logger = LearningRateMonitor()  # log the learning rate
logger = TensorBoardLogger("lightning_logs")  # logging results to a tensorboard

trainer = pl.Trainer(
    max_epochs=30,
    gpus=0,
    weights_summary="top",
    gradient_clip_val=0.1,
    limit_train_batches=50,  # limit training to 50 batches per epoch for faster iteration
    fast_dev_run=False,  # set to True to check that the network and dataset have no serious bugs
    logger=logger,
)

tft = TemporalFusionTransformer.from_dataset(
    training,
    # not meaningful for finding the learning rate but otherwise very important
    learning_rate=1e-6,
    hidden_size=16,  # most important hyperparameter apart from learning rate
    # number of attention heads. Set to up to 4 for large datasets
    attention_head_size=1,
    dropout=0.1,  # between 0.1 and 0.3 are good values
    hidden_continuous_size=8,  # set to <= hidden_size
    output_size=7,  # 7 quantiles by default
    loss=QuantileLoss(),
    # reduce learning rate if no improvement in validation loss after x epochs
    reduce_on_plateau_patience=4,
)
print(f"Number of parameters in network: {tft.size()/1e3:.1f}k")

# fit network
trainer.fit(
    tft,
    train_dataloaders=train_dataloader,
    val_dataloaders=val_dataloader,
)

# calculate mean absolute error on validation set
actuals = torch.cat([y[0] for x, y in iter(val_dataloader)])
predictions = tft.predict(val_dataloader)
(actuals - predictions).abs().mean()

Did I format something wrong? What is going on?

Thanks

Nathan-Furnal · Aug 30 '21

I have the same problem. Did you find any solution?

Balint-Batki · Nov 06 '21

No, sorry, I never got it to work.

Nathan-Furnal · Nov 08 '21

I have the same problem! Did anyone find a solution in the meantime?

evoascendent · Aug 03 '22

I have the same problem when time_varying_known_categoricals=[]. The problem does not appear when I use some time_varying_known_categoricals.

nagas1226 · Feb 15 '24
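To make the workaround in the last comment concrete, here is a minimal sketch of passing a non-empty time_varying_known_categoricals, reusing the variables from the snippet above; the `month` column and the way it is derived are hypothetical and not part of the original report.

# Hypothetical illustration only: derive a calendar-like known categorical from
# time_idx and pass it explicitly instead of leaving the list empty.
dataset["month"] = (dataset["time_idx"] % 12).astype(str).astype("category")

training = TimeSeriesDataSet(
    dataset[lambda x: x.time_idx <= training_cutoff],
    time_idx="time_idx",
    target="sales",
    group_ids=["group"],
    max_encoder_length=max_encoder_length,
    max_prediction_length=max_prediction_length,
    time_varying_known_categoricals=["month"],  # non-empty, as in the comment above
    time_varying_unknown_reals=["sales"],
)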