
[Question] Validation metric different when trained model is rerun on validation set

Open tniveej opened this issue 7 months ago • 2 comments

Hey guys, I'm facing a problem that's been driving me nuts. I am a beginner so please forgive me if there are any fundamental mistakes here. Any help is appreciated.

I am using the TiDE model to do prediction (regression) on time series; I believe what I'm trying to do is transfer learning. I have multiple time series that I want to train the model on, and I then want to make predictions on a different set of similar time series. During training, the model reports MAE ≈ 0.01. However, when I have the trained model predict the validation set of each training time series and manually calculate the MAE, it's more like MAE ≈ 0.18. The model also struggles to make any proper predictions (as I show near the end).

The nature of my data is as follows:

  1. There is one variable I would like to predict (I actually have two, but to simplify I'm testing on one first).
  2. There are two static covariates for each time series.
  3. There are 44 covariates for each time step of the time series, which are passed to the model as future covariates since they are known in their entirety at prediction time (see the sketch after this list for how I build the series).
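
Roughly, this is how I build each target series with its static covariates and the matching future-covariate series (a simplified sketch; the column names timestamp, target, static_1, static_2 and the cov_ prefix are placeholders for my actual columns):

import pandas as pd
from darts import TimeSeries

# one CSV per time series: a datetime column, the target, 44 known-in-advance
# covariates, and two per-series constants used as static covariates
df = pd.read_csv("one_series.csv", parse_dates=["timestamp"])

# target series with its 2 static covariates attached
target_ts = TimeSeries.from_dataframe(
    df, time_col="timestamp", value_cols=["target"]
).with_static_covariates(
    pd.Series({"static_1": df["static_1"].iloc[0], "static_2": df["static_2"].iloc[0]})
)

# the 44 per-time-step covariates, used as future covariates
future_cov_cols = [c for c in df.columns if c.startswith("cov_")]
cov_ts = TimeSeries.from_dataframe(df, time_col="timestamp", value_cols=future_cov_cols)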

Now, getting to the code. This is what I've done:


Model parameters:

device = "gpu" if torch.cuda.is_available() else "cpu"

# this setting stops training once the validation loss has not decreased by more than 1e-5 for 10 epochs
early_stopping_args = {
    "monitor": "val_loss",
    "patience": 10,
    "min_delta": 1e-5,
    "mode": "min",
    "divergence_threshold": 0.8,
    "verbose": True,
}


# PyTorch Lightning Trainer arguments
pl_trainer_kwargs = {
    "max_epochs": 200,
    "accelerator": device,
    "callbacks": [
        EarlyStopping(
            **early_stopping_args,
        )
    ],
    "gradient_clip_val": 1,
}

# learning rate scheduler
lr_scheduler_cls = torch.optim.lr_scheduler.ExponentialLR
lr_scheduler_kwargs = {
    "gamma": 0.999,
}

# TiDE model arguments
model_args = {
    "input_chunk_length": 10,  # lookback window
    "output_chunk_length": 1,  # forecast/lookahead window
    "pl_trainer_kwargs": pl_trainer_kwargs,
    "lr_scheduler_cls": lr_scheduler_cls,
    "lr_scheduler_kwargs": lr_scheduler_kwargs,
    "likelihood": None,  # use a likelihood for probabilistic forecasts
    "save_checkpoints": True,  # checkpoint to retrieve the best performing model state,
    "force_reset": True,  # If set to True, any previously-existing model with the same name will be reset (all checkpoints will be discarded). Default: False.
    "batch_size": 32,
    "use_static_covariates": True,
    "random_state": 42,
    "hidden_size": 1024,
    "num_encoder_layers": 2,
    "num_decoder_layers": 4,
    "decoder_output_dim": 64,
    "temporal_decoder_hidden": 64,
    "dropout": 0.1,
    "use_layer_norm": True,
    "use_reversible_instance_norm": False,
    "temporal_width_past": 42,
    "temporal_width_future": 43,
}

dataloader_args = {
    "drop_last": True,
} 

Then I create the TimeSeries from a CSV and fill in the missing values with a manual forward fill before turning them into lists of TimeSeries. I then normalize the data using the default MinMax Scaler; a rough sketch of this preprocessing is shown below.
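
A minimal sketch of that step (assuming the scalers are fit on the training splits and reused on the validation splits; train, val, train_cov and val_cov are the lists of TimeSeries used in the training calls further down):

from darts.dataprocessing.transformers import Scaler

# forward-fill gaps in the raw dataframe before building the TimeSeries
df = df.ffill()

# Scaler() wraps sklearn's MinMaxScaler by default; fit on the training lists
# and reuse the same fitted scalers to transform the validation lists
target_scaler = Scaler()
cov_scaler = Scaler()

train = target_scaler.fit_transform(train)
val = target_scaler.transform(val)
train_cov = cov_scaler.fit_transform(train_cov)
val_cov = cov_scaler.transform(val_cov)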

Here is an example of one of the plotted series. The series to predict: [image] Covariates for said series: [image]

Next, I find a learning rate. It's wrapped in a function that retries with a smaller max_lr until lr_find succeeds, because I reused this code from my hyperparameter tuning and didn't want a trial to fail just because a suitable learning rate couldn't be found:

model_tide = TiDEModel(
    **model_args,
    # log_tensorboard = True,
    model_name="Tide_best",
    loss_fn=torch.nn.MSELoss(),
)


def find_lr(model):
    max_lr = 0.1
    while True:
        try:
            lr_results = model.lr_find(
                series=train,
                future_covariates=train_cov,
                val_series=val,
                val_future_covariates=val_cov,
                dataloader_kwargs={
                    "drop_last": True,
                },
                max_samples_per_ts=200,
                min_lr=1e-08,
                max_lr=max_lr,
                verbose=True,
            )
            return lr_results.suggestion()

        except Exception:
            print("lr too big")
            max_lr = max_lr / 10


best_lr = find_lr(model_tide)
print(best_lr)

Next, I go ahead with the training:

torch_metrics = MetricCollection(
    [MeanSquaredError(), MeanAbsoluteError(), MeanAbsolutePercentageError()]
)

pl_trainer_kwargs["callbacks"] = [
    EarlyStopping(
        **early_stopping_args,
    )
]

model_tide = TiDEModel(
    **model_args,
    model_name="Tide_best",
    log_tensorboard=True,
    loss_fn=torch.nn.MSELoss(),
    torch_metrics=torch_metrics,
    optimizer_kwargs={"lr": best_lr, "weight_decay": 0},
)

model_tide.fit(
    series=train,
    future_covariates=train_cov,
    val_series=val,
    val_future_covariates=val_cov,
    dataloader_kwargs=dataloader_args,
    # max_samples_per_ts = 150,
    verbose=True,
)

The training is wacky because there seems to be no decrease in the loss, just fluctuation. Also, for some reason it always stops at the number of epochs set by the patience parameter of the EarlyStopping callback (10 epochs in this case), i.e. the monitored val_loss apparently never improves by more than min_delta after the first epoch. But I think we can ignore that for now(?). Here are pictures showing the training loss and the validation loss + MAE:

[image: training loss curve] [image: validation loss and MAE curves]
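
Since log_tensorboard=True, the raw numbers behind these curves can also be read back from the TensorBoard event files. A small sketch (the log path and tag names below are assumptions based on the darts defaults, so check ea.Tags() for the actual names):

from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# assumed default location: <work_dir>/darts_logs/<model_name>/logs
ea = EventAccumulator("darts_logs/Tide_best/logs")
ea.Reload()

print(ea.Tags()["scalars"])  # list the scalar tags that were actually logged

# e.g. pull out the validation loss curve (tag name may differ between versions)
val_loss = [(e.step, e.value) for e in ea.Scalars("val_loss")]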

Because I noticed that even with a low reported MAE the model is not able to make any meaningful predictions, I went ahead and ran the trained model on the validation sets used during training:

mae_list = []
points = 0
loaded_model = TiDEModel.load_from_checkpoint(
    model_name="Tide_best", best=True, log_tensorboard=False
)

for i, val_ts in enumerate(tqdm(val)):
    # provide all the covariates, including the ones from the training set
    cov = train_cov[i].append(val_cov[i])
    len_pred = len(val_ts)

    predictions = loaded_model.predict(
        # use the training set as past values and predict the validation set
        n=len_pred, series=(train[i]), future_covariates=cov, verbose=False
    )

    mean_abs_err = mae(val_ts, predictions, intersect=True) * (len_pred)
    points = points + len_pred

    mae_list.append(mean_abs_err)

    # some random timeseries to visualize
    if i == 180:
        # print(predictions.values())
        predictions.plot(label="Predictions")
        val_ts.plot(label="Actual")


print(f"Mean Absolute Error : {np.sum(mae_list)/points}")

I get MAE = 0.1881085145800395, which is way off the values obtained during training. An example of the prediction made by the model (the same time series from the example dataset shown above): [image: predictions vs. actual values]

I've been at this for some time and I still can't figure out what's going wrong. Can someone explain to me what I'm doing wrong here?

tniveej · Jul 25 '24 15:07