
Feature request: Allow the training set to be passed to custom error metrics

Open KishManani opened this issue 11 months ago • 4 comments

Error metrics such as the MASE and RMSSE require computing scaling factors from the target variable in the training set. Currently there is no way to compute these metrics during backtesting, because custom metric functions only receive the values from the test period. It would be nice to allow users to compute these metrics, or to include them by default.
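For reference, the standard (non-seasonal) definitions; in both cases the denominator is computed on the training set of length $T$, which is exactly the information a custom metric currently cannot access:

$$
\mathrm{MASE} = \frac{\frac{1}{h}\sum_{t=T+1}^{T+h} \lvert y_t - \hat{y}_t \rvert}{\frac{1}{T-1}\sum_{i=2}^{T} \lvert y_i - y_{i-1} \rvert}
\qquad
\mathrm{RMSSE} = \sqrt{\frac{\frac{1}{h}\sum_{t=T+1}^{T+h} (y_t - \hat{y}_t)^2}{\frac{1}{T-1}\sum_{i=2}^{T} (y_i - y_{i-1})^2}}
$$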

KishManani avatar Mar 04 '24 08:03 KishManani

Hello @KishManani

Thanks for opening the issue! It's a great point; giving users more flexibility here makes sense.

I wonder if you could do this using a ForecasterEquivalentDate + backtesting with a custom metric 🤔 (though it is true that this is more labor-intensive for the user).

What do you think @JoaquinAmatRodrigo ?

https://skforecast.org/latest/user_guides/forecasting-baseline

https://skforecast.org/latest/user_guides/backtesting#backtesting-with-custom-metric
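A rough sketch of one reading of that idea, assuming the 0.12-era API from the guides above (return values and signatures may differ across versions; `y` is the target series and `train_size` the initial training size): backtest the naive baseline over the same folds, then close over its MAE inside the custom metric, giving a MASE-like relative error.

```python
import numpy as np
from skforecast.ForecasterBaseline import ForecasterEquivalentDate
from skforecast.model_selection import backtesting_forecaster

# Step 1: backtest a 1-step naive baseline to obtain the scaling factor.
baseline = ForecasterEquivalentDate(offset=1, n_offsets=1)
scale, _ = backtesting_forecaster(
    forecaster=baseline,
    y=y,
    steps=1,
    metric='mean_absolute_error',
    initial_train_size=train_size,
    refit=False,
)

# Step 2: close over the scale in a custom metric for the real backtest,
# since custom metrics only receive y_true and y_pred.
def scaled_mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred)) / scale
```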

JavierEscobarOrtiz avatar Mar 04 '24 09:03 JavierEscobarOrtiz

Hi @JavierEscobarOrtiz !

> I wonder if you could do this using a ForecasterEquivalentDate + backtesting with a custom metric 🤔 (though it is true that this is more labor-intensive for the user).

I agree that this would be very laborious for a user for what is a relatively common error metric.

I also want to add that it would be nice to compute error metrics that are pooled over multiple time series, for example the normalised deviation (ND) and normalised RMSE (NRMSE). See the definitions here:

https://arxiv.org/pdf/1704.04110.pdf [image: ND and NRMSE definitions from the paper]

These metrics are recommended in this review paper: https://link.springer.com/article/10.1007/s10618-022-00894-5

So it would be nice to be able to compute them.
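For concreteness, a sketch of those pooled metrics in plain numpy (not skforecast code), assuming the predictions for all series are stacked into flat arrays:

```python
import numpy as np

def normalized_deviation(y_true, y_pred):
    # ND: total absolute error pooled over all series and time points,
    # normalized by the total absolute value of the target.
    return np.sum(np.abs(y_true - y_pred)) / np.sum(np.abs(y_true))

def normalized_rmse(y_true, y_pred):
    # NRMSE: pooled RMSE normalized by the pooled mean absolute target value.
    return np.sqrt(np.mean((y_true - y_pred) ** 2)) / np.mean(np.abs(y_true))
```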

Thank you! Kishan

KishManani avatar Mar 05 '24 09:03 KishManani

Hi @KishManani! We are planning to provide these features in the next release. For that, we will probably generalize the calculation of the metrics to use a function that takes four arguments: y_pred and y_real (mandatory), plus y_pred_train and y_real_train (optional).
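For example, a MASE under that signature could look like this sketch (illustrative only; just the argument names come from the proposal above):

```python
import numpy as np

def mase(y_real, y_pred, y_real_train=None, y_pred_train=None):
    # Scale the out-of-sample MAE by the in-sample MAE of the
    # 1-step naive forecast, computed from the training target.
    scale = np.mean(np.abs(np.diff(y_real_train)))
    return np.mean(np.abs(y_real - y_pred)) / scale
```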

There is one corner case that we would like to discuss further. When using backtesting with a refit strategy, say n refits, there are n groups of in-sample predictions (some of them may overlap, depending on the refit strategy). Do you see any inconvenience in pooling them all?

For multi-series models, we already pool the metric across all selected series using a weighted average, where the weight is the length of the predicted values for each series.

JoaquinAmatRodrigo avatar Jun 18 '24 08:06 JoaquinAmatRodrigo

Hi @JoaquinAmatRodrigo!

> We are planning to provide these features in the next release. For that, we will probably generalize the calculation of the metrics to use a function that takes four arguments: y_pred and y_real (mandatory), plus y_pred_train and y_real_train (optional).

Sounds good! Just FYI: in my opinion, real is not a good term to use here. My suggestion would be y_true, to match scikit-learn's naming (this reduces cognitive load for people already familiar with sklearn).

> There is one corner case that we would like to discuss further. When using backtesting with a refit strategy, say n refits, there are n groups of in-sample predictions (some of them may overlap, depending on the refit strategy). Do you see any inconvenience in pooling them all?

My understanding is that for each backtest fold (not necessarily each refit, since we might refit only intermittently but can still compute the error metric for folds where we did not refit), we compute part of the error metric on the data prior to the forecast horizon, and then use it to rescale the metric over the forecast horizon. For MASE, for example, we compute the MAE of a 1-step naive forecast on the data prior to the forecast horizon (not the in-sample errors of the fitted model) and then divide the MAE over the forecast horizon by it. A sketch of what I mean is below.
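```python
import numpy as np

# Illustrative sketch only (hypothetical fold layout, plain numpy): each
# fold's out-of-sample MAE is scaled by the naive MAE of the data *prior to*
# that fold's horizon, then the scaled errors are averaged across folds.
def backtest_mase(y, fold_starts, horizon, preds_by_fold):
    scaled_errors = []
    for start, y_pred in zip(fold_starts, preds_by_fold):
        history = y[:start]                        # data before this fold's horizon
        y_true = y[start:start + horizon]          # this fold's forecast horizon
        scale = np.mean(np.abs(np.diff(history)))  # MAE of 1-step naive forecast
        scaled_errors.append(np.mean(np.abs(y_true - y_pred)) / scale)
    return np.mean(scaled_errors)
```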

I don't quite understand your question in this context. Could you give a specific example? Thank you!

> For multi-series models, we already pool the metric across all selected series using a weighted average, where the weight is the length of the predicted values for each series.

This does not reproduce the NRMSE and ND metrics above, right? A toy example of the difference, using ND (all values hypothetical):
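```python
import numpy as np

# Two series on very different scales.
y1_true, y1_pred = np.array([100.0, 110.0]), np.array([90.0, 120.0])
y2_true, y2_pred = np.array([1.0, 2.0]), np.array([2.0, 1.0])

def nd(y_true, y_pred):
    return np.sum(np.abs(y_true - y_pred)) / np.sum(np.abs(y_true))

# Pooled ND: a single ratio over all series at once.
pooled = nd(np.concatenate([y1_true, y2_true]),
            np.concatenate([y1_pred, y2_pred]))                       # ~0.103

# Length-weighted average of per-series NDs (both series have length 2).
weighted = (2 * nd(y1_true, y1_pred) + 2 * nd(y2_true, y2_pred)) / 4  # ~0.381

print(pooled, weighted)  # clearly different numbers
```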

KishManani avatar Jun 21 '24 09:06 KishManani

Feature included in skforecast version 0.13.0

JoaquinAmatRodrigo avatar Aug 08 '24 08:08 JoaquinAmatRodrigo