
Feature request: Add ability to skip steps in backtesting

Open KishManani opened this issue 1 year ago • 7 comments

My current understanding is that in all the backtesters the forecast origin moves forward by one time step for each fold of backtesting. I think it would be helpful if users could set the forecast origin to move forward by N steps rather than just one step for each fold. This can help reduce the time for backtesting.

Apologies if this feature is already available, but from reading the docs I'm unsure whether it is.

Thank you! Kishan

KishManani avatar Jan 27 '24 14:01 KishManani

Hello Kishan,

All the strategies available for backtesting are the ones shown in the gifs in the User Guide that you may already be familiar with.

https://skforecast.org/latest/user_guides/backtesting

We had not considered your alternative. We do have many options to configure when the model is refit, but none to skip predicting some folds.

For example, if I understand you correctly, the forecast horizon in your backtesting is 12 weeks. You suggest that users can choose to skip validation for some weeks. Is this correct?

Thanks for opening the issue!

Javi

JavierEscobarOrtiz avatar Jan 27 '24 19:01 JavierEscobarOrtiz

Hi @JavierEscobarOrtiz! Thanks for your reply!

For example, if I understand you correctly, the forecast horizon in your backtesting is 12 weeks. You suggest that users can choose to skip validation for some weeks. Is this correct?

Let me give an example to clarify. Imagine you have the following time series: [1,2,3,4,5,6,7,8,9,10]. I want the initial training size to be 4, the forecast horizon is 2, and I want the forecast origin to move forward by 2 steps during backtesting. So the folds I would have are:

Train: [1,2,3,4], Test: [5, 6]
Train: [3,4,5,6], Test: [7, 8]
Train: [5,6,7,8], Test: [9, 10]

This reduces the number of folds computed (thereby saving time) compared to moving the forecast origin forward by 1 each step.
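As of this thread, this stride is not part of skforecast's public API; a minimal standalone sketch of the idea (the function name and signature are hypothetical) could look like this:

```python
# Hypothetical sketch (not skforecast API): enumerate backtesting folds where
# the forecast origin advances by `stride` steps per fold instead of 1,
# with a fixed-size rolling training window.
def strided_folds(series, initial_train_size, steps, stride):
    folds = []
    origin = initial_train_size
    while origin + steps <= len(series):
        train = series[origin - initial_train_size:origin]  # fixed-size train window
        test = series[origin:origin + steps]                # forecast horizon
        folds.append((train, test))
        origin += stride                                    # advance origin by N
    return folds

series = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
for train, test in strided_folds(series, initial_train_size=4, steps=2, stride=2):
    print(train, test)
```

With `stride=2` this reproduces exactly the three folds listed above; with `stride=1` it would produce five.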

Does this make sense?

Thanks again! Kishan

KishManani avatar Jan 27 '24 19:01 KishManani

Hi @KishManani,

Thanks for the clarification, do you mean this strategy?

backtesting fixed train size refit

I think I am missing something in your example. As I see it, you are predicting 2 steps per fold.

Fold 1: Train [1, 2, 3, 4], steps = 2 [5, 6]
Fold 2: Train [3, 4, 5, 6], steps = 2 [7, 8]
Fold 3: Train [5, 6, 7, 8], steps = 2 [9, 10]

Of course, as you increase the number of steps predicted in each iteration, the number of folds computed will decrease.

Perhaps you are referring to this other strategy that reduces the number of backtesting fits:

backtesting intermittent refit

(it is not shown, but it can also keep the training set at a fixed size by shifting its origin)

Best,

Javi

JavierEscobarOrtiz avatar Jan 27 '24 21:01 JavierEscobarOrtiz

Hi @JavierEscobarOrtiz,

Thanks for the reply! I think there is a misunderstanding of the example I gave.

I think I am missing something in your example. As I see it, you are predicting 2 steps per fold.

There are two different variables in my example which take the value 2 here. The forecast horizon is 2, we are predicting two steps into the future. The number of steps we move the forecast origin after each fold is also 2. These are two separate things.

The first gif you provided appears to be moving forward by one step after each forecast. Is this correct? If so, I want to be able to specify for it to move forward by N steps to reduce the number of forecasts I make during backtesting.

N = 1 gives:
Train: [1,2,3,4], Test: [5, 6]
Train: [2,3,4,5], Test: [6, 7]
Train: [3,4,5,6], Test: [7, 8]
...

N = 2 would give:
Train: [1,2,3,4], Test: [5, 6]
Train: [3,4,5,6], Test: [7, 8]
Train: [5,6,7,8], Test: [9, 10]
...

N = 3 would give:
Train: [1,2,3,4], Test: [5, 6]
Train: [4,5,6,7], Test: [8, 9]
Train: [7,8,9,10], Test: [11, 12]
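As a rough sketch of the time saving (the helper name is hypothetical, and it assumes a fixed-size rolling train window), the forecast origins, and hence the fold count, for each N would be:

```python
# Hypothetical helper: forecast origins for a series of n_obs points when the
# origin advances by `stride` per fold. A fold fits while origin + horizon <= n_obs.
def fold_origins(n_obs, train_size, horizon, stride):
    return list(range(train_size, n_obs - horizon + 1, stride))

for n in (1, 2, 3):
    origins = fold_origins(n_obs=10, train_size=4, horizon=2, stride=n)
    print(f"N={n}: {len(origins)} folds, origins {origins}")
```

For the 10-point series above this gives 5 folds for N = 1, 3 for N = 2, and 2 for N = 3 (the third N = 3 fold in the lists above would need observations 11 and 12, which are past the end of the series).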

Does this clarify?

Thanks again for your help! Kishan

KishManani avatar Jan 28 '24 12:01 KishManani

Hi there,

If I understand it correctly, this would be equivalent to skipping some folds (train-test pairs) during backtesting. As a result, the predictions would not cover every time point in the series.

This would reduce the time of the backtesting process and could therefore be useful to speed up hyperparameter search, with the disadvantage of a less exhaustive metric.

@KishManani Do you know of another advantage?

@JavierEscobarOrtiz This could be easily implemented as it only needs to skip some of the splits returned by _create_backtesting_folds.

JoaquinAmatRodrigo avatar Feb 05 '24 10:02 JoaquinAmatRodrigo

Hi @JoaquinAmatRodrigo!

If I understand it well, this will be equivalent to skipping some folds (pairs train-test) during backtesting. As a result, the predictions will not cover every time point in the series.

This will reduce the time of the backtesting process and therefore may be useful to speed up the hyperparameters search process, with the disadvantage of having a less exhaustive metric.

Yes this is correct!

@KishManani Do you know of another advantage?

Not that I know of. Primarily a time saver. There is a short discussion about this in the original Facebook Prophet paper in section 4.3.

Best wishes, Kishan

KishManani avatar Feb 07 '24 18:02 KishManani

Hi @KishManani

The easiest way to achieve this kind of behavior would be to skip some folds during the backtesting process. This could be done by adding a new argument skip_folds that takes either a list of fold indices to skip, or an integer that would be interpreted as "skip every n folds". With this strategy the backtest would reduce the number of predictions while still covering most of the backtesting horizon.
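A minimal sketch of how such an argument could be interpreted (this is an illustration only, not skforecast's final implementation; the integer semantics shown are just one possible reading, and the fold objects here are plain placeholders):

```python
# Illustrative only: filter the list of folds produced by a splitter such as
# skforecast's internal _create_backtesting_folds (output format not shown here).
def apply_skip_folds(folds, skip_folds):
    if skip_folds is None:
        return folds
    if isinstance(skip_folds, int):
        # one possible integer semantics: keep the first fold,
        # then every skip_folds-th fold after it
        return folds[::skip_folds]
    # otherwise, treat skip_folds as a list of fold indices to drop
    drop = set(skip_folds)
    return [fold for i, fold in enumerate(folds) if i not in drop]

folds = list(range(10))            # stand-ins for (train, test) index pairs
print(apply_skip_folds(folds, 2))  # [0, 2, 4, 6, 8]
print(apply_skip_folds(folds, [1, 3]))
```

Either form keeps the first fold, so the start of the backtesting horizon is always evaluated.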

Do you see any potential problems with this?

JoaquinAmatRodrigo avatar Jun 18 '24 08:06 JoaquinAmatRodrigo