
predict or backcasting does not make sense for `NaiveSeasonal` model

Open georgeblck opened this issue 2 years ago • 5 comments

Thank you for this great package. I have a question regarding the necessity of retrain=True in backtesting, and more generally about the meaning of model.predict.

Example problem: I have a univariate time series with high autocorrelation, and I want to check how well NaiveSeasonal(K=1) works before trying out more complex methods. I want to evaluate the performance of this model on the entire validation dataset. Basically this model just shifts the time series one step ahead, so it should be very fast.

The code is taken from the quickstart docs:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from darts import TimeSeries
from darts.datasets import AirPassengersDataset
from darts.models import NaiveSeasonal
from darts.metrics import mape

series = AirPassengersDataset().load()
train, val = series.split_before(pd.Timestamp("19580101"))

naive_model = NaiveSeasonal(K=1)
naive_model.fit(train)
naive_forecast = naive_model.predict(n=len(val))

series.plot(label="actual")
naive_forecast.plot(label="naive forecast (K=1)")

My problem with this is conceptual: why does naive_model.predict(len(val)) repeat the last value of the training dataset rather than the value that occurred K steps ago (as is written in the docs)? This way it doesn't even make sense as a prediction method. Is there an option to pass the validation set to predict and make this happen?

I can achieve what I want with historical_forecasts:

historical_naive = naive_model.historical_forecasts(
    series, start=pd.Timestamp("19580101"), forecast_horizon=1, verbose=True
)

series.plot(label="data")
historical_naive.plot(label="naive historical forecasts (K=1)")
print("MAPE = {:.2f}%".format(mape(historical_naive, series)))

but this takes a while for larger datasets - which it should not in this case - and passing the option retrain=False is not possible.
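For the special case described above, one can also compute one-step-ahead naive forecasts directly as a shift of the series, since the forecast at time t is just the value observed at t - K. This is a minimal NumPy sketch of that shortcut (an illustration of the idea, not a darts API):

```python
import numpy as np

def naive_historical_forecasts(values, K=1):
    """One-step-ahead NaiveSeasonal(K)-style forecasts for a whole series:
    the forecast for position t is simply the observed value at t - K."""
    v = np.asarray(values, dtype=float)
    return v[:-K]  # forecasts for positions K, K+1, ..., len(v)-1
```

Comparing v[K:] (the actuals) against this shifted array gives the same evaluation as a horizon-1 backtest, without any per-step overhead.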

So my questions are:

  • Why does predict not actually predict the next n timesteps?
  • Why is it not possible to set retrain=False for the simplest methods that don't really "train" anyway?

georgeblck avatar Aug 25 '22 11:08 georgeblck

I have the same question. historical_forecasts() takes a really long time. It would be great if we could have a prediction method that takes the validation/test series as input and produces forecasts based on the already trained model.

parshinsh avatar Aug 27 '22 20:08 parshinsh

I'm not sure I fully understand the questions.

Why does predict not actually predict the next n timesteps?

It does. If you do print(len(naive_forecast)) in the snippet, you'll see it prints 36. It is predicting the same constant value 36 times.

Why is it not possible to set retrain=False for the simplest methods that don't really "train" anyway?

It does "train" although in a very naive way: it stores the K last values of the training series. We could imagine a mode where this model, once trained on a series, could be backtested on another series without "seeing" the last K values of that series, but I'm not able to think of a case where this would be valuable (let me know if you see one). The "training" should have close-to-negligible impact on performance when running historical forecasts because it's basically only accessing and storing these K values. Historical forecasts should still be fast for this model.

hrzn avatar Sep 01 '22 15:09 hrzn

I have the same question. historical_forecasts() takes a really long time.

Does it take a long time for the naive seasonal model?

It would be great if we could have a prediction method that takes the validation/test series as input and produces forecasts based on the already trained model.

You can do that for many of the non-trivial models: just pass series when calling predict() to specify the new series whose future you want to forecast.

hrzn avatar Sep 01 '22 15:09 hrzn

I'm not sure I fully understand the questions.

Why does predict not actually predict the next n timesteps?

It does. If you do print(len(naive_forecast)) in the snippet, you'll see it prints 36. It is predicting the same constant value 36 times.

Ah yes, I understand what you mean. It does repeat the value from K steps ago, but that value will always be the same. So the model simply means something other than what one would expect in the time series context. I think this model might be misnamed; it should be called Last Observation Carried Forward.

So then I don't even mind that NaiveSeasonal.predict() works the way it does. I would just want an actual baseline model where NewNaiveSeasonal(K=1).predict() directly produces this prediction:

historical_naive = naive_model.historical_forecasts(
    series, start=pd.Timestamp("19580101"), forecast_horizon=1, verbose=True
)

so one does not have to do the backtesting step.

Using the NaiveSeasonal historical forecasts with n=601 took 4 seconds, which is too long in my opinion.

georgeblck avatar Sep 01 '22 15:09 georgeblck

Ah yes, I understand what you mean. It does repeat the value from K steps ago, but that value will always be the same. So the model simply means something other than what one would expect in the time series context. I think this model might be misnamed; it should be called Last Observation Carried Forward.

Well, it is the last observation only for K=1. For any value of K it is seasonal: for instance, K=12 would capture (naive) yearly seasonality on monthly data.

So then I don't even mind that NaiveSeasonal.predict() works the way it does. I would just want an actual baseline model where NewNaiveSeasonal(K=1).predict() directly produces this prediction:

historical_naive = naive_model.historical_forecasts(
    series, start=pd.Timestamp("19580101"), forecast_horizon=1, verbose=True
)

so one does not have to do the backtesting step.

That would assume a different kind of model, which would consume future inputs (observations) iteratively, in order to simulate what forecasts would have been obtained historically. Which is exactly what historical_forecasts() is doing :)
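The expanding-window procedure described here can be sketched as a plain loop (an illustration only, mimicking a last-points-only style of output, which I assume here): at each step the model is re-"trained" on all data seen so far and forecasts horizon steps ahead.

```python
import numpy as np

def expanding_window_backtest(values, start, horizon, K=1):
    """Sketch of a historical-forecasts loop for a NaiveSeasonal-style model:
    at each time t, store the last K values seen so far ("retrain") and
    keep the final point of the horizon-step forecast."""
    v = np.asarray(values, dtype=float)
    last_points = []
    for t in range(start, len(v) - horizon + 1):
        last_k = v[t - K:t]                        # data available at time t
        fcst = np.tile(last_k, -(-horizon // K))[:horizon]
        last_points.append(fcst[-1])               # last point of each forecast
    return np.asarray(last_points)
```

For K=1 and horizon=1 this reduces exactly to the one-step shift of the series, which is why the loop's per-step overhead, not the "training", dominates the runtime.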

Using the NaiveSeasonal historical forecasts with n=601 took 4 seconds which is too long in my opinion.

There are small overheads in backtesting caused by the repeated creation of new TimeSeries objects at each time step, but 4 seconds seems high... I tried this (which requires 600 successive forecasts, each with horizon n=600):

%%time
import numpy as np
from darts import TimeSeries
from darts.models import NaiveSeasonal

backtest = NaiveSeasonal(K=1).historical_forecasts(
    series=TimeSeries.from_values(np.random.random(1800)),
    start=0.5,
    forecast_horizon=600,
)

and got

CPU times: user 400 ms, sys: 6.27 ms, total: 406 ms
Wall time: 404 ms

hrzn avatar Sep 02 '22 07:09 hrzn