darts
predict or backtesting does not make sense for `NaiveSeasonal` model
Thank you for this great package. I have a question regarding the necessity of `retrain=True` in backtesting, and more generally about the purpose of `model.predict`.
Example problem

I have a univariate time series with high autocorrelation, and I want to check how well `NaiveSeasonal(K=1)` works before trying more complex methods. I want to evaluate the performance of this model on the entire validation dataset. This model essentially just shifts the time series one step ahead, so it should be very fast.
The code is taken from the quickstart docs:
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from darts import TimeSeries
from darts.datasets import AirPassengersDataset
from darts.models import NaiveSeasonal
from darts.metrics import mape

series = AirPassengersDataset().load()
train, val = series.split_before(pd.Timestamp("19580101"))

naive_model = NaiveSeasonal(K=1)
naive_model.fit(train)
naive_forecast = naive_model.predict(n=len(val))

series.plot(label="actual")
naive_forecast.plot(label="naive forecast (K=1)")
```
My problem with this is conceptual: why does `naive_model.predict(len(val))` repeat the last value of the training dataset instead of the value that occurred K steps ago (as the docs describe)? This way it doesn't even make sense as a prediction method. Is there an option to pass the validation set to `predict` and make this happen?
What I want can be achieved with `historical_forecasts`:

```python
historical_naive = naive_model.historical_forecasts(
    series, start=pd.Timestamp("19580101"), forecast_horizon=1, verbose=True
)

series.plot(label="data")
historical_naive.plot(label="historical naive forecasts")
print("MAPE = {:.2f}%".format(mape(historical_naive, series)))
```
However, this takes a while for larger datasets (which it should not, in this case), and passing the option `retrain=False` is not possible.
So my questions are:

- Why does `predict` not actually predict the next `n` timesteps?
- Why is it not possible to set `retrain=False` for the simplest methods, which don't "train" anyway?
I have the same question. `historical_forecasts()` takes a really long time. It would be great to have a prediction method that takes the validation/test 'x' as input and produces forecasts from the trained model.
I'm not sure I fully understand the questions.

> Why does `predict` not actually predict the next `n` timesteps?

It does. If you run `print(len(naive_forecast))` in the snippet, you'll see it prints 36. It is predicting the same constant value 36 times.
> Why is it not possible to set `retrain=False` for the simplest methods, which don't "train" anyway?

It does "train", although in a very naive way: it stores the last `K` values of the training series. We could imagine a mode where this model, once trained on one series, could be backtested on another series without "seeing" the last `K` values of that series, but I can't think of a case where this would be valuable (let me know if you see one). The "training" should have a close-to-negligible impact on performance when running historical forecasts, because it basically only accesses and stores these `K` values. Historical forecasts should still be fast for this model.
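To make this concrete, here is a minimal sketch (deliberately simplified, not the actual darts implementation) of what `NaiveSeasonal`'s "training" amounts to: `fit()` just stores the last `K` observations, and `predict(n)` repeats them cyclically.

```python
# Minimal sketch (NOT the darts implementation) of NaiveSeasonal:
# "training" is storing the last K values; prediction repeats them.
class NaiveSeasonalSketch:
    def __init__(self, K=1):
        self.K = K
        self.last_k = None

    def fit(self, values):
        # "Training" is just storing the last K observations.
        self.last_k = list(values[-self.K:])
        return self

    def predict(self, n):
        # Repeat the stored values cyclically for n steps.
        return [self.last_k[i % self.K] for i in range(n)]

model = NaiveSeasonalSketch(K=1).fit([10, 20, 30])
print(model.predict(4))  # [30, 30, 30, 30]
```

With `K=1` the stored buffer holds a single value, which is why every forecast step is the same constant.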
> I have the same question. `historical_forecasts()` takes a really long time.
Does it take a long time for the naive seasonal model?
> It will be great if we can have a predicting module that takes validation/test 'x' as the input and finds the forecasts based on the trained model

You can do that for many of the non-trivial models: just pass `series` when calling `predict()` to specify the new series you want to forecast.
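To illustrate the idea of applying an already-trained model to a new series at prediction time, here is a toy stand-in (not the darts API): a "global" model that learns one parameter at fit time and can then forecast any series handed to it afterwards.

```python
# Toy stand-in (NOT the darts API): fit() learns an average one-step
# growth factor; predict() can then be applied to any new series.
def fit(train):
    ratios = [b / a for a, b in zip(train, train[1:])]
    return sum(ratios) / len(ratios)  # the learned "parameter"

def predict(growth, series, n):
    # Forecast n steps beyond *whatever* series is supplied,
    # using only what was learned during fit().
    out, last = [], series[-1]
    for _ in range(n):
        last *= growth
        out.append(last)
    return out

g = fit([1, 2, 4, 8])          # learned growth factor: 2.0
print(predict(g, [5, 10], 3))  # forecasts for a series never seen in fit()
```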
> I'm not sure I fully understand the questions.
>
> > Why does predict not actually predict the next n timesteps?
>
> It does. If you do `print(len(naive_forecast))` in the snippet, you'll see it prints 36. It is predicting the same constant value 36 times.
Ah yes, I understand what you mean. It does repeat the value from K steps ago, but that value will always be the same. So the model simply means something other than what one would expect in the time series context. I think this model might be misnamed; it should be called Last Observation Carried Forward.

So then I don't even mind that `NaiveSeasonal.predict()` works the way it does. I would just want an actual baseline model where `NewNaiveSeasonal(K=1).predict()` directly produces this actual prediction

```python
historical_naive = naive_model.historical_forecasts(
    series, start=pd.Timestamp("19580101"), forecast_horizon=1, verbose=True
)
```

so one does not have to do the backtesting step.
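For `K=1` and `forecast_horizon=1`, the historical naive forecasts described here are just the series shifted by one step, which can be computed directly; a sketch with plain numpy (an illustration, not a darts feature):

```python
import numpy as np

# For NaiveSeasonal(K=1) with forecast_horizon=1, the historical forecast
# at time t is simply the observation at t-1, i.e. the series shifted by K.
values = np.array([112.0, 118.0, 132.0, 129.0, 121.0])
K = 1
forecasts = values[:-K]  # forecast for time t is the value at t - K
actuals = values[K:]

# MAPE of the shifted-series "forecast" against the actual values.
mape = 100.0 * np.mean(np.abs((actuals - forecasts) / actuals))
print(round(mape, 2))
```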
Using the NaiveSeasonal historical forecasts with n=601 took 4 seconds, which is too long in my opinion.
> Ah yes, I understand what you mean. It does repeat the value from K steps ago, but that value will always be the same. So the model simply means something other than what one would expect in the time series context. I think this model might be misnamed; it should be called Last Observation Carried Forward.
Well, it is the last observation only for `K=1`; for any value of `K` it is seasonal. For instance, `K=12` would capture (naive) yearly seasonality on monthly data.
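To illustrate the seasonal behaviour for `K>1` (a toy sketch, not darts code): with `K=12` on monthly data, the forecast for each month reuses the value from the same month one year earlier.

```python
# With K=12 on monthly data, the naive seasonal forecast repeats the
# last 12 observations cyclically.
K = 12
monthly = list(range(100, 124))      # two years of toy monthly values
last_season = monthly[-K:]           # the final year: 112..123
forecast = [last_season[i % K] for i in range(12)]
print(forecast)  # identical to the final observed year
```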
> So then I don't even mind that `NaiveSeasonal.predict()` works the way it does. I would just want an actual baseline model where `NewNaiveSeasonal(K=1).predict()` directly results in this actual prediction `historical_naive = naive_model.historical_forecasts(series, start=pd.Timestamp("19580101"), forecast_horizon=1, verbose=True)` so one does not have to do the backtesting step.
That would assume a different kind of model, one that consumes future inputs (observations) iteratively in order to simulate what forecasts would have been obtained historically. Which is exactly what `historical_forecasts()` is doing :)
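The iterative procedure described above can be sketched in a few lines (a simplified illustration, not the darts implementation), for a naive model with `forecast_horizon=1`: at each time step, "retrain" on all data up to that point, forecast one step ahead, and move on.

```python
# Simplified sketch of historical forecasts for a NaiveSeasonal-style
# model: retrain on an expanding window, forecast one step ahead.
def historical_forecasts_naive(values, start, K=1):
    forecasts = []
    for t in range(start, len(values)):
        train = values[:t]           # "retrain" on everything before t
        forecasts.append(train[-K])  # forecast for t: the value K steps back
    return forecasts

print(historical_forecasts_naive([10, 12, 11, 15, 14], start=2))  # [12, 11, 15]
```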
> Using the NaiveSeasonal historical forecasts with n=601 took 4 seconds which is too long in my opinion.
There are small overheads in backtesting caused by the repeated creation of new TimeSeries at each time step, but 4 seconds seems high... I tried this (which requires 600 successive forecasts, each with horizon `n=600`):
```python
%%time
backtest = NaiveSeasonal(K=1).historical_forecasts(
    series=TimeSeries.from_values(np.random.random(1800)),
    start=0.5,
    forecast_horizon=600,
)
```
and got
```
CPU times: user 400 ms, sys: 6.27 ms, total: 406 ms
Wall time: 404 ms
```