SeasonalNaive forecasts are not as expected; expected lag 12 but forecast is rounded and slightly off
What happened + What you expected to happen
I have a pandas DataFrame with monthly time series data. I am using the SeasonalNaive model because the data has strong annual seasonality (season_length = 12), so year-over-year (YoY) would be a good benchmark/baseline. Instead of coding something myself just to get the lag-12 values (shift(12) in pandas), I thought I would use SeasonalNaive.
I noticed that the forecast from the SeasonalNaive model is not as I expected. I expected $\hat{y}_{t} = y_{t-12}$. That is, I expected the forecast to be the exact value from 12 months ago. Instead, the forecast is rounded and has a different value.
For instance, in the example below the forecast for 2024-01-01 is 6779547100 but I expected 6779547060.561772. It's close, with a residual ($e = y - \hat{y}$) of -75.438..., but not what I expected.
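For comparison, the exact lag-12 baseline described above can be sketched directly with pandas and NumPy. This is only a minimal sketch under stated assumptions, not StatsForecast's implementation: the helper name `seasonal_naive_exact` is hypothetical, and the series is random data generated the same way as in the reproduction script.

```python
import numpy as np
import pandas as pd


def seasonal_naive_exact(y: pd.Series, h: int, season_length: int = 12) -> np.ndarray:
    """Hypothetical helper: repeat the last full season of y for h steps in float64."""
    last_season = y.to_numpy(dtype="float64")[-season_length:]
    # Forecast step k (0-based) reuses position k mod season_length of the last season
    return last_season[np.arange(h) % season_length]


np.random.seed(123)
y = pd.Series(np.random.uniform(1e9, 9e9, 36))  # 36 monthly training values

forecast = seasonal_naive_exact(y, h=12)
# Every forecast equals the observation 12 steps earlier, bit for bit
print(np.array_equal(forecast, y.to_numpy()[-12:]))
```

Because nothing is cast to a narrower dtype, the residuals against the lag-12 values are exactly zero here.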
Perhaps someone could clarify:
- Whether my expectations/assumptions about SeasonalNaive are wrong; that is, how come the forecast isn't just the exact value from 12 months ago?
- How come the forecast is rounded?
PS. Thanks all for the great work on Nixtla.
Versions / Dependencies
- statsforecast==1.7.3
- pandas==2.2.1
- numpy==1.23.5
Reproduction script
# import
import pandas as pd
import numpy as np
import os
from datetime import date
from dateutil.relativedelta import relativedelta
from statsforecast.core import StatsForecast
from statsforecast.models import SeasonalNaive
# settings
os.environ['NIXTLA_ID_AS_COL'] = '1'
# params
h = 12
periods = h*4
# reproducibility
np.random.seed(123)
# create data
dataDict = {
    "unique_id": "reprex",
    "ds": pd.date_range(start="2021-01-01", periods=periods, freq="MS"),
    "y": np.random.uniform(1e9, 9e9, periods),
}
df_data = pd.DataFrame(dataDict)
df_train = df_data.loc[df_data['ds'] < "2024-01-01", :]
df_test = df_data.loc[df_data['ds'] >= "2024-01-01", :]
df_train.reset_index(inplace=True, drop=True)
df_test.reset_index(inplace=True, drop=True)
# define model
SNaive = SeasonalNaive(
    season_length = 12,
    alias = "baseline_yoy"
)
# Instantiate StatsForecast class
fcst = StatsForecast(
    models = [SNaive],
    freq = 'MS',
    n_jobs = 8,
    verbose = False,
    sort_df = True
)
# forecast
df_forecast = fcst.forecast(
    df = df_train,
    h = h,
    fitted = False,
    sort_df = True,
)
# compare forecast to actual
tmpDateMask = np.isin([c.date() for c in df_train['ds']], [(c - relativedelta(months=12)).date() for c in df_test['ds']])
df_forecast['y'] = np.array(df_train.loc[tmpDateMask,"y"])
df_forecast['residual'] = df_forecast['y'] - df_forecast['baseline_yoy']
# house cleaning
del tmpDateMask
# show
df_forecast
Issue Severity
Low: It annoys or frustrates me.
Hey. This is most likely because we cast the values to float32. I'll check if we can keep the type instead.
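The float32 explanation can be checked with NumPy alone. The snippet below is a minimal sketch that only casts the expected value from the report to float32; it does not call StatsForecast, so it illustrates the rounding effect without confirming where in the library the cast happens.

```python
import numpy as np

expected = 6779547060.561772        # float64 value from the report
as_float32 = np.float32(expected)   # nearest value representable in float32

# float32 has a 24-bit significand, so between 2**32 and 2**33
# adjacent representable values are 512 apart
print(float(np.spacing(as_float32)))   # 512.0
print(float(as_float32))               # 6779547136.0

residual = expected - float(as_float32)
print(residual)                        # about -75.438
```

The residual of about -75.438 matches the one in the report, which is consistent with the forecast having passed through float32 at some point.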
Hi, @jmoralez . Thanks for the quick reply! Please let me know if I can help w/ anything.
@jmoralez thanks for all the hard work!