SeasonalNaive forecasts are not as expected; expected lag 12 but forecast is rounded and slightly off

Open SaintRod opened this issue 1 year ago • 2 comments

What happened + What you expected to happen

I have a pandas DataFrame with monthly time series data. I am using the SeasonalNaive model because the data has strong annual seasonality (season_length = 12) and YoY would be a good benchmark/baseline. Instead of coding something myself to merely get the lag-12 values (or shift(12) in pandas), I thought I would use SeasonalNaive.

I noticed that the forecast from the SeasonalNaive model is not what I expected. I expected $\hat{y}_{t+1} = y_{t+1-12}$. That is, I expected the forecast to be the exact value from 12 months earlier. Instead, the forecast is rounded and has a different value.
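The expectation above can be sketched in plain Python as a lookup into the last observed season. This is only an illustration: the helper name `seasonal_naive` and the toy series are hypothetical, and this is not statsforecast's implementation.

```python
def seasonal_naive(history, h, season_length=12):
    """Forecast h steps ahead by repeating the last full season of `history`.

    Hypothetical helper for illustration; values are returned unchanged,
    so each forecast is exactly the observation one season earlier.
    """
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    # Step i (0-based) reuses the observation at the same position in the last season.
    return [last_season[i % season_length] for i in range(h)]


# Toy monthly series: 36 observations with values 1.0 .. 36.0.
history = [float(v) for v in range(1, 37)]
forecast = seasonal_naive(history, h=12)
print(forecast)
```

With a horizon of 12 and a season length of 12, the forecast is simply the last 12 observed values, untouched.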

For instance, in the example below the forecast for 2024-01-01 is 6779547100, but I expected 6779547060.561772. It's close, with a residual ($e = y - \hat{y}$) of -75.438..., but not what I expected.

Perhaps someone could clarify:

  • Whether my expectations/assumptions about SeasonalNaive are wrong; that is, how come the forecast isn't just the exact value from 12 months ago?
  • How come the forecast is rounded?

PS. Thanks all for the great work on Nixtla.

Versions / Dependencies

  • statsforecast==1.7.3
  • pandas==2.2.1
  • numpy==1.23.5

Reproduction script

# import
import pandas as pd
import numpy as np
import os
from datetime import date
from dateutil.relativedelta import relativedelta
from statsforecast.core import StatsForecast
from statsforecast.models import SeasonalNaive

# settings
os.environ['NIXTLA_ID_AS_COL'] = '1'

# params
h = 12
periods = h*4

# reproducibility
np.random.seed(123)

# create data
dataDict = {
    "unique_id": "reprex",
    "ds": pd.date_range(start="2021-01-01", periods=periods, freq="MS"),
    "y": np.random.uniform(1e9,9e9,periods),
}

df_data = pd.DataFrame(dataDict)
df_train = df_data.loc[df_data['ds'] < "2024-01-01", :]
df_test = df_data.loc[df_data['ds'] >= "2024-01-01", :]

df_train.reset_index(inplace=True, drop=True)
df_test.reset_index(inplace=True, drop=True)

# define model
SNaive = SeasonalNaive(
    season_length = 12,
    alias = "baseline_yoy"
)

# Instantiate StatsForecast class
fcst = StatsForecast(
    models = [SNaive],
    freq = 'MS',
    n_jobs = 8, 
    verbose = False,
    sort_df = True
)

# forecast
df_forecast = fcst.forecast(
        df = df_train,
        h = h,
        fitted = False,
        sort_df = True,
)

# compare the forecast to the actual value from 12 months earlier
tmpDateMask = np.isin([c.date() for c in df_train['ds']], [(c - relativedelta(months=12)).date() for c in df_test['ds']])
df_forecast['y'] = np.array(df_train.loc[tmpDateMask,"y"])
df_forecast['residual'] = df_forecast['y'] - df_forecast['baseline_yoy']

# house cleaning
del tmpDateMask 

# show
df_forecast

Issue Severity

Low: It annoys or frustrates me.

SaintRod avatar Mar 29 '24 19:03 SaintRod

Hey. This is most likely because we cast the values to float32. I'll check whether we can keep the input type instead.
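The float32 explanation can be checked against the numbers in the report. The following is a minimal sketch, assuming the expected value 6779547060.561772 quoted in the issue and NumPy's round-to-nearest cast; it does not call statsforecast, so it only shows that a float32 cast is consistent with the reported residual.

```python
import numpy as np

# float64 value the reporter expected (taken from the issue text).
expected = 6779547060.561772

# Cast to float32, as the maintainer suggests happens internally.
as_float32 = np.float32(expected)

# Distance between adjacent float32 values at this magnitude.
gap = float(np.spacing(as_float32))

# Residual in the issue's convention: e = y - y_hat.
residual = expected - float(as_float32)

print(f"float32 value: {float(as_float32):.1f}")
print(f"float32 spacing here: {gap}")
print(f"residual: {residual:.6f}")
```

At this magnitude adjacent float32 values are 512 apart, so the cast lands on 6779547136 and the difference is about -75.44, which matches the residual in the report. The displayed 6779547100 is then presumably just that float32 value printed with fewer significant digits.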

jmoralez avatar Mar 29 '24 20:03 jmoralez

Hi, @jmoralez. Thanks for the quick reply! Please let me know if I can help with anything.

SaintRod avatar Apr 02 '24 19:04 SaintRod

@jmoralez thanks for all the hard work!

SaintRod avatar Sep 13 '24 00:09 SaintRod