Error message related to step_size in conformal prediction methods
What happened + What you expected to happen
- The bug
When we use a conformal prediction method, it returns an error message if we set
n_windows too high. For example, for horizon = 52 and n_windows = 3, we have
ValueError: Minimum required samples in each serie for the prediction intervals settings are: 157, shortest serie has: 156. Please reduce the number of windows, horizon or remove those series
- Expected behavior
For a step_size equal to 1, we would expect this to work, and not get an error message. In fact, the minimum time series length should be 55, not 157.
- Useful information
In the method _conformity_scores, we have:
min_size = ufp.counts_by_id(df, id_col)["counts"].min()
min_samples = self.h * self.prediction_intervals.n_windows + 1
if min_size < min_samples:
    raise ValueError(
        "Minimum required samples in each serie for the prediction intervals "
        f"settings are: {min_samples}, shortest serie has: {min_size}. "
        "Please reduce the number of windows, horizon or remove those series."
    )
self._add_level = True
cv_results = self.cross_validation(
    df=df,
    static_df=static_df,
    n_windows=self.prediction_intervals.n_windows,
    id_col=id_col,
    time_col=time_col,
    target_col=target_col,
)
Thus, the code raises an error if a time series is shorter than horizon * n_windows + 1. This check implicitly assumes that the step_size is equal to the horizon.
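As a quick sanity check, the arithmetic of the current condition can be reproduced in isolation. The helper name current_min_samples is hypothetical; it only mirrors the `self.h * self.prediction_intervals.n_windows + 1` expression quoted above.

```python
def current_min_samples(h: int, n_windows: int) -> int:
    """Mirror of the quoted check: h * n_windows + 1 (hypothetical helper)."""
    return h * n_windows + 1


# Settings from the two error messages in this issue.
print(current_min_samples(h=52, n_windows=3))  # 157
print(current_min_samples(h=12, n_windows=8))  # 97
```

Both values match the "Minimum required samples" numbers reported in the error messages.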
However, it then uses the cross_validation function, defined as follows:
def cross_validation(
    self,
    df: Optional[DataFrame] = None,
    static_df: Optional[DataFrame] = None,
    n_windows: int = 1,
    step_size: int = 1,
    val_size: Optional[int] = 0,
    test_size: Optional[int] = None,
    use_init_models: bool = False,
    verbose: bool = False,
    refit: Union[bool, int] = False,
    id_col: str = "unique_id",
    time_col: str = "ds",
    target_col: str = "y",
    prediction_intervals: Optional[PredictionIntervals] = None,
    level: Optional[List[Union[int, float]]] = None,
    quantiles: Optional[List[float]] = None,
    **data_kwargs,
)
By default, the step_size is equal to 1, and we have no control over this parameter when we instantiate the PredictionIntervals class. So a restriction based on horizon * n_windows should not apply when using conformal prediction methods.
Maybe we should have this restriction:
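To make the mismatch concrete, here is a rough sketch of how the validation windows would be laid out for a given step_size. The helper rolling_windows is hypothetical, and the relation test_size = h + step_size * (n_windows - 1) is my assumption about how the held-out tail is sized, not a copy of the library code.

```python
from typing import List, Tuple


def rolling_windows(
    series_len: int, h: int, n_windows: int, step_size: int
) -> List[Tuple[int, int]]:
    """Return (train_end, test_end) index pairs for each validation window.

    Hypothetical helper. Assumes the held-out tail spans
    h + step_size * (n_windows - 1) samples, which is an assumption
    about the windowing scheme, not the library's actual code.
    """
    test_size = h + step_size * (n_windows - 1)
    first_cutoff = series_len - test_size
    return [
        (first_cutoff + i * step_size, first_cutoff + i * step_size + h)
        for i in range(n_windows)
    ]


# Shortest series from the first example: 156 samples, h=52, n_windows=3.
print(rolling_windows(156, 52, 3, step_size=1))
# [(102, 154), (103, 155), (104, 156)]
print(rolling_windows(156, 52, 3, step_size=52))
# [(0, 52), (52, 104), (104, 156)]
```

Under these assumptions, step_size = 1 leaves 102 training samples before the first window, while step_size = 52 leaves none, which is the situation the horizon * n_windows + 1 check seems designed to catch.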
min_samples = self.h + step_size + 1
Instead of:
min_samples = self.h * self.prediction_intervals.n_windows + 1
Versions / Dependencies
neuralforecast == 3.0.0
Reproduction script
import numpy as np
import pandas as pd
from neuralforecast import NeuralForecast
from neuralforecast.models import NHITS
from neuralforecast.utils import PredictionIntervals
from neuralforecast.losses.pytorch import DistributionLoss, MAE
from datasetsforecast.m4 import M4
df, X_df, S_df = M4().load(directory="data", group="Weekly", cache=True)
df["ds"] = pd.to_numeric(df["ds"])
horizon = 12
input_size = 24
prediction_intervals = PredictionIntervals(method="conformal_error", n_windows=8)
models = [
    NHITS(
        h=horizon,
        input_size=input_size,
        max_steps=100,
        loss=MAE(),
        scaler_type="robust",
    )
]
nf = NeuralForecast(models=models, freq=1)
nf.fit(df, prediction_intervals=prediction_intervals)
expected_df = nf.make_future_dataframe(df)
preds = nf.predict(futr_df=expected_df, level=[90])
Returns:
ValueError: Minimum required samples in each serie for the prediction intervals settings are: 97, shortest serie has: 93. Please reduce the number of windows, horizon or remove those series.
However, the step_size is set to 1 in the cross_validation method, so we need fewer than 97 samples in each series.
Moreover, I think it would be particularly useful to be able to choose the step_size in conformal prediction methods!
Issue Severity
Low: It annoys or frustrates me.
Thanks for the clear overview of the issue!
The suggested workaround makes sense, although I'm wondering whether it should be:
min_samples = self.h + step_size * self.prediction_intervals.n_windows
assuming we properly safeguard step_size (at least 1) and n_windows (at least 2).
It makes sense to add step_size to the PredictionIntervals class too.
Thanks for the answer. Yes, you are right: if we could choose the step_size in the PredictionIntervals class, I think the condition would indeed become min_samples = self.h + step_size * self.prediction_intervals.n_windows.
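For reference, a minimal sketch of what the agreed condition could look like with the safeguards mentioned above. The function name proposed_min_samples and its error messages are hypothetical; this is not the library's implementation, only the formula from this thread wrapped in a standalone helper.

```python
def proposed_min_samples(h: int, n_windows: int, step_size: int = 1) -> int:
    """Minimum series length under the formula discussed in this thread.

    Hypothetical helper: h + step_size * n_windows, with the safeguards
    step_size >= 1 and n_windows >= 2 suggested in the discussion.
    """
    if step_size < 1:
        raise ValueError("step_size must be at least 1")
    if n_windows < 2:
        raise ValueError("n_windows must be at least 2")
    return h + step_size * n_windows


# First example: shortest series has 156 samples.
print(proposed_min_samples(h=52, n_windows=3))  # 55
# Reproduction script: shortest series has 93 samples.
print(proposed_min_samples(h=12, n_windows=8))  # 20
```

With step_size = 1, both examples from this issue would pass the check, since 156 >= 55 and 93 >= 20.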