sktime
[ENH] Forecast simulations
Proposing to add a new method in the base forecaster class called `simulate`.
This method will return a multiindex mtype containing multiple forecast simulations, similar to the return type of the bootstrapping transformers.
This is useful for multiple applications.
- Getting multi-step prediction intervals in cases where only the 1-step forecast variance is available, e.g. https://otexts.com/fpp3/ets-forecasting.html
- Intervals of temporal aggregations of forecasts, e.g. https://otexts.com/fpp3/aggregates.html
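To illustrate the second use case, here is a minimal sketch of how intervals for a temporal aggregate fall out of simulated sample paths. The `paths` array is a hypothetical stand-in for the output of the proposed `simulate` method, not real sktime API:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-in for forecaster.simulate(): 1000 simulated sample
# paths of a 12-step-ahead forecast, shape (n_simulations, horizon).
paths = 100 + rng.normal(0, 5, size=(1000, 12)).cumsum(axis=1)

# Interval for a temporal aggregate: sum each path over the year,
# then take empirical quantiles across simulations.
annual_totals = paths.sum(axis=1)
lower, upper = np.quantile(annual_totals, [0.025, 0.975])
print(f"95% interval for the annual total: [{lower:.1f}, {upper:.1f}]")
```

Because each path preserves the dependence between time points, the aggregate interval comes out correctly, which is not possible from marginal per-step intervals alone.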
A lot of packages that we have wrappers for already have this functionality:
For the packages above, we can interface this method directly.
For other models, we can use our native `predict_proba` and sample from the 1-step-ahead distribution to create multiple sample paths.
Example from a statsmodels prototype:

```python
import numpy as np

from sktime.datasets import load_airline
from sktime.forecasting.ets import AutoETS
from sktime.utils.plotting import plot_series

y = load_airline()
forecaster = AutoETS()
forecaster.fit(y)

# simulate sample paths for forecast horizon 1..11 (proposed method)
y_hat = forecaster.simulate(np.arange(1, 12))
print(y_hat)
```
Hm, question: in other words, this is sampling from the joint predictive distribution, right? Should it then not be called `predict_proba_sample` or similar?
Other question: what should happen if you run this for a hierarchical input? Do we simply add one more level on the left?
One more question: is there a sensible default? E.g., sampling from `predict_var`, i.e. assuming normal sampling?
`predict_proba_sample` sounds like we're sampling from the distribution of each time step as if they were independent.
It's usually recursive sampling from the 1-step-ahead forecast distribution (unless we have a multivariate distribution to sample from, which we practically never have for forecasting models). The reason is that the time points are obviously not independent.
I would prefer `simulate` or something equivalent.
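To make the recursive scheme concrete, here is a minimal sketch using a toy AR(1) model as a stand-in for a forecaster's 1-step-ahead conditional distribution; all names and parameter values are illustrative assumptions, not sktime internals:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy AR(1): y_t = phi * y_{t-1} + eps_t, eps_t ~ N(0, sigma^2).
phi, sigma = 0.8, 1.0
y_last = 10.0          # last observed value
horizon, n_sims = 12, 500

paths = np.empty((n_sims, horizon))
for i in range(n_sims):
    y = y_last
    for h in range(horizon):
        # sample from the 1-step-ahead conditional distribution,
        # then condition the next step on the sampled value
        y = phi * y + rng.normal(0, sigma)
        paths[i, h] = y

# Multi-step intervals are empirical quantiles across paths; the
# dependence between time points is preserved automatically.
lower = np.quantile(paths, 0.025, axis=0)
upper = np.quantile(paths, 0.975, axis=0)
```

Note how the interval widens with the horizon: the sampled value at each step feeds back into the next step, compounding uncertainty in exactly the way independent per-step sampling would miss.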
Simulate is the common term among forecasting packages:
Makes sense, if it is a common name for this.
Perhaps it's worth enforcing `pandas.DataFrame`-based containers in the output, since we probably want to add a hierarchy level for the different simulation runs?
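A hypothetical sketch of what such a container could look like: a `DataFrame` whose row MultiIndex carries a simulation-run level above the time index, mirroring how the bootstrapping transformers label replicates. Level names and values here are assumptions for illustration:

```python
import numpy as np
import pandas as pd

horizon = pd.period_range("1961-01", periods=3, freq="M")
n_sims = 2
rng = np.random.default_rng(1)

# outer level: simulation run id; inner level: forecast period
index = pd.MultiIndex.from_product(
    [range(n_sims), horizon], names=["simulation", "Period"]
)
y_hat = pd.DataFrame(
    rng.normal(430, 20, size=len(index)), index=index, columns=["passengers"]
)

# each simulated path is recovered with a cross-section on the outer level
path_0 = y_hat.xs(0, level="simulation")
```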
Also, how are we going to handle the various parameters that are specific to the simulation interfaces?
> Also, how are we going to handle the various parameters that are specific to the simulation interfaces?

We'll treat it in the same way as `predict` and `predict_quantiles`! For example, I think `n_simulations` should be passed in the method and not as a class attribute in `__init__`.
Other parameters that are specific to an estimator, and would just add noise to the signature of the public method for everyone else, should go in the constructor. I also think the inevitable parameter `n_jobs` should be passed as a class attribute.
> One more question: is there a sensible default? E.g., sampling from `predict_var`, i.e. assuming normal sampling?

The sensible default is to sample recursively from the `predict_proba` distribution for the 1-step horizon.
There are some details around what you keep fixed and when to refit. In particular, I'm not sure how automatic model-selection forecasters deal with this: e.g., does auto.arima keep p, d, q fixed and refit the parameters, does it go through model selection again to find new optimal p, d, q after the recursive update, or does it not refit the parameters at all? Time to decrypt R code I guess 😨:
> Perhaps it's worth enforcing `pandas.DataFrame`-based containers in the output, since we probably want to add a hierarchy level for the different simulation runs?
Yes, I can't think of a better way.
Makes sense. May I kindly ask you to write a quick STEP?
This is a core interface change, so let's just hammer down the details.
I agree! Will make a STEP PR soon.