statsforecast
Should `forecast_fitted_values` also work for fitted models in addition to when forecast(fitted=True) is called?
Description
Please correct my understanding if it's incorrect. Relatively new to the library!
- The point of `sf.forecast` is to optimize memory burden & be parallel / optimization friendly.
- Whereas `sf.fit` + `sf.predict` lets us examine the fitted models closely.
- If the user wants to examine in-sample fit, there's a convenience method, `sf.forecast_fitted_values()`.
- However, this only works if `sf.forecast(..., fitted=True)` is called. It doesn't work on models fit using `sf.fit`.
- So, if the user would like to examine in-sample fit of models created using `sf.fit`, there are currently two choices, both suboptimal:
  - (i) Fit the models again (related to https://github.com/Nixtla/statsforecast/issues/639)
  - (ii) Iterate across all models and call `predict_in_sample`, e.g., `sf.fitted_[0, 0].predict_in_sample()`. This requires a deeper understanding of the architecture plus the additional step of converting to a dataframe.
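To make option (ii) concrete, here is a minimal sketch of the iteration step. It assumes that each fitted model's `predict_in_sample()` returns a mapping with a `"fitted"` entry, and that the fitted models sit in a series-by-model grid like `sf.fitted_`. The helper `collect_in_sample` and the `StubModel` class are hypothetical, used only so the example runs without the library.

```python
from typing import Any, Dict, List, Sequence


def collect_in_sample(
    fitted_grid: Sequence[Sequence[Any]],
    series_ids: Sequence[str],
    model_names: Sequence[str],
) -> List[Dict[str, Any]]:
    """Flatten per-model in-sample fits into long-format rows (hypothetical helper)."""
    rows: List[Dict[str, Any]] = []
    for i, uid in enumerate(series_ids):
        for j, name in enumerate(model_names):
            # Assumption: predict_in_sample() returns a mapping with a "fitted" entry.
            fitted = fitted_grid[i][j].predict_in_sample()["fitted"]
            for step, value in enumerate(fitted):
                rows.append(
                    {"unique_id": uid, "model": name, "step": step, "fitted": float(value)}
                )
    return rows


class StubModel:
    """Stand-in for a fitted model; not part of statsforecast."""

    def __init__(self, fitted: Sequence[float]) -> None:
        self._fitted = list(fitted)

    def predict_in_sample(self) -> Dict[str, List[float]]:
        return {"fitted": self._fitted}


# Two series, one model each, with different history lengths.
grid = [[StubModel([1.0, 2.0])], [StubModel([3.0, 4.0, 5.0])]]
rows = collect_in_sample(grid, ["a", "b"], ["Stub"])
```

With a real `StatsForecast` object, the grid would presumably be `sf.fitted_`, and the rows could then be handed to a dataframe constructor. That conversion is exactly the extra step the issue describes.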
My proposal would involve one or both of the following:
- Surface `sf.forecast_fitted_values()` to any StatsForecast object where `.fit()` has been called, in addition to ones where `sf.forecast(..., fitted=True)` was called. Unless there's something in the code I missed, implementation would simply be (ii) above.
- Add a parameter to the `.fit()` method that does the same thing as `fitted=True`, i.e., stores in-sample predictions to a `"fcst_fitted_values_"` object.
I am happy to work on this if there's interest. Let me know your thoughts.
Use case
The primary reason one would use `.fit()` would be to examine the models more closely, including looking at in-sample fit. I think the use case in this issue well-encapsulates the utility of this function.
https://github.com/Nixtla/statsforecast/issues/639#issuecomment-1728345082
Hey @quant5, thanks for the proposal. I've been meaning to do this. I think the first step would be to add a `fitted` argument to the models' `fit` method. We currently set it internally to handle the case where the user calls `predict_in_sample` afterwards, except for models that are too expensive, so we end up with a mix:
- https://github.com/Nixtla/statsforecast/blob/e46ece9eaafe80d20a70af58d5fc96edc7737010/statsforecast/models.py#L4143
- https://github.com/Nixtla/statsforecast/blob/e46ece9eaafe80d20a70af58d5fc96edc7737010/statsforecast/models.py#L4350
- https://github.com/Nixtla/statsforecast/blob/e46ece9eaafe80d20a70af58d5fc96edc7737010/statsforecast/models.py#L4525
- https://github.com/Nixtla/statsforecast/blob/e46ece9eaafe80d20a70af58d5fc96edc7737010/statsforecast/models.py#L4722
Once the models have that argument we could pass it through from `StatsForecast.fit`, and then `forecast_fitted_values` would retrieve them.
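The pass-through described here could look roughly like the toy sketch below. None of these classes are statsforecast code: `NaiveSketch` and `ForecasterSketch` are hypothetical stand-ins, and the only idea taken from the thread is that a `fitted` flag travels from the top-level `fit` down to each model, so that `forecast_fitted_values` can later retrieve what was stored.

```python
from typing import Dict, List, Optional


class NaiveSketch:
    """Toy model (hypothetical): the in-sample fit at t is the observation at t-1."""

    def fit(self, y: List[float], fitted: bool = False) -> "NaiveSketch":
        self.last_ = y[-1]
        # Store in-sample values only when requested, so expensive models can skip it.
        self.fitted_values_: Optional[List[float]] = (
            [float("nan")] + list(y[:-1]) if fitted else None
        )
        return self


class ForecasterSketch:
    """Toy wrapper (hypothetical) that forwards `fitted` to every model."""

    def fit(self, series: Dict[str, List[float]], fitted: bool = False) -> "ForecasterSketch":
        self.fitted_requested_ = fitted
        self.models_ = {
            uid: NaiveSketch().fit(y, fitted=fitted) for uid, y in series.items()
        }
        return self

    def forecast_fitted_values(self) -> Dict[str, List[float]]:
        # Fail loudly instead of returning empty results when nothing was stored.
        if not self.fitted_requested_:
            raise RuntimeError("call fit(..., fitted=True) to store in-sample values")
        return {uid: m.fitted_values_ for uid, m in self.models_.items()}


fc = ForecasterSketch().fit({"a": [1.0, 2.0, 3.0]}, fitted=True)
values = fc.forecast_fitted_values()
```

Defaulting `fitted` to `False` in this sketch keeps the cheap path unchanged for callers who never inspect the in-sample fit. Whether the real library should default the other way is a design question for the maintainers.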