
Should `forecast_fitted_values` also work for fitted models in addition to when forecast(fitted=True) is called?

Open quant5 opened this issue 9 months ago • 1 comments

Description

Please correct my understanding if it's incorrect. Relatively new to the library!

  • The point of sf.forecast is to reduce memory usage and to be friendly to parallelization and other optimizations.
  • Whereas, sf.fit + sf.predict lets us examine the fitted models closely.
  • If the user wants to examine in-sample fit, there's a convenience method sf.forecast_fitted_values()
  • However, this only works if sf.forecast(..., fitted=True) is called. It doesn't work on models fit using sf.fit.
  • So, if the user would like to examine in-sample fit of models created using sf.fit, there are currently two choices, both suboptimal:
    1. Fit the models again (related to https://github.com/Nixtla/statsforecast/issues/639)
    2. Iterate across all models and call predict_in_sample, e.g., sf.fitted_[0, 0].predict_in_sample() - requires deeper understanding of architecture + additional step of converting to a dataframe.

My proposal would involve one or both of the following:

  • Surface sf.forecast_fitted_values() on any StatsForecast object where .fit() has been called, in addition to ones where sf.forecast(..., fitted=True) was called. Unless there's something in the code I missed, the implementation would simply be option 2 above.
  • Add a parameter to the .fit() method that does the same thing as fitted=True, i.e., stores in-sample predictions in a "fcst_fitted_values_" attribute.
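To make the second bullet concrete, here is a toy sketch of the intended behavior. `ToyMeanForecaster` is a hypothetical stand-in that predicts each series' mean; it is not the StatsForecast class, and the `fitted` parameter on `fit` is the proposed addition, not something that exists today.

```python
class ToyMeanForecaster:
    """Hypothetical stand-in illustrating the proposed fit(..., fitted=True)."""

    def fit(self, series, fitted=False):
        # series: mapping of unique_id -> list of observed values
        self.means_ = {uid: sum(y) / len(y) for uid, y in series.items()}
        # Only keep in-sample predictions when asked, to avoid the memory cost.
        self.fcst_fitted_values_ = None
        if fitted:
            self.fcst_fitted_values_ = {
                uid: [self.means_[uid]] * len(y) for uid, y in series.items()
            }
        return self

    def forecast_fitted_values(self):
        if self.fcst_fitted_values_ is None:
            raise RuntimeError("call fit(..., fitted=True) before requesting fitted values")
        return self.fcst_fitted_values_


if __name__ == "__main__":
    model = ToyMeanForecaster().fit({"series_a": [1.0, 2.0, 3.0]}, fitted=True)
    print(model.forecast_fitted_values())
```

Keeping the flag off by default would preserve the current memory profile of .fit() for users who never look at in-sample values.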

I am happy to work on this if there's interest. Let me know your thoughts.

Use case

The primary reason one would use .fit() would be to examine the models more closely, including looking at in-sample fit. I think the use case in this issue well-encapsulates the utility of this function. https://github.com/Nixtla/statsforecast/issues/639#issuecomment-1728345082

quant5, Apr 30 '24 20:04

Hey @quant5, thanks for the proposal. I've been meaning to do this. I think the first step would be to add a fitted argument to the models' fit method. We currently set it internally so that predict_in_sample works when the user calls it afterwards, except for models where that is too expensive, so we end up with a mix:

https://github.com/Nixtla/statsforecast/blob/e46ece9eaafe80d20a70af58d5fc96edc7737010/statsforecast/models.py#L4143
https://github.com/Nixtla/statsforecast/blob/e46ece9eaafe80d20a70af58d5fc96edc7737010/statsforecast/models.py#L4350
https://github.com/Nixtla/statsforecast/blob/e46ece9eaafe80d20a70af58d5fc96edc7737010/statsforecast/models.py#L4525
https://github.com/Nixtla/statsforecast/blob/e46ece9eaafe80d20a70af58d5fc96edc7737010/statsforecast/models.py#L4722

Once the models have that argument, we could pass it through from StatsForecast.fit, and forecast_fitted_values would then retrieve the stored values.

jmoralez, Apr 30 '24 20:04