Update export() methods to include CIs more easily + at level=True
This may already be a feature, but I think some of the export methods could be retooled.
Specifically, I would like to pull the CIs for test set predictions and forecasts at level=True.
export_forecasts_with_cis and export_test_set_preds_with_cis don't currently have a level argument to call on, according to the docs.
Additionally, it would be nice to be able to call these dfs from the .export() method; that way, I can more easily call the functions for a list of models.
I'll add this in as it's in the same vein: can we get plot_fitted() to also have a level argument? plot() and plot_test_set() already do, so this feels like a strange exclusion.
Not having level fitted values or confidence intervals has been an intentional decision, based on the difficulty of defining what level fitted values are for models run on non-level data. This is simple enough for approaches like ARIMA, which bake the integration level of the data into the underlying function, but for a model like KNN or XGBoost, simple un-differencing of fitted values can produce very strange results. I have tried it before.
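To illustrate the issue, here is a minimal sketch of the naive un-differencing approach (illustrative only; none of these names are scalecast internals):

```python
import numpy as np

# a level series and its first difference (the scale the model trains on)
y = np.array([100.0, 102.0, 105.0, 104.0, 108.0])
y_diff = np.diff(y)

# stand-in for a model's fitted values on the differenced scale
rng = np.random.default_rng(0)
fitted_diff = y_diff + rng.normal(0, 0.5, size=y_diff.shape)

# naive un-differencing: cumulatively sum the fitted differences,
# seeding with the first observed level value
fitted_level = y[0] + np.cumsum(fitted_diff)

# each fitted-value error is carried forward by the cumsum, so for
# flexible models like KNN or XGBoost the result can drift badly
```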
Since level fitted values are not available, level confidence intervals cannot be assigned either. By default, scalecast confidence intervals have been determined by bootstrapping residuals from the fitted values; no level fitted values means no level confidence intervals.

The best research into this problem suggests using probabilistic forecasting to assign confidence intervals, which is now available in version 0.13.0 of scalecast, released right before you opened this issue (if you like coincidences). Probabilistic forecasting is possible through the new proba_forecast() method, an alternative to manual_forecast() that requires more computational resources. It will also be available through auto_forecast(probabilistic=True) in 0.13.1; in the meantime, you can use proba_forecast(**f.best_params) to mirror the auto_forecast() function with probabilistic modeling. Used this way, it will assign confidence intervals to all models, even those run on differenced data, and it is generally the sounder way to assign confidence intervals for time series machine learning. However, it does not work for deterministic models, i.e., models with a closed-form underlying function (MLR, Lasso, etc.). Those models will still be without level confidence intervals if they were run on differenced data, and I don't see a good way around that for now.
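A rough sketch of that workflow, assuming a Forecaster object f that already has future dates and regressors set, and that has already been tuned (so f.best_params is populated):

```python
# f is a scalecast Forecaster with future dates and Xvars already added
f.set_estimator('xgboost')
f.tune()  # populates f.best_params

# mirrors auto_forecast() with probabilistic modeling (until
# auto_forecast(probabilistic=True) lands in 0.13.1)
f.proba_forecast(**f.best_params)
```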
Now, the part about the export functions. I will work on that for 0.13.1 as well, but again, level confidence intervals for models run on differenced data will only be available if the model was called through a probabilistic modeling function (proba_forecast()) and the model itself is not deterministic. Trying to export CIs for other kinds of models will not return good results. I know this doesn't fix all of the issues you raised, but the research into these exact kinds of problems is very sparse. A package like sktime, for example, doesn't bother with differencing data and instead uses a detrender to deal with these situations, a solution I find overly complex for scalecast but something we can consider looking into.
Let me know what you think about any/all of this. Thanks.
If I understand correctly:
- The models build the CI based on the differenced/non-level values.
- Some models cannot undifference those values to get level CIs or level fitted values (for plot_fitted).
With that,
- proba_forecast can be used to approximate a CI on non-deterministic models at level
- Cannot currently create a CI for deterministic models at level
I have two follow-up questions:
- How do we get lvl_test_set_pred and level forecasts? Somehow, models like KNN or XGBoost return values at the correct level. How does this differ from creating CIs?
- When running proba_forecast, how do we access the CI and make sure those values are returned as well?
Please let me know if this makes sense.
I've been looking into this more, and I think I've changed my mind. The next dist will have level fitted values, level in-sample metrics, and level confidence intervals for all models. I don't actually think the issue is as complicated as I was making it. I will shoot to have that out by Monday: just the Forecaster object for now, with plans to extend it to the MVForecaster object later.
To respond to question 1: The confidence intervals are generated by sampling residuals from the fitted values. I don't think simply undifferencing confidence intervals is a sound way to generate them; it's not true "bootstrapping" in that instance. That's why we need the fitted values first. I still think probabilistic modeling is the sounder way to generate them, but unfortunately, it only works for non-deterministic models.
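As a rough illustration of what bootstrapping residuals from fitted values looks like (not scalecast's exact implementation):

```python
import numpy as np

def bootstrap_ci(actuals, fitted, point_fcst, alpha=0.05, n_sims=1000):
    """Build a CI around a point forecast by resampling fitted-value residuals."""
    residuals = np.asarray(actuals) - np.asarray(fitted)
    point_fcst = np.asarray(point_fcst)
    rng = np.random.default_rng(0)
    # add resampled residuals to the point forecast to simulate future paths
    sims = point_fcst + rng.choice(residuals, size=(n_sims, point_fcst.size))
    lower = np.percentile(sims, 100 * alpha / 2, axis=0)
    upper = np.percentile(sims, 100 * (1 - alpha / 2), axis=0)
    return lower, upper
```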
To question 2: The export functions currently don't access level confidence intervals. You can see them using:

```python
f.history[model_name]['LevelUpperCI']
f.history[model_name]['LevelLowerCI']
f.history[model_name]['LevelTSUpperCI']
f.history[model_name]['LevelTSLowerCI']
```
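If you need those in a DataFrame before the export support lands, a hypothetical workaround (assuming the stored lists align with f.future_dates):

```python
import pandas as pd

# hypothetical helper: collect one model's level forecast CIs
cis = pd.DataFrame({
    'DATE': f.future_dates.values,
    'LevelUpperCI': f.history[model_name]['LevelUpperCI'],
    'LevelLowerCI': f.history[model_name]['LevelLowerCI'],
})
```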
You can also plot like this:

```python
f.plot(level=True, ci=True)
f.plot_test_set(level=True, ci=True)
```
If the model has level confidence intervals (i.e., they were generated through probabilistic forecasting), they will be displayed in the plot.
In 0.13.1, you will also be able to use:

```python
f.export(['lvl_fcsts','lvl_test_set_preds'], cis=True)
```
We will also deprecate several export functions, since their functionality will be duplicated in Forecaster.export() and MVForecaster.export(). A deprecation warning will be applied to those functions, and they will be removed in 0.14.0. There are two such functions in Forecaster and five in MVForecaster.