[BUG] `AutoETS` fails to fit some time series accurately
Describe the bug
Inspired by this issue, we ran some experiments using the `AutoETS` model. We found that for some unstable series (mainly those with outliers), the forecast performance of this model degrades by many orders of magnitude compared with other implementations (R and StatsForecast). I believe the problem is related to the `statsmodels` implementation (https://github.com/statsmodels/statsmodels/issues/8344); see, for example, the following table using the M4 dataset.
![image](https://user-images.githubusercontent.com/10517170/182445444-9d4d9490-924f-4bea-b1b5-02f2c513a316.png)
Because additional preprocessing must be done for this type of unstable series, I think it would be very helpful for users if the documentation included a warning about this behavior of the model, particularly when fitting multiple time series.
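As a rough sketch of the kind of preprocessing meant here, one could clip extreme observations before fitting. The helper below, its name, and the 5-MAD threshold are my own illustration, not part of sktime or of the benchmark:

```python
import statistics

def winsorize_outliers(y, n_mads=5.0):
    """Clip points farther than n_mads median absolute deviations (MAD)
    from the median. A crude guard for series where a single extreme
    observation can derail ETS fitting."""
    med = statistics.median(y)
    mad = statistics.median(abs(v - med) for v in y)
    if mad == 0:
        return list(y)
    lo, hi = med - n_mads * mad, med + n_mads * mad
    return [min(max(v, lo), hi) for v in y]

# Hypothetical series with an extreme first data point.
series = [9500.0, 102.0, 98.0, 105.0, 99.0, 101.0, 97.0, 103.0]
clean = winsorize_outliers(series)
# The first point is pulled toward the bulk of the data;
# the remaining observations are left unchanged.
```

A warning in the docs could then point users to this kind of guard (or to a robust transformer) before fitting unstable series.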
Our results suggest that our version is more robust than the current alternatives. If you are interested, I could work on including this version in `sktime`.
Sure, interfacing would be appreciated!
I think currently we are probably going to move towards a stance of allowing multiple implementations of the same "modelling idea".
A decision has not been taken yet; if you want to see the discussion and/or weigh in, see here: https://github.com/alan-turing-institute/sktime/pull/3155
`statsforecast` and your recent addition of ARIMA are among the more prominent instances related to this discussion, so I would be interested to hear your opinion!
Under the proposed change, any number of ARIMA or AutoETS implementations are ok to live in `sktime` (per interface, or natively), and which one of them is presented as the "default" ARIMA would depend on popularity as well as scientific study results (e.g., the above, reproducible study).
I.e., under the proposed change, `sktime` core devs would no longer gatekeep the presence of algorithms as strongly as under the current model, which follows more the `scikit-learn` model of a curated selection.
@FedericoGarza I think the data you used has an extreme outlier as its first data point. Could you show the results without that outlier?
@fkiraly: Thanks for the answer. We will start working on the implementation. Congrats on this new direction. We are honored to be able to provide more options for the community. :D
@aiwalter: You are right. This benchmark dataset shows that the current `statsmodels` ETS implementation is not robust, which undermines its purpose as a baseline model. R's and StatsForecast's implementations can fit these series without problems. We believe the root cause is that the current implementation uses an unstable quadratic optimization instead of the Nelder-Mead optimization. We argue that series with outliers are highly present in real-world scenarios and that robustness to them is a critical characteristic of baseline models. It would be helpful for Python's forecasting community to know that previous ETS alternatives struggled with these characteristics.
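To illustrate why a derivative-free search is attractive here, a toy sketch: fitting the smoothing parameter of simple exponential smoothing (the simplest ETS special case) by evaluating the error surface directly, with a plain grid search standing in for the Nelder-Mead simplex. The function names and series are hypothetical, not the statsmodels or StatsForecast code; both approaches share the key property of never taking gradients, which an outlier-distorted objective can make misleading:

```python
def ses_sse(y, alpha):
    """Sum of squared one-step-ahead errors for simple exponential
    smoothing with smoothing parameter alpha, level initialized at y[0]."""
    level = y[0]
    sse = 0.0
    for v in y[1:]:
        err = v - level
        sse += err * err
        level += alpha * err  # level update: l_t = l_{t-1} + alpha * e_t
    return sse

def fit_alpha_derivative_free(y, steps=1000):
    """Derivative-free fit: evaluate the objective on a grid over (0, 1]
    and keep the best point. Like Nelder-Mead, this only needs function
    values, so a spiky, outlier-driven error surface cannot feed it
    bad gradient information."""
    best_alpha, best_sse = 1.0 / steps, float("inf")
    for i in range(1, steps + 1):
        alpha = i / steps
        sse = ses_sse(y, alpha)
        if sse < best_sse:
            best_alpha, best_sse = alpha, sse
    return best_alpha

# Hypothetical series with one extreme outlier.
y = [100.0, 9500.0, 101.0, 99.0, 102.0, 98.0, 103.0, 97.0]
alpha_hat = fit_alpha_derivative_free(y)
```

The real implementations optimize several parameters jointly, where the simplex method replaces the grid; the point is only that value-based search degrades gracefully where curvature-based quadratic optimization can diverge.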
> It would be helpful for Python's forecasting community to know that previous ETS alternatives struggled with these characteristics.
I hope you are writing a paper where this is explained? This sounds like a high-impact finding.
Awesome decision @fkiraly 👏
FYI: a PR has been created in `statsmodels` that enables multiple optimizers (including Nelder-Mead) for ETS: https://github.com/statsmodels/statsmodels/pull/8486