aeon
aeon copied to clipboard
[MNT] Reduce the forecasting test suite execution time
Currently testing the forecasting subpackage takes up ~60% of the total testing time, which varies between 1-3 h :scream_cat: and therefore need to be cut down to a reasonable number to improve dev experience and productivity.
This issue aim to collect ideas and suggestions on where could we look for opportunities to cut down the testing time, so please share your thoughts.
Related:
- #151
I want to look into the issue, too.
For the long running tests, I have checked, 50% of the time is spent in validation of inputs rather than the downstream task. The same checks are repeated first for the full input X and y, then for conversion of X and y, then for each fold of X and y, then for the conversion of each fold, …
Test named test_predict_time_index
is run 7236 times, which is 134 estimators x 9 types VALID_INDEX_FH_COMBINATIONS
x 6 forecasting horizons TEST_FHS
. This test cumulatively takes the most time to run from all forecasting tests.
https://github.com/aeon-toolkit/aeon/blob/5ede43c1f28c6ec7a7af3ba4d8350dd7cdbdba5b/sktime/forecasting/tests/test_all_forecasters.py#L202-L206
There is however a separate test for forecasting horizons test_fh
that is run 54 time for the same 9 types x 6 forecasting horizons.
https://github.com/aeon-toolkit/aeon/blob/5ede43c1f28c6ec7a7af3ba4d8350dd7cdbdba5b/sktime/forecasting/base/tests/test_fh.py#L54-L58
It feels like those two are testing the same thing since ForecastngHorizon
is a part of the BaseForecaster
class
https://github.com/aeon-toolkit/aeon/blob/5ede43c1f28c6ec7a7af3ba4d8350dd7cdbdba5b/sktime/utils/validation/forecasting.py#L265
and arguments passed as fh
get coerced to ForecastingHorizon
for all estimators using BaseForecaster
by calling check_fh
.
It seems the original test_predict_time_index
only tests forecasters inheriting from BaseForecaster
and essentially testing the base class instead of actual implementation. Therefore I would drop the test entirely, unless I'm missing something.
I think we have accomplished this somewhat with the PR_TESTING
setup? I'm sure the individual tests could be improved a lot though.
closing this, testing now in hand