statsforecast
Slow first training with cross-validation
Describe the bug Related to #84.
We implemented the statsforecast integration in pycaret using the sktime adapter. While implementing cross-validation, we noticed that the first model training is slow for all folds in the cross-validation (see model2 here). Does the numba compilation happen in each fold during the first model build (perhaps because all folds are run in parallel)? Fixing this would be helpful, since training a single time series model with cross-validation will be a common use case.
Subsequent models train fast, as expected (model 3 in the above code example).
To Reproduce See https://nbviewer.org/gist/ngupta23/4e2e90183c7f08555df3cfebe3df9756
Expected behavior Could cross-validation on the first trained model be made faster?
Desktop (please complete the following information): System: python: 3.7.13 (default, Apr 24 2022, 01:04:09) [GCC 7.5.0] executable: /usr/bin/python3 machine: Linux-5.4.188+-x86_64-with-Ubuntu-18.04-bionic
Additional context
In the above code, if I change exp to run using 1 core only, then statsforecast runs much faster (25 seconds for 5 folds compared to 46 seconds earlier). So it seems that the first training performs numba compilation for each fold (when CV is run in parallel) and hence takes longer.
#### default auto_arima engine is pmdarima for now ----
exp.setup(data=data, fh=12, session_id=42, fold=5, n_jobs=1)
@all-contributors please add @ngupta23 for bug
@mergenthaler @FedericoGarza Do you know if there is a solution for this problem? If we can find a solution, we can advertise the statsforecast integration with pycaret (since it is already implemented).
FYI... I tried building a dummy model before doing the cross-validation with multiple cores, thinking that the multiple cores could then reuse the numba code compiled by the dummy model, but it does not seem to help.
Any inputs here would be appreciated. Thanks!