
Slow first training with cross-validation

[Open] ngupta23 opened this issue 2 years ago • 3 comments

Describe the bug

Related to #84

We implemented the statsforecast integration in pycaret using the sktime adapter. While implementing cross-validation, we noticed that the first model training is slow for all folds in the cross-validation (see model2 here). Does the numba compilation happen in each fold during the first model build (perhaps because all folds are run in parallel)? Fixing this would be helpful, since training a single time series model with cross-validation will be a common use case.

[screenshot: cross-validation timings for the first model]

Subsequent models seem to train fast, as expected (model3 in the code example above).

To Reproduce

See https://nbviewer.org/gist/ngupta23/4e2e90183c7f08555df3cfebe3df9756

Expected behavior

Could cross-validation for the first trained model be made faster?

Desktop (please complete the following information):

- python: 3.7.13 (default, Apr 24 2022, 01:04:09) [GCC 7.5.0]
- executable: /usr/bin/python3
- machine: Linux-5.4.188+-x86_64-with-Ubuntu-18.04-bionic

Additional context

In the above code, if I change exp to run using only 1 core, then statsforecast runs much faster (25 seconds for 5 folds, compared to 46 seconds earlier). So it seems that the first training does numba compilation for each fold (when cross-validation is run in parallel) and hence takes longer.

```python
#### default auto_arima engine is pmdarima for now ----
exp.setup(data=data, fh=12, session_id=42, fold=5, n_jobs=1)
```

[screenshot: cross-validation timings with n_jobs=1]

ngupta23 avatar Aug 02 '22 22:08 ngupta23

@all-contributors please add @ngupta23 for bug

mergenthaler avatar Aug 03 '22 01:08 mergenthaler

@mergenthaler

I've put up a pull request to add @ngupta23! :tada:

allcontributors[bot] avatar Aug 03 '22 01:08 allcontributors[bot]

@mergenthaler @FedericoGarza Do you know if there is a solution to this problem? If we can find one, we can advertise the statsforecast integration with pycaret (since it is already implemented).

FYI, I tried building a dummy model before running the cross-validation with multiple cores, thinking that the worker processes could then reuse the numba code compiled by the dummy model, but it does not seem to help.
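A plausible explanation for why the dummy-model warm-up fails (my assumption, not confirmed by the maintainers): with process-based parallelism, each worker is a separate process, and numba's compiled code lives in per-process memory, so nothing compiled in the parent is visible to the workers unless it is persisted to disk (e.g. via numba's `cache=True`). A stdlib-only sketch of that process isolation, using a fresh interpreter to stand in for a spawned worker:

```python
# Stdlib-only sketch of process isolation: objects "warmed up" in the
# parent's memory simply do not exist in a fresh worker process. Process
# pools behave the same way, which is why a numba function compiled in the
# parent is recompiled in every worker unless the result is cached on disk.
import subprocess
import sys

jit_cache = {"model": "<compiled code>"}  # the parent has "compiled" something

# Start a fresh interpreter, standing in for a spawned worker process.
child = subprocess.run(
    [sys.executable, "-c", "print('jit_cache' in globals())"],
    capture_output=True,
    text=True,
    check=True,
)

print("parent sees jit_cache:", "jit_cache" in globals())  # True
print("worker sees jit_cache:", child.stdout.strip())      # False
```

If this is the cause, warming up in the parent can never help; the compilation result has to reach the workers through the filesystem, not through memory.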

Any inputs here would be appreciated. Thanks!

ngupta23 avatar Nov 26 '22 19:11 ngupta23