Statsforecast tries to grab 360gb memory; crashes notebook

Open scottee opened this issue 1 year ago • 15 comments

I'm using the latest statsforecast and trying to run a forecast with 2 models and a dataset of 90K rows. The library tries to grab 360 GB of memory and crashes my notebook. This happened several times in a row while I tried various options with fewer models and fewer rows. Why is it trying to grab so much memory, and how can I avoid that?

scottee avatar Jun 19 '23 15:06 scottee

One other config point: I had n_jobs=-1. I reduced it to n_jobs=1 and so far it hasn't grabbed such crazy amounts of memory.

scottee avatar Jun 19 '23 16:06 scottee

Could you provide an MRE (minimal reproducible example)?

akmalsoliev avatar Jun 22 '23 21:06 akmalsoliev

I can't give you the example that caused the problem because of the data. I haven't been able to reproduce it on toy examples. It always happens on the example with 90K rows and n_jobs=-1. Do you know of a large public dataset we both can access?

scottee avatar Jun 23 '23 20:06 scottee

Do you experience the same issue when generating a dummy dataset with from statsforecast.utils import generate_series?

  • What OS are you running?
  • What version of python?
  • What version of statsforecast?

Please provide your StatsForecast code and the parameters you set, along with the list of models and their parameters.
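
For a dummy dataset, something along these lines should be enough as a starting point (a rough sketch only; the number of series, their lengths, and the horizon here are placeholders, not your actual setup):

    from statsforecast import StatsForecast
    from statsforecast.models import AutoARIMA
    from statsforecast.utils import generate_series

    # Dummy panel of monthly series to stand in for the real data.
    series = generate_series(n_series=100, freq='M', min_length=48, max_length=60)

    sf = StatsForecast(
        models=[AutoARIMA(season_length=12)],
        freq='M',
        n_jobs=-1,  # the setting that appears to trigger the memory blow-up
    )
    fcst = sf.forecast(df=series, h=17)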

akmalsoliev avatar Jun 23 '23 20:06 akmalsoliev

OS is macOS Ventura 13.3.1, Python is 3.9.12, and statsforecast is the latest version, but I don't know the exact number as my Jupyter env is set up differently right now.

As for generate_series(), I've not used that before, but I can take a look. (Background: I inherited a notebook that encountered this mem problem, so I don't know much about statsforecast.)

    import statsforecast.models as sfm  # assuming 'sfm' is an alias for statsforecast.models
    from statsforecast import StatsForecast

    models = [
        sfm.AutoARIMA(season_length=12, alias='ARIMA'),
        sfm.AutoARIMA(season_length=12, allowdrift=True, alias='ARIMA2'),
        # Orig prob happened with all models, but still happened with just the two above.
        # sfm.AutoCES(season_length=12, alias='CES'),
        # sfm.SeasonalNaive(season_length=12, alias='SN'),
        # sfm.SeasonalWindowAverage(season_length=12, window_size=3, alias='SWA'),
        # sfm.RandomWalkWithDrift(alias='RWD'),
        # sfm.HistoricAverage(alias='HA'),
    ]

    fcster = StatsForecast(
        models=models,
        freq='M',
        n_jobs=-1,  # With 1 or 4, the memory problem doesn't happen.
        fallback_model=sfm.HistoricAverage(alias='HA'),
    )

    fcst_df = fcster.forecast(
        df=train_df,  # train_df: the ~90K-row dataset that triggers the issue (not shareable)
        h=17,
        fitted=True,
    )

scottee avatar Jun 23 '23 20:06 scottee

Ah, I figured it was macOS. Sadly there is no solution to this at the moment: Numba currently does not support multithreading on Apple silicon. The only workaround is spinning up a Docker container and running your Jupyter notebook from there.

akmalsoliev avatar Jun 23 '23 20:06 akmalsoliev

Doh!! macOS is just where I'm running the browser for Jupyter. The Jupyter server is running on Linux:

Linux xxx 5.4.219-126.411.amzn2.x86_64 #1 SMP Wed Nov 2 17:44:17 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

All the other values I gave were from the Jupyter server machine.

scottee avatar Jun 23 '23 20:06 scottee

If you're unfamiliar with Docker, then try running your notebook on Amazon SageMaker or Google Colab: https://colab.research.google.com/

akmalsoliev avatar Jun 23 '23 20:06 akmalsoliev

Why do I need Docker, since it was already running on Linux?

scottee avatar Jun 23 '23 20:06 scottee

Is your Jupyter instance running on an Amazon EC2 instance? Can you provide a screenshot of which kernel you're using?

akmalsoliev avatar Jun 23 '23 20:06 akmalsoliev

Yes, Jupyter is running on AWS. Not sure which kernel you're referring to. The OS kernel is as shown in my "Doh!!" comment; the Jupyter kernel is "Python 3 (ipykernel)". Otherwise, let me know more details about which kernel you're after.

scottee avatar Jun 23 '23 21:06 scottee

Had the same problem in a Colab notebook, even when I set num_cores=1, with the Rossmann competition dataset (1M rows, ~1k time series; it fits in RAM just fine with other models).

umitkaanusta avatar Jul 07 '23 12:07 umitkaanusta

What parameters did you set? You have to use n_jobs=-1.

akmalsoliev avatar Jul 07 '23 12:07 akmalsoliev

Can you elaborate on "set parameters"? I tried -1 as well, which did not work.

I initialized AutoARIMA as such: model = AutoARIMA(num_cores=-1, season_length=7)

umitkaanusta avatar Jul 07 '23 12:07 umitkaanusta

@umitkaanusta try wrapping the model in StatsForecast and then proceeding from there. n_jobs=-1 works on my end.
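
Roughly something like this (a sketch only; the frequency, horizon, and train_df name are assumptions, not your actual setup):

    from statsforecast import StatsForecast
    from statsforecast.models import AutoARIMA

    sf = StatsForecast(
        models=[AutoARIMA(season_length=7)],
        freq='D',    # assumed daily frequency for the Rossmann data
        n_jobs=-1,   # parallelism across series is handled here
    )
    # train_df: placeholder name for a long-format dataframe with unique_id, ds, y columns
    fcst = sf.forecast(df=train_df, h=28)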

akmalsoliev avatar Jul 13 '23 09:07 akmalsoliev