Statsforecast tries to grab 360gb memory; crashes notebook

Open scottee opened this issue 1 year ago • 15 comments

I'm using the latest statsforecast and trying to run a forecast with 2 models and a dataset of 90K rows. The library tries to grab 360 GB of memory and crashes my notebook. This happened several times in a row while I tried various options with fewer models and fewer rows. Why is it trying to grab so much memory, and how can I avoid that?

scottee avatar Jun 19 '23 15:06 scottee

One other config point: I had n_jobs=-1. I reduced it to n_jobs=1 and so far it hasn't grabbed such crazy amounts of memory.

scottee avatar Jun 19 '23 16:06 scottee

Could you provide an MRE (minimal reproducible example)?

akmalsoliev avatar Jun 22 '23 21:06 akmalsoliev

I can't give you the example that caused the problem because of the data. I haven't been able to reproduce it on toy examples. It always happens on the example with 90K rows and n_jobs=-1. Do you know of a large public dataset we both can access?

scottee avatar Jun 23 '23 20:06 scottee

Do you experience the same issue when generating a dummy dataset with from statsforecast.utils import generate_series?

  • What OS are you running?
  • What version of python?
  • What version of statsforecast?

Please provide your StatsForecast code and the parameters you set, along with the list of models and their parameters.
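
For a dummy dataset, something along these lines should be enough as a starting point (a rough sketch only; the number of series, their lengths, and the horizon here are placeholders, not your actual setup):

    from statsforecast import StatsForecast
    from statsforecast.models import AutoARIMA
    from statsforecast.utils import generate_series

    # Dummy panel of monthly series to stand in for the real data.
    series = generate_series(n_series=100, freq='M', min_length=48, max_length=60)

    sf = StatsForecast(
        models=[AutoARIMA(season_length=12)],
        freq='M',
        n_jobs=-1,  # the setting that appears to trigger the memory blow-up
    )
    fcst = sf.forecast(df=series, h=17)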

akmalsoliev avatar Jun 23 '23 20:06 akmalsoliev

OS is macOS Ventura 13.3.1, Python is 3.9.12, and statsforecast is the latest version, but I don't know the exact number as my Jupyter env is set up differently right now.

As for generate_series(), I've not used that before, but I can take a look. (Background: I inherited a notebook that encountered this mem problem, so I don't know much about statsforecast.)

    import statsforecast.models as sfm  # assuming 'sfm' is an alias for statsforecast.models
    from statsforecast import StatsForecast

    models = [
        sfm.AutoARIMA(season_length=12, alias='ARIMA'),
        sfm.AutoARIMA(season_length=12, allowdrift=True, alias='ARIMA2'),
        # Orig prob happened with all models, but still happened with just the two above.
        # sfm.AutoCES(season_length=12, alias='CES'),
        # sfm.SeasonalNaive(season_length=12, alias='SN'),
        # sfm.SeasonalWindowAverage(season_length=12, window_size=3, alias='SWA'),
        # sfm.RandomWalkWithDrift(alias='RWD'),
        # sfm.HistoricAverage(alias='HA'),
    ]

    fcster = StatsForecast(
        models=models,
        freq='M',
        n_jobs=-1,  # With 1 or 4, the memory problem doesn't happen.
        fallback_model=sfm.HistoricAverage(alias='HA'),
    )

    fcst_df = fcster.forecast(
        df=train_df,  # train_df: the ~90K-row dataset that triggers the issue (not shareable)
        h=17,
        fitted=True,
    )

scottee avatar Jun 23 '23 20:06 scottee

Ah, I figured it was macOS. Sadly there is no solution to this at the moment: Numba currently does not support multithreading on Apple silicon. The only workaround is spinning up a Docker container and running your Jupyter notebook from there.

akmalsoliev avatar Jun 23 '23 20:06 akmalsoliev

Doh!! macOS is just where I'm running the browser for Jupyter. The Jupyter server is running on Linux:

Linux xxx 5.4.219-126.411.amzn2.x86_64 #1 SMP Wed Nov 2 17:44:17 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

All the other values I gave were from the Jupyter server machine.

scottee avatar Jun 23 '23 20:06 scottee

If you're unfamiliar with Docker, then try running your notebook on Amazon SageMaker or Google Colab: https://colab.research.google.com/

akmalsoliev avatar Jun 23 '23 20:06 akmalsoliev

Why do I need Docker, since it was already running on Linux?

scottee avatar Jun 23 '23 20:06 scottee

Is your Jupyter instance running on an Amazon EC2 instance? Can you provide a screenshot of which kernel you're using?

akmalsoliev avatar Jun 23 '23 20:06 akmalsoliev

Yes, Jupyter is running on AWS. Not sure which kernel you're referring to. The OS kernel is as shown in my "Doh!!" comment; the Jupyter kernel is "Python 3 (ipykernel)". Otherwise, let me know more details about which kernel you're after.

scottee avatar Jun 23 '23 21:06 scottee

Had the same problem in a Colab notebook, even when I set num_cores=1, with the Rossmann competition dataset (1M rows, ~1k time series; it fits in RAM just fine with other models).

umitkaanusta avatar Jul 07 '23 12:07 umitkaanusta

What parameters did you set? You have to use n_jobs=-1.

akmalsoliev avatar Jul 07 '23 12:07 akmalsoliev

Can you elaborate on "set parameters"? I tried -1 as well, which did not work.

I initialized AutoARIMA as such: model = AutoARIMA(num_cores=-1, season_length=7)

umitkaanusta avatar Jul 07 '23 12:07 umitkaanusta

@umitkaanusta try wrapping the model in StatsForecast and then proceeding from there. n_jobs=-1 works on my end.
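
Roughly something like this (a sketch only; the frequency, horizon, and train_df name are assumptions, not your actual setup):

    from statsforecast import StatsForecast
    from statsforecast.models import AutoARIMA

    sf = StatsForecast(
        models=[AutoARIMA(season_length=7)],
        freq='D',    # assumed daily frequency for the Rossmann data
        n_jobs=-1,   # parallelism across series is handled here
    )
    # train_df: placeholder name for a long-format dataframe with unique_id, ds, y columns
    fcst = sf.forecast(df=train_df, h=28)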

akmalsoliev avatar Jul 13 '23 09:07 akmalsoliev