
Mac mini M4 (2024): forecasting on GPU hits a memory issue; on CPU it is fine

Mogiaro opened this issue 11 months ago • 7 comments

What happened + What you expected to happen

Hello, I'm using the latest version, 3.0.0, on Python 3.9 or 3.10 in a brand-new .venv.

When I train a model on a "small" dataframe reading about 6 MB of CSV (attached), it runs fine on CPU (accelerator="cpu"), even with 27 days of data concatenated. But when I remove the accelerator line so the GPU is enabled, the resulting error prevents me from fitting the model. This seems very strange to me. I can attach the .csv data and the notebook I'm using.
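For reference, a minimal sketch of the two setups being compared (horizon and input sizes are hypothetical placeholders; accelerator is a PyTorch Lightning Trainer argument that neuralforecast model constructors forward):

```python
# Hypothetical hyperparameters; only the `accelerator` line differs.
works_on_cpu = dict(h=24, input_size=48, accelerator="cpu")  # trains fine
fails_on_mps = dict(h=24, input_size=48)  # auto-selects the Apple GPU (MPS) and hits the buffer error

# Usage, assuming neuralforecast is installed:
#   from neuralforecast.models import NHITS
#   model = NHITS(**works_on_cpu)
```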

message.txt

The error: RuntimeError: Invalid buffer size: 17.62 GB

NOTE: 17.62 GB is just one example; with a different model structure or other changes it can be as much as 42 GB. This feels really strange, since in my limited experience neural networks should have no trouble with so little data...
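One plausible (unconfirmed) explanation for how a 6 MB CSV can demand tens of GB: if every training window is materialized at once, the buffer grows with the number of windows times window length times features, not with the CSV size. A back-of-envelope sketch with purely hypothetical numbers:

```python
def window_buffer_gb(n_timesteps, n_series, input_size, h, n_features, bytes_per_val=4):
    """Rough float32 size of materializing every sliding window at once."""
    windows_per_series = max(n_timesteps - (input_size + h) + 1, 0)
    n_windows = windows_per_series * n_series
    window_len = input_size + h
    return n_windows * window_len * n_features * bytes_per_val / 1e9

# Hypothetical numbers: 27 days of minute-level data, 40 series, a long
# input window, and a few features already land in the same order of
# magnitude as the reported error.
print(round(window_buffer_gb(n_timesteps=27 * 24 * 60, n_series=40,
                             input_size=512, h=96, n_features=5), 2))
```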

Can you help in some way? Thanks!

Versions / Dependencies

Python 3.9/3.10 on a Mac mini running the latest macOS. The libraries are the defaults installed by

pip install neuralforecast==3.0.0

Reproduction script

bug-report-memory.ipynb.zip

bug_report_data.csv

Issue Severity

High: It blocks me from completing my task.

Mogiaro avatar Mar 21 '25 15:03 Mogiaro

Hello! Can you try reducing the batch size? That usually solves out-of-memory errors. For example, you could do:

from neuralforecast.models import NHITS

nhits = NHITS(
    h=horizon,
    input_size=input_size,
    batch_size=32,            # series sampled per training batch
    windows_batch_size=32,    # windows sampled per training batch
)

Let me know if this works!

marcopeix avatar Mar 24 '25 13:03 marcopeix

No, changing the batch size does not make it work. I tried 32, 16, and 8.

Mogiaro avatar Mar 24 '25 14:03 Mogiaro

Options (outside of the possible M4 GPU issue):

  1. Try using n_block=2 in TSMixer, your TSMixer model is huge (17.7M parameters).

  2. Set windows_batch_size=32, inference_windows_batch_size=32.

  3. Remove the static_df.

  4. Set n_series to the number of series you want to jointly predict, and preprocess the data accordingly (each column should be a unique_id). TSMixer is a bit of an odd choice right now, as you're doing univariate forecasting (n_series=1) with a multivariate model. I assume you are interested in multivariate forecasting; if not, TSMixer is not a good choice of model, and you should switch to e.g. NHITS.
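For concreteness, the suggestions above could be collected into one sketch (all sizes are hypothetical placeholders; the parameter names follow neuralforecast's TSMixer and NHITS constructors, assuming version 3.0.0):

```python
# Suggestions 1, 2, and 4 above as constructor settings; the
# h / input_size / n_series values are hypothetical placeholders.
tsmixer_cfg = dict(
    h=24, input_size=48,
    n_series=3,                        # series to predict jointly
    n_block=2,                         # shrink the 17.7M-parameter model
    windows_batch_size=32,
    inference_windows_batch_size=32,
)

# Univariate alternative if multivariate forecasting isn't actually the goal:
nhits_cfg = dict(h=24, input_size=48,
                 windows_batch_size=32, inference_windows_batch_size=32)

# Usage, assuming neuralforecast is installed:
#   from neuralforecast.models import TSMixer, NHITS
#   model = TSMixer(**tsmixer_cfg)  # or NHITS(**nhits_cfg)
```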

elephaint avatar Mar 24 '25 14:03 elephaint

With n_block=2 the number of parameters does change, but the buffer-size error remains. I could try another model, but I believe the GPU buffer-size problem would still remain.

Mogiaro avatar Mar 24 '25 14:03 Mogiaro

> With n_blocks=2 it does change the num of params but the error of buffer size still remains. I could try another model, but i believe the buffer size problem for GPU will still remain

What difference did the other suggestions make?

I was also thinking, try to include only data that the model also consumes. So, if the model doesn't use the exogenous variables in your data, drop them from the input dataframe. I think we currently lack a prefilter for unused data.
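Lacking such a prefilter, the dropping can be done manually before calling fit; a minimal pandas sketch with hypothetical column names:

```python
import pandas as pd

# Toy frame mirroring the issue's layout: id/ds/y plus exogenous columns,
# only some of which the model is configured to consume (names hypothetical).
df = pd.DataFrame({
    "unique_id": ["A", "A", "A"],
    "ds": pd.to_datetime(["2025-03-01", "2025-03-02", "2025-03-03"]),
    "y": [1.0, 2.0, 3.0],
    "exog_used": [0.1, 0.2, 0.3],
    "exog_unused": [9.9, 9.8, 9.7],
})

used_exog = ["exog_used"]  # whatever is passed via hist_exog_list / futr_exog_list
df = df[["unique_id", "ds", "y"] + used_exog]  # drop columns the model ignores
print(list(df.columns))
```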

elephaint avatar Mar 24 '25 19:03 elephaint

I tried passing only 5 columns in total (id, ds, y + 2 exogenous variables). That went well for 1 day, but when I include more days to get more training data, it fails with the same buffer-size error.


Mogiaro avatar Mar 25 '25 08:03 Mogiaro

In debug mode I investigated a similar issue and found that "windows_batch_size" / "inference_windows_batch_size" had no effect at all: the full data set was used to create all the training windows at once. This was on an NVIDIA GPU.

I suspect this just isn't implemented for some reason. It's probably also the cause of #1357.

samcraig678 avatar Sep 05 '25 06:09 samcraig678