
Code stuck on "initializing ddp" when using more than one GPU with neuralforecast AutoTFT, AutoNHITS

Open philip-ndikum opened this issue 2 years ago • 2 comments

What happened + What you expected to happen

When running this notebook with multi-GPU

https://colab.research.google.com/github/Nixtla/neuralforecast/blob/main/nbs/examples/IntermittentData.ipynb

the run gets stuck on "initializing ddp". Are there any parameters to control the number of GPUs utilized? For example, if I have 10 GPUs but only want AutoTFT to use one of them, how can I do that without manually changing the low-level code? Can I define this globally in the code or in the notebook?
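One possible workaround that sidesteps neuralforecast internals entirely (a sketch, not verified against this notebook) is to hide the extra GPUs from the process via `CUDA_VISIBLE_DEVICES` before anything initializes CUDA, so Lightning only ever sees a single device and never attempts DDP:

```python
import os

# Must run before importing torch / neuralforecast: once CUDA is
# initialized, changing this variable has no effect. With only one
# visible device, Lightning has no reason to initialize DDP.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
```

The same thing can be done from the shell when launching a script, e.g. `CUDA_VISIBLE_DEVICES=0 python train.py`.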

Related to this issue here: https://github.com/Lightning-AI/lightning/issues/4612

Error occurs here: https://github.com/Nixtla/neuralforecast/blob/main/neuralforecast/losses/pytorch.py

Versions / Dependencies

neuralforecast==1.5.0

Reproduction script

nf.fit(df=Y_df)

Issue Severity

High: It blocks me from completing my task.

philip-ndikum avatar Jul 07 '23 01:07 philip-ndikum

I have the same problem when training with neuralforecast alone, without Ray. I have two GPUs and set the following network parameters: strategy='ddp_notebook' (also tried 'dp' and 'ddp_spawn'), accelerator='gpu', devices=[0,1], and I constantly get "process forking is not supported on this platform" errors:

ValueError: You selected Trainer(strategy='ddp_notebook') but process forking is not supported on this platform. We recommend Trainer(strategy='ddp_spawn') instead.

ValueError: You selected Trainer(strategy='ddp_fork') but process forking is not supported on this platform. We recommend Trainer(strategy='ddp_spawn') instead.

MisconfigurationException: Trainer(strategy='ddp_spawn') is not compatible with an interactive environment. Run your code as a script, or choose one of the compatible strategies: Fabric(strategy='dp'|'ddp_notebook'). In case you are spawning processes yourself, make sure to include the Trainer creation inside the worker function.
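The last exception points at the standard multiprocessing pitfall Lightning warns about: spawn-based strategies re-import your module in worker processes, so the Trainer creation (or nf.fit call) must live inside a guarded entry point and be run as a script rather than in a notebook. A minimal, library-free sketch of that pattern (the function name and return string are illustrative):

```python
def main() -> str:
    # Build the Trainer / call nf.fit(...) here, never at module top
    # level: spawned workers re-import this file, and top-level training
    # code would re-run in every worker.
    return "trainer created inside main()"

if __name__ == "__main__":
    main()
```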

TDL77 avatar Jul 07 '23 04:07 TDL77


Your error is more complicated; it may be related to your PyTorch Lightning version and is not directly related to my issue. Either way, it would be good to be able to set GPU options directly through the function parameters without changing low-level code.
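For what it's worth, here is the kind of single-GPU configuration I have in mind. Assumption (not verified against the neuralforecast source): extra keyword arguments on model constructors such as NHITS or TFT are forwarded to the underlying pytorch_lightning.Trainer, so pinning training to one device would avoid any DDP strategy selection:

```python
# Hypothetical single-GPU trainer configuration; whether these kwargs
# are actually forwarded to pl.Trainer by neuralforecast models is an
# assumption, not a documented guarantee.
single_gpu_trainer_kwargs = {
    "accelerator": "gpu",
    "devices": 1,  # one device -> Lightning never needs a DDP strategy
}

# Sketch of usage (commented out; requires neuralforecast installed):
# from neuralforecast.models import NHITS
# model = NHITS(h=12, input_size=24, **single_gpu_trainer_kwargs)
```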

philip-ndikum avatar Jul 07 '23 12:07 philip-ndikum