
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9)

Open hopto-dot opened this issue 1 year ago • 21 comments

I'm trying to run the 7B model on an RTX 3090 (24GB) on WSL Ubuntu, but I'm getting the following error:

jawgboi@DESKTOP-SLIQCDH:~/git/llama$ torchrun --nproc_per_node 1 example.py --ckpt_dir "./model/7B" --tokenizer_path "./model/tokenizer.model"
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
Loading
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 0 (pid: 25586) of binary: /usr/bin/python3
Traceback (most recent call last):
  File "/usr/local/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 762, in main
    run(args)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 753, in run
    elastic_launch(
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
======================================================
example.py FAILED
------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-03-03_23:20:30
  host      : DESKTOP-SLIQCDH.
  rank      : 0 (local_rank: 0)
  exitcode  : -9 (pid: 25586)
  error_file: <N/A>
  traceback : Signal 9 (SIGKILL) received by PID 25586
======================================================

I have tried:

  1. Changing `torch.distributed.init_process_group("nccl")` to `torch.distributed.init_process_group("gloo")`
  2. Adding `.cuda().half()` to the end of `model = Transformer(model_args)`
  3. Changing the `32` in `max_batch_size: int = 32,` to `8`

hopto-dot avatar Mar 03 '23 23:03 hopto-dot

Did you enable CUDA in WSL? https://learn.microsoft.com/en-us/windows/ai/directml/gpu-cuda-in-wsl
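A quick sanity check, assuming the Windows NVIDIA driver and WSL CUDA support are installed, is to confirm the GPU is actually visible from inside the WSL shell:

    # should list the RTX 3090 if GPU passthrough works
    nvidia-smi
    # should print True if PyTorch can see CUDA
    python3 -c "import torch; print(torch.cuda.is_available())"

If both of these look fine, CUDA itself is probably not the problem.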

neuhaus avatar Mar 04 '23 12:03 neuhaus

Have you tried to set CUDA_VISIBLE_DEVICES=0?
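In case it helps, that would look something like this (paths taken from the original command):

    CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node 1 example.py --ckpt_dir "./model/7B" --tokenizer_path "./model/tokenizer.model"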

dmitry avatar Mar 04 '23 17:03 dmitry

What memory limit do you have in .wslconfig? I believe you need to make this value big enough so torch can load the weights before moving them to the GPU. Also try specifying the device you will be using during inference. Example: torchrun --nproc_per_node 1 example.py --ckpt_dir ./7B --tokenizer_path ./tokenizer.model --device 0
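For reference, a minimal .wslconfig along these lines (it lives at %UserProfile%\.wslconfig on the Windows side; the exact numbers are just a starting point) could look like:

    [wsl2]
    # enough RAM to hold the 7B checkpoint while it is loaded on the CPU side
    memory=16GB
    processors=8

After editing it, run wsl --shutdown from a Windows prompt so the new limits take effect the next time WSL starts.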

mtb0x1 avatar Mar 05 '23 15:03 mtb0x1

What memory limit do you have in .wslconfig? I believe you need to make this value big enough so torch can load the weights before moving them to the GPU. Also try specifying the device you will be using during inference. Example: torchrun --nproc_per_node 1 example.py --ckpt_dir ./7B --tokenizer_path ./tokenizer.model --device 0

Yeah, same issue. Increase the RAM; it was not enough to first load the model into RAM and then move it to the GPU.

gamingflexer avatar Mar 06 '23 10:03 gamingflexer

Same error. I modified max_batch_size to 1 and still get this error.

kli017 avatar Mar 07 '23 06:03 kli017

Found a solution yet?

capripio avatar Mar 10 '23 18:03 capripio

Use this notebook, I have tried it:

https://colab.research.google.com/drive/1ESttkeO8Ww2--8dlNLIGuEoG4qhP_r96?usp=sharing

gamingflexer avatar Mar 11 '23 02:03 gamingflexer

Same issue here! Any solution?

Update: there is no issue for me anymore after using 2x A6000 and 100GB of memory for the 7B and 13B models with MP=2.

aliaraabi avatar Mar 14 '23 18:03 aliaraabi

Same issue here as well, thanks for the help.

crypto-maniac avatar Mar 24 '23 12:03 crypto-maniac

Same issue, does anyone have a solution?

raytions avatar Mar 31 '23 14:03 raytions

I am running into a similar issue using an A100 GPU.

SujoyDutta avatar Mar 31 '23 19:03 SujoyDutta

I'm also having this issue.

nanghtet avatar Apr 04 '23 04:04 nanghtet

I had the same issue when using 4x T4 in a Docker container. I could solve the problem by increasing the shared memory size from 64MB (the default) to 8GB. Example: docker run --shm-size=8gb. Or if you use docker-compose.yaml:

services:
  your_service:
    shm_size: '8gb'
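A fuller docker run invocation along those lines (the image name is just a placeholder) might be:

    # expose the GPUs and raise the shared-memory limit from the 64MB default
    docker run --gpus all --shm-size=8gb -it your_image bash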

Tyaba avatar Apr 21 '23 08:04 Tyaba

Have you tried modifying the .wslconfig file to allow more memory and more processors? It works for me.

BoxiangW avatar Apr 26 '23 06:04 BoxiangW

What memory limit do you have in .wslconfig? I believe you need to make this value big enough so torch can load the weights before moving them to the GPU. Also try specifying the device you will be using during inference. Example: torchrun --nproc_per_node 1 example.py --ckpt_dir ./7B --tokenizer_path ./tokenizer.model --device 0

Yeah, same issue. Increase the RAM; it was not enough to first load the model into RAM and then move it to the GPU.

Can I ask what the smallest amount of RAM required is? I've tried 12GB but still no luck.

jzhu382 avatar May 01 '23 03:05 jzhu382

In Colab, choosing GPU as the runtime type and High-RAM as the runtime shape will solve this problem.

frankchieng avatar Jul 23 '23 15:07 frankchieng

I had the same issue, and I solved it by increasing .wslconfig to memory=16GB and processors=8 (I think it can be reduced). This message appears: Loaded in 87.83 seconds

pierrebelin avatar Aug 11 '23 06:08 pierrebelin

In Colab, choosing GPU as the runtime type and High-RAM as the runtime shape will solve this problem.

Can you explain more clearly?

threeneedone avatar Sep 27 '23 07:09 threeneedone

I'm trying to run the 7B model on an RTX 3090 (24GB) on WSL Ubuntu, but I'm getting the following error: [...]

Did you check htop (or the like) to see whether you simply run out of memory?
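Exit code -9 means the process received SIGKILL, which on Linux is usually the kernel's out-of-memory killer. Two commands that may help confirm that inside WSL:

    # how much RAM and swap is free before loading the model
    free -h
    # look for an OOM-killer entry after the crash
    sudo dmesg | grep -i "killed process"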

feinmann avatar Oct 08 '23 06:10 feinmann

Same error here. Has anyone fixed it?

xiaxin1998 avatar Mar 19 '24 01:03 xiaxin1998