
RuntimeError: ProcessGroupNCCL is only supported with GPUs, no GPUs found?

Open Jesparzarom opened this issue 1 year ago • 19 comments

What is the reason behind this error, and how can I fix it?

RuntimeError: ProcessGroupNCCL is only supported with GPUs, no GPUs found!

I'm trying to run example_text_completion.py with:

torchrun --nproc_per_node 1 example_text_completion.py \
    --ckpt_dir llama-2-7b/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 128 --max_batch_size 4

And example_chat_completion.py using:

torchrun --nproc_per_node 1 example_chat_completion.py \
    --ckpt_dir llama-2-7b-chat/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 512 --max_batch_size 4

But I'm getting this RuntimeError, Help!

Jesparzarom avatar Jul 20 '23 06:07 Jesparzarom

Well, if it helps, ChatGPT says: "If you are using a development environment like WSL2 on Windows or a virtual machine without direct GPU access, you may not be able to use the NCCL process group due to virtualized hardware limitations. In that case, you may want to consider using a system with a dedicated GPU or review your virtual machine's configuration to enable GPU access if possible."

Perhaps a solution can be found at https://learn.microsoft.com/en-us/windows/ai/directml/gpu-cuda-in-wsl or https://learn.microsoft.com/en-us/windows/wsl/tutorials/gpu-compute?

I guess I'll try to see how to do that.

Jesparzarom avatar Jul 20 '23 07:07 Jesparzarom

@Jesparzarom What's the output of this?

import torch

if __name__ == "__main__":
    print("Cuda support:", torch.cuda.is_available(),":", torch.cuda.device_count(), "devices")

webeng avatar Jul 21 '23 20:07 webeng

@webeng Hello, the result is => Cuda support: False : 0 devices.

Anyway, I realized I was being overconfident and distracted, trying to use Nvidia when I actually have an AMD GPU! :(

For now I've stopped; I was really just curious and wanted to experiment, like with the OpenAI API (but for free xD).

Jesparzarom avatar Jul 22 '23 18:07 Jesparzarom

I have a T4 installed and I'm still facing the same issue on an AWS compute machine.

pytorch version - 2.0.1+cu117

Cuda support: False : 0 devices
Traceback (most recent call last):
  File "/home/ubuntu/llama/example_text_completion.py", line 57, in <module>
    fire.Fire(main)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/ubuntu/.local/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/ubuntu/llama/example_text_completion.py", line 19, in main
    generator = Llama.build(
  File "/home/ubuntu/llama/llama/generation.py", line 62, in build
    torch.distributed.init_process_group("nccl")
  File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 907, in init_process_group
    default_pg = _new_process_group_helper(
  File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 1024, in _new_process_group_helper
    backend_class = ProcessGroupNCCL(backend_prefix_store, group_rank, group_size, pg_options)
RuntimeError: ProcessGroupNCCL is only supported with GPUs, no GPUs found!
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1115) of binary: /usr/bin/python3
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/distributed/run.py", line 798, in <module>
    main()
  File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
example_text_completion.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-07-24_11:28:14
  host      : ip-10-0-2-211
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 1115)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

criminact avatar Jul 24 '23 11:07 criminact

I have solved it with a CPU installation by installing https://github.com/krychu/llama instead of https://github.com/facebookresearch/llama. Complete process to install:

  1. download the original version of Llama from https://github.com/facebookresearch/llama and extract it to a llama-main folder
  2. download the CPU version from https://github.com/krychu/llama, extract it, and replace the files in the llama-main folder
  3. run the download.sh script in a terminal, passing the URL provided when prompted, to start the download
  4. go to the llama-main folder
  5. create a Python3 env: python3 -m venv env and activate it: source env/bin/activate
  6. install the CPU version of PyTorch: python3 -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu # for the CPU version
  7. install llama's dependencies: python3 -m pip install -e .
  8. if you have downloaded llama-2-7b, run:
torchrun --nproc_per_node 1 example_text_completion.py \
    --ckpt_dir llama-2-7b/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 128 --max_batch_size 1 # (instead of 4)
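
For what it's worth, the root cause of the error is that generation.py in the original repo calls torch.distributed.init_process_group("nccl"), and the NCCL backend requires a CUDA GPU. As far as I can tell, the CPU fork works around exactly that. A minimal sketch of the same idea (my own sketch, meant to run under torchrun, not necessarily line-for-line what the fork does) would be:

import torch
import torch.distributed

# Pick the distributed backend based on GPU availability instead of
# hard-coding "nccl"; the gloo backend works on CPU-only machines.
# (torchrun sets MASTER_ADDR/MASTER_PORT/RANK/WORLD_SIZE for us.)
backend = "nccl" if torch.cuda.is_available() else "gloo"
torch.distributed.init_process_group(backend)

# On a CPU-only machine the checkpoint also has to be mapped to the CPU, e.g.:
#   checkpoint = torch.load(ckpt_path, map_location="cpu")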

pzim-devdata avatar Jul 25 '23 14:07 pzim-devdata

@pzim-devdata

I have solved it with a CPU installation by installing https://github.com/krychu/llama instead of https://github.com/facebookresearch/llama. Complete process to install: [steps quoted above]

Great! I'm going to try that and see if it works for me, thanks for the info.

Jesparzarom avatar Jul 26 '23 22:07 Jesparzarom

If you have confirmed that you're using GPU(s), then try updating the NVIDIA drivers to an appropriate version (in the Ubuntu distro, anything >= 450 is good enough).

Try running torch.cuda.device_count() to get the number of devices; it should show the correct number. Once it starts showing the connected GPU(s) correctly, you're good to go to run the model.
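
One way to sanity-check both sides from Python (a sketch; it assumes nvidia-smi is on the PATH) is:

import subprocess
import torch

# CUDA version the installed PyTorch wheel was built against (None on CPU-only wheels)
print("torch built with CUDA:", torch.version.cuda)
print("visible CUDA devices :", torch.cuda.device_count())

# Driver version reported by the NVIDIA driver itself
result = subprocess.run(
    ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
    capture_output=True, text=True,
)
print("NVIDIA driver        :", result.stdout.strip() or "not found")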

criminact avatar Aug 08 '23 09:08 criminact

I have solved it with a CPU installation by installing https://github.com/krychu/llama instead of https://github.com/facebookresearch/llama. Complete process to install: [steps quoted above]

Why download two repos and then copy the content from one to the other? Just clone the krychu repo and follow the instructions from the official README.md.

tatarinla avatar Aug 20 '23 20:08 tatarinla

@tatarinla I followed the protocol above and replaced the files in llama-main. It still reports a similar error:

  File "/data/zxu/anaconda3/envs/DL/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 1013, in _new_process_group_helper
    raise RuntimeError("Distributed package doesn't have NCCL " "built in")
RuntimeError: Distributed package doesn't have NCCL built in
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 718178) of binary: /data/zxu/anaconda3/envs/DL/bin/python3
Traceback (most recent call last):
  File "/data/zxu/anaconda3/envs/DL/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/data/zxu/anaconda3/envs/DL/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/data/zxu/anaconda3/envs/DL/lib/python3.9/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/data/zxu/anaconda3/envs/DL/lib/python3.9/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/data/zxu/anaconda3/envs/DL/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/data/zxu/anaconda3/envs/DL/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
example_text_completion.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>

Do you know what the problem is here? Does it mean I still need NCCL even in a CPU-only environment?

xanthexu avatar Aug 21 '23 06:08 xanthexu

@tatarinla I followed the protocol above and replaced the files in llama-main. It still reports a similar error: RuntimeError: Distributed package doesn't have NCCL built in [full traceback quoted above]

I also encounter the same problem.

sammyview80 avatar Aug 27 '23 07:08 sammyview80

@pzim-devdata Thanks for the directions. Any ideas BTW how to make it use all CPU cores?
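
(I assume it comes down to PyTorch's usual CPU threading settings, something like the sketch below, though I haven't verified it against the CPU fork:)

import os
import torch

# Sketch: let PyTorch's intra-op parallelism use every available core.
torch.set_num_threads(os.cpu_count())
print("intra-op threads:", torch.get_num_threads())

# Alternatively, set the thread count via the environment before launching, e.g.
#   OMP_NUM_THREADS=$(nproc) torchrun --nproc_per_node 1 example_text_completion.py ...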

yapus avatar Aug 27 '23 21:08 yapus

@tatarinla I followed the protocol above and replaced the files in llama-main. It still reports a similar error: RuntimeError: Distributed package doesn't have NCCL built in [full traceback quoted above]

I have the same issue.

guertsen avatar Sep 14 '23 06:09 guertsen

I am having the same issue on a Pop-OS Huawei Matebook with AMD Radeon Vega graphics. Is it a bad graphics card for this type of stuff?

prototorpedo avatar Sep 17 '23 23:09 prototorpedo

@tatarinla I followed the protocol above and replaced the files in llama-main. It still reports a similar error: RuntimeError: Distributed package doesn't have NCCL built in [full traceback quoted above]

Hello, I have the same issue too btw.

oceanedruenne avatar Sep 22 '23 15:09 oceanedruenne

I have solved it with a CPU installation by installing https://github.com/krychu/llama instead of https://github.com/facebookresearch/llama. Complete process to install: [steps quoted above]

Thanks for the reference repo. I was able to use the text_completion program as intended, but it gives me an error with chat_completion. Is there a different way to achieve the same? I have tried lowering max_seq_len, but that didn't seem to work.

[screenshots of the chat_completion error attached]

thanks!

pratikkejriwal avatar Sep 28 '23 09:09 pratikkejriwal

Same here with a virtualized GPU VM

Clivern avatar Oct 17 '23 13:10 Clivern

torchrun --nproc_per_node 1 example_chat_completion.py \
    --ckpt_dir llama-2-7b-chat/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 512 --max_batch_size 6

Traceback (most recent call last):
  File "/home/bsp/OpenAiBase/MetaAI/llama/example_chat_completion.py", line 104, in <module>
    fire.Fire(main)
  File "/home/bsp/.local/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/bsp/.local/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/bsp/.local/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/bsp/OpenAiBase/MetaAI/llama/example_chat_completion.py", line 35, in main
    generator = Llama.build(
  File "/home/bsp/OpenAiBase/MetaAI/llama/llama/generation.py", line 85, in build
    torch.distributed.init_process_group("nccl")
  File "/home/bsp/.local/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 86, in wrapper
    func_return = func(*args, **kwargs)
  File "/home/bsp/.local/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 1184, in init_process_group
    default_pg, _ = _new_process_group_helper(
  File "/home/bsp/.local/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 1339, in _new_process_group_helper
    backend_class = ProcessGroupNCCL(
ValueError: ProcessGroupNCCL is only supported with GPUs, no GPUs found!
[2024-04-16 10:31:30,224] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 4118487) of binary: /usr/bin/python3
Traceback (most recent call last):
  File "/home/bsp/.local/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/bsp/.local/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
  File "/home/bsp/.local/lib/python3.10/site-packages/torch/distributed/run.py", line 812, in main
    run(args)
  File "/home/bsp/.local/lib/python3.10/site-packages/torch/distributed/run.py", line 803, in run
    elastic_launch(
  File "/home/bsp/.local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 135, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/bsp/.local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 268, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
example_chat_completion.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-04-16_10:31:30
  host      : bsp-Precision-3650-Tower
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 4118487)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

Hi all, I have the same issue as shown in the logs above. How can I fix it?

yangxp0312 avatar Apr 16 '24 02:04 yangxp0312

Sorry for the vague response, I don't really understand these things, but I have encountered this problem when the PyTorch build does not match the type of GPU you use. The PyTorch build has to match the corresponding CUDA release from NVIDIA. I can't say which one needs which, sorry.
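
A quick way to see what your install was actually built against (just a sketch, assuming a standard pip install) is:

import torch

print("torch version      :", torch.__version__)   # e.g. 2.0.1+cu117
print("built against CUDA :", torch.version.cuda)  # should not exceed what the driver supports
print("CUDA usable        :", torch.cuda.is_available())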

egor-2 avatar May 02 '24 18:05 egor-2

CUDA Version: 11.4
torch==2.0.0

works

CUDA Version: 11.4
torch==2.1.0 / torch==2.2.0 / torch==2.3.0

does not work

guotong1988 avatar Jun 04 '24 01:06 guotong1988