llama
RuntimeError: ProcessGroupNCCL is only supported with GPUs, no GPUs found?
What is the reason behind this error, and how do I fix it?
RuntimeError: ProcessGroupNCCL is only supported with GPUs, no GPUs found!
I'm trying to run example_text_completion.py
with:
torchrun --nproc_per_node 1 example_text_completion.py \
--ckpt_dir llama-2-7b/ \
--tokenizer_path tokenizer.model \
--max_seq_len 128 --max_batch_size 4
And example_chat_completion.py
using:
torchrun --nproc_per_node 1 example_chat_completion.py \
--ckpt_dir llama-2-7b-chat/ \
--tokenizer_path tokenizer.model \
--max_seq_len 512 --max_batch_size 4
But I'm getting this RuntimeError. Help!
If it helps, ChatGPT says: "If you are using a development environment like WSL2 on Windows or a virtual machine without direct GPU access, you may not be able to use the NCCL process group due to virtualized hardware limitations. In that case, you may want to consider using a system with a dedicated GPU or review your virtual machine's configuration to enable GPU access if possible."
Perhaps a solution can be found at https://learn.microsoft.com/en-us/windows/ai/directml/gpu-cuda-in-wsl or https://learn.microsoft.com/en-us/windows/wsl/tutorials/gpu-compute
I guess I'll try to see how to do that.
@Jesparzarom What's the output of this?
import torch
if __name__ == "__main__":
    print("Cuda support:", torch.cuda.is_available(), ":", torch.cuda.device_count(), "devices")
@webeng Hello, the result is: Cuda support: False : 0 devices
Anyway, I realized that I was overconfident and distracted, trying to use NVIDIA when I have an AMD GPU! :(
I have stopped for now. I was really just curious and wanted to experiment, as with the OpenAI API (but for free xD).
I have a T4 installed and am still facing the same issue on an AWS compute machine.
PyTorch version: 2.0.1+cu117
Cuda support: False : 0 devices
Traceback (most recent call last):
File "/home/ubuntu/llama/example_text_completion.py", line 57, in <module>
fire.Fire(main)
File "/home/ubuntu/.local/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/ubuntu/.local/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/ubuntu/.local/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/ubuntu/llama/example_text_completion.py", line 19, in main
generator = Llama.build(
File "/home/ubuntu/llama/llama/generation.py", line 62, in build
torch.distributed.init_process_group("nccl")
File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 907, in init_process_group
default_pg = _new_process_group_helper(
File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 1024, in _new_process_group_helper
backend_class = ProcessGroupNCCL(backend_prefix_store, group_rank, group_size, pg_options)
RuntimeError: ProcessGroupNCCL is only supported with GPUs, no GPUs found!
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1115) of binary: /usr/bin/python3
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/distributed/run.py", line 798, in <module>
main()
File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/distributed/run.py", line 794, in main
run(args)
File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/distributed/run.py", line 785, in run
elastic_launch(
File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
example_text_completion.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2023-07-24_11:28:14
host : ip-10-0-2-211
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 1115)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
I have solved it with a CPU installation by installing https://github.com/krychu/llama
instead of https://github.com/facebookresearch/llama
Complete process to install:
- download the original version of Llama from https://github.com/facebookresearch/llama and extract it to a llama-main folder
- download the CPU version from https://github.com/krychu/llama, extract it, and replace the files in the llama-main folder
- run the download.sh script in a terminal, passing the URL provided when prompted to start the download
- go to the llama-main folder
- create a Python 3 env:
python3 -m venv env
and activate it:
source env/bin/activate
- install the CPU version of PyTorch:
python3 -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu  # for the CPU version
- install the dependencies of llama:
python3 -m pip install -e .
- run, if you have downloaded llama-2-7b:
torchrun --nproc_per_node 1 example_text_completion.py \
--ckpt_dir llama-2-7b/ \
--tokenizer_path tokenizer.model \
--max_seq_len 128 --max_batch_size 1  # (instead of 4)
@pzim-devdata
Great! I'm going to try that to see if it works for me. Thanks for the info.
If you have confirmed that the machine has NVIDIA GPU(s), try updating the NVIDIA drivers to an appropriate version (on Ubuntu, anything >= 450 is good enough).
Try running torch.cuda.device_count() to get the number of devices; it should show the correct number. Once it reports the connected GPU(s) correctly, you're good to go to run the model.
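The check described above can be extended into a slightly fuller diagnostic. The following is a minimal sketch, not part of the llama repo: the helper names `diagnose` and `collect` are made up for illustration, and the messages are assumptions about the most likely causes reported in this thread.

```python
from typing import Optional


def diagnose(cuda_build: Optional[str], cuda_available: bool, device_count: int) -> str:
    """Map three facts about the local PyTorch install to a likely cause."""
    if cuda_build is None:
        # torch.version.cuda is None for CPU-only (and ROCm) wheels.
        return "PyTorch was built without CUDA: install a CUDA build or use a CPU port"
    if not cuda_available or device_count == 0:
        return "CUDA build installed but no usable GPU: check the driver and GPU passthrough"
    return f"OK: {device_count} CUDA device(s) visible"


def collect() -> str:
    """Gather the facts from the real install (requires PyTorch)."""
    import torch  # imported lazily so diagnose() is usable without PyTorch

    return diagnose(
        torch.version.cuda,
        torch.cuda.is_available(),
        torch.cuda.device_count(),
    )
```

Running `print(collect())` with the same Python interpreter that torchrun uses should show which of the three cases applies.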
Why download two repos and then copy the contents of one into the other? Just clone the krychu repo and follow the instructions in the official README.md.
@tatarinla I followed your procedure above and replaced the files in llama-main. It still reports a similar error:
File "/data/zxu/anaconda3/envs/DL/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 1013, in _new_process_group_helper
raise RuntimeError("Distributed package doesn't have NCCL " "built in")
RuntimeError: Distributed package doesn't have NCCL built in
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 718178) of binary: /data/zxu/anaconda3/envs/DL/bin/python3
Traceback (most recent call last):
File "/data/zxu/anaconda3/envs/DL/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/data/zxu/anaconda3/envs/DL/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "/data/zxu/anaconda3/envs/DL/lib/python3.9/site-packages/torch/distributed/run.py", line 794, in main
run(args)
File "/data/zxu/anaconda3/envs/DL/lib/python3.9/site-packages/torch/distributed/run.py", line 785, in run
elastic_launch(
File "/data/zxu/anaconda3/envs/DL/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/data/zxu/anaconda3/envs/DL/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
example_text_completion.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
Do you know what the problem is here? Does it mean I still need to install NCCL even in a CPU environment?
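For context on the question above: NCCL is a GPU-only backend, and the earlier tracebacks in this thread show the official repo calling `torch.distributed.init_process_group("nccl")` with the backend hard-coded. PyTorch also ships a `gloo` backend that runs on CPU. Below is a minimal sketch of choosing between the two. `pick_backend` and `detect_backend` are hypothetical helper names, and swapping the backend alone is probably not enough for the official repo, because its build path may contain other CUDA-specific calls.

```python
def pick_backend(cuda_available: bool, nccl_built: bool) -> str:
    """Return "nccl" only when a GPU and NCCL support are both present."""
    return "nccl" if (cuda_available and nccl_built) else "gloo"


def detect_backend() -> str:
    """Query the local PyTorch install (requires PyTorch)."""
    import torch
    import torch.distributed as dist

    # is_nccl_available() reports whether this PyTorch build includes NCCL.
    nccl_built = dist.is_available() and dist.is_nccl_available()
    return pick_backend(torch.cuda.is_available(), nccl_built)
```

Under these assumptions, `torch.distributed.init_process_group(detect_backend())` would take the place of the hard-coded `"nccl"` argument. If the "NCCL built in" error still appears on a CPU-only install, it suggests that some code path is still requesting the `nccl` backend.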
I am also encountering the same problem.
@pzim-devdata Thanks for the directions. Any ideas BTW how to make it use all CPU cores?
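On the CPU-cores question, one thing to try is PyTorch's intra-op thread setting. This is only a sketch under assumptions: `resolve_threads` and `use_all_cores` are hypothetical helpers, and whether this speeds up the CPU port depends on how that port is written.

```python
import os
from typing import Optional


def resolve_threads(requested: Optional[int] = None) -> int:
    """Use the requested count, else every core the OS reports (at least 1)."""
    if requested is not None and requested > 0:
        return requested
    return os.cpu_count() or 1


def use_all_cores(requested: Optional[int] = None) -> int:
    """Apply the thread count to PyTorch (requires PyTorch)."""
    import torch

    n = resolve_threads(requested)
    torch.set_num_threads(n)  # threads used for intra-op parallelism on CPU
    return n
```

Calling `use_all_cores()` near the top of the example script, before the model is built, would apply the setting. Exporting the `OMP_NUM_THREADS` environment variable before launching torchrun is another option worth testing.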
I have the same issue.
I am having the same issue on a Pop-OS Huawei Matebook with AMD Radeon Vega graphics. Is it a bad graphics card for this type of stuff?
I have the same issue.
Hello, I have the same issue too btw.
Thanks for the reference repo. I was able to use the text_completion program as intended, but chat_completion gives me an error. Is there a different way to achieve the same thing? I have tried lowering max_seq_len, but that didn't seem to work.
Thanks!
Same here with a virtualized GPU VM
torchrun --nproc_per_node 1 example_chat_completion.py \
--ckpt_dir llama-2-7b-chat/ \
--tokenizer_path tokenizer.model \
--max_seq_len 512 --max_batch_size 6
Traceback (most recent call last):
File "/home/bsp/OpenAiBase/MetaAI/llama/example_chat_completion.py", line 104, in <module>
fire.Fire(main)
File "/home/bsp/.local/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/bsp/.local/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/bsp/.local/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/bsp/OpenAiBase/MetaAI/llama/example_chat_completion.py", line 35, in main
generator = Llama.build(
File "/home/bsp/OpenAiBase/MetaAI/llama/llama/generation.py", line 85, in build
torch.distributed.init_process_group("nccl")
File "/home/bsp/.local/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 86, in wrapper
func_return = func(*args, **kwargs)
File "/home/bsp/.local/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 1184, in init_process_group
default_pg, _ = _new_process_group_helper(
File "/home/bsp/.local/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 1339, in _new_process_group_helper
backend_class = ProcessGroupNCCL(
ValueError: ProcessGroupNCCL is only supported with GPUs, no GPUs found!
[2024-04-16 10:31:30,224] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 4118487) of binary: /usr/bin/python3
Traceback (most recent call last):
File "/home/bsp/.local/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/home/bsp/.local/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
return f(*args, **kwargs)
File "/home/bsp/.local/lib/python3.10/site-packages/torch/distributed/run.py", line 812, in main
run(args)
File "/home/bsp/.local/lib/python3.10/site-packages/torch/distributed/run.py", line 803, in run
elastic_launch(
File "/home/bsp/.local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 135, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/bsp/.local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 268, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
example_chat_completion.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2024-04-16_10:31:30
host : bsp-Precision-3650-Tower
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 4118487)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
Hi all, I have the same issue as in the logs above. How can I fix it?
Sorry for the vague response. I don't really understand these things, but I have encountered this problem when the PyTorch build does not match the type of GPU you use. The PyTorch build has to match the corresponding CUDA release from NVIDIA. I can't say which build needs which release, sorry.
CUDA Version: 11.4
torch==2.0.0
works
CUDA Version: 11.4
torch==2.1.0 / torch==2.2.0 / torch==2.3.0
does not work
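To compare the two sides of such a mismatch, it helps to know which CUDA release a given PyTorch wheel was built against. Version strings such as `2.0.1+cu117` (seen earlier in this thread) carry it as a suffix, while the "CUDA Version" printed by `nvidia-smi` is the newest CUDA release the installed driver supports. The sketch below parses the wheel suffix. `cuda_from_version` is a hypothetical helper, and not every wheel carries the suffix, in which case `torch.version.cuda` is the value to check.

```python
import re
from typing import Optional


def cuda_from_version(version: str) -> Optional[str]:
    """Extract the CUDA release from a wheel version such as '2.0.1+cu117'."""
    match = re.search(r"\+cu(\d{2,})$", version)
    if match is None:
        return None  # e.g. '2.0.1+cpu' or a version with no local suffix
    digits = match.group(1)
    # The last digit is the minor version: 'cu117' -> '11.7', 'cu121' -> '12.1'.
    return f"{digits[:-1]}.{digits[-1]}"
```

For example, `cuda_from_version(torch.__version__)` can be compared against the driver's reported CUDA version when deciding which torch release to install.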