Error running example on 2 Nvidia A100 GPUs
I'm trying to run the 65B model on a vast.ai machine but am hitting the error below. Can anyone help by telling me what could be going wrong?
Error log -
Traceback (most recent call last):
File "/root/anaconda3/envs/ENVNAME/lib/python3.10/site-packages/torch/cuda/__init__.py", line 242, in _lazy_init
queued_call()
File "/root/anaconda3/envs/ENVNAME/lib/python3.10/site-packages/torch/cuda/__init__.py", line 125, in _check_capability
capability = get_device_capability(d)
File "/root/anaconda3/envs/ENVNAME/lib/python3.10/site-packages/torch/cuda/__init__.py", line 357, in get_device_capability
prop = get_device_properties(device)
File "/root/anaconda3/envs/ENVNAME/lib/python3.10/site-packages/torch/cuda/__init__.py", line 375, in get_device_properties
return _get_device_properties(device) # type: ignore[name-defined]
RuntimeError: device >= 0 && device < num_gpus INTERNAL ASSERT FAILED at "../aten/src/ATen/cuda/CUDAContext.cpp":50, please report a bug to PyTorch.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/root/llama-dl/llama/example.py", line 119, in <module>
fire.Fire(main)
File "/root/anaconda3/envs/ENVNAME/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/root/anaconda3/envs/ENVNAME/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/root/anaconda3/envs/ENVNAME/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/root/llama-dl/llama/example.py", line 74, in main
local_rank, world_size = setup_model_parallel()
File "/root/llama-dl/llama/example.py", line 25, in setup_model_parallel
torch.cuda.set_device(local_rank)
File "/root/anaconda3/envs/ENVNAME/lib/python3.10/site-packages/torch/cuda/__init__.py", line 326, in set_device
torch._C._cuda_setDevice(device)
File "/root/anaconda3/envs/ENVNAME/lib/python3.10/site-packages/torch/cuda/__init__.py", line 246, in _lazy_init
raise DeferredCudaCallError(msg) from e
torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization with error: device >= 0 && device < num_gpus INTERNAL ASSERT FAILED at "../aten/src/ATen/cuda/CUDAContext.cpp":50, please report a bug to PyTorch.
CUDA call was originally invoked at:
[' File "/root/llama-dl/llama/example.py", line 7, in <module>\n import torch\n', ' File "<frozen importlib._bootstrap>", line 1027, in _find_and_load\n', ' File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked\n', ' File "<frozen importlib._bootst$
Traceback (most recent call last):
File "/root/anaconda3/envs/ENVNAME/lib/python3.10/site-packages/torch/cuda/__init__.py", line 242, in _lazy_init
queued_call()
File "/root/anaconda3/envs/ENVNAME/lib/python3.10/site-packages/torch/cuda/__init__.py", line 125, in _check_capability
capability = get_device_capability(d)
File "/root/anaconda3/envs/ENVNAME/lib/python3.10/site-packages/torch/cuda/__init__.py", line 357, in get_device_capability
prop = get_device_properties(device)
File "/root/anaconda3/envs/ENVNAME/lib/python3.10/site-packages/torch/cuda/__init__.py", line 375, in get_device_properties
return _get_device_properties(device) # type: ignore[name-defined]
RuntimeError: device >= 0 && device < num_gpus INTERNAL ASSERT FAILED at "../aten/src/ATen/cuda/CUDAContext.cpp":50, please report a bug to PyTorch.
nvidia-smi output -
Sun Mar 5 15:49:22 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.85.02    Driver Version: 510.85.02    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-SXM...  On   | 00000000:07:00.0 Off |                    0 |
| N/A   29C    P0    70W / 400W |    353MiB / 81920MiB |      9%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-SXM...  On   | 00000000:0A:00.0 Off |                    0 |
| N/A   26C    P0    62W / 400W |      0MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
CUDA isn't being initialized correctly because of PyTorch. nvidia-smi just shows your second GPU not being initialized because of that first issue, even though the models look compatible. Try updating CUDA to 12.
Also, when you run the initial torch command, try
python -m torch.distributed.run --nproc_per_node MP example.py --ckpt_dir $TARGET_FOLDER/model_size --tokenizer_path $TARGET_FOLDER/tokenizer.model
instead of torchrun, editing the MP value of course.
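For what it's worth, the check that fails in the traceback is `device >= 0 && device < num_gpus`, reached from `torch.cuda.set_device(local_rank)`. The launcher starts `--nproc_per_node` processes and gives each one a `LOCAL_RANK` from 0 up to that number minus one, so any rank at or above the number of visible GPUs would trip that check. A minimal sketch of the arithmetic in plain Python (the function name `ranks_without_gpu` is hypothetical, and the GPU count is passed in by hand rather than queried from PyTorch):

```python
def ranks_without_gpu(nproc_per_node, num_gpus):
    """Return the local ranks that have no matching GPU index.

    Assumes each process calls set_device(local_rank), as example.py
    does in the traceback, and that ranks run from 0 to nproc_per_node - 1.
    """
    return [rank for rank in range(nproc_per_node) if rank >= num_gpus]


if __name__ == "__main__":
    # Two A100s are listed in the nvidia-smi output above.
    print(ranks_without_gpu(nproc_per_node=2, num_gpus=2))  # prints []
    print(ranks_without_gpu(nproc_per_node=8, num_gpus=2))  # prints [2, 3, 4, 5, 6, 7]
```

An empty list means every rank has a GPU index to bind to. A non-empty list would suggest looking at the MP value before anything else.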
Check https://github.com/juncongmoo/pyllama if you want to run it locally on a single GPU.
If I set a specific graphics card ID, there is no error: export CUDA_VISIBLE_DEVICES=0
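For context, `CUDA_VISIBLE_DEVICES` restricts which GPUs a process can see and renumbers them from 0 in the order listed. A rough sketch of that renumbering in plain Python (the helper name `visible_device_map` is hypothetical, and it assumes the variable holds a comma-separated list of valid integer indices, which is a simplification of what the CUDA runtime accepts):

```python
def visible_device_map(cuda_visible_devices):
    """Map the index a process sees to the physical GPU index.

    Assumes a comma-separated list of valid integer indices, e.g. "0" or "1,0".
    """
    physical = [int(part) for part in cuda_visible_devices.split(",") if part.strip()]
    return dict(enumerate(physical))


if __name__ == "__main__":
    print(visible_device_map("0"))    # prints {0: 0}
    print(visible_device_map("1,0"))  # prints {0: 1, 1: 0}
```

With `export CUDA_VISIBLE_DEVICES=0` the process sees a single device, so index 0 is the only one `torch.cuda.set_device` can be given.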