Running example.py fails with an error on a single or two 16G V100s
Hi everyone,
May I ask for the correct command to run the example? I am trying to run 7B on a single 16G V100, or 13B on two 16G V100s, and it always raises the following error:
Traceback (most recent call last):
File "example.py", line 72, in <module>
fire.Fire(main)
File "/mnt/iusers01/fatpou01/compsci01/m32815hl/.conda/envs/llama/lib/python3.7/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/mnt/iusers01/fatpou01/compsci01/m32815hl/.conda/envs/llama/lib/python3.7/site-packages/fire/core.py", line 480, in _Fire
target=component.__name__)
File "/mnt/iusers01/fatpou01/compsci01/m32815hl/.conda/envs/llama/lib/python3.7/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "example.py", line 62, in main
generator = load(ckpt_dir, tokenizer_path, local_rank, world_size)
File "example.py", line 48, in load
model = Transformer(model_args)
File "/mnt/iusers01/fatpou01/compsci01/m32815hl/llama-main/llama/model.py", line 211, in __init__
self.layers.append(TransformerBlock(layer_id, params))
File "/mnt/iusers01/fatpou01/compsci01/m32815hl/llama-main/llama/model.py", line 184, in __init__
self.attention = Attention(args)
File "/mnt/iusers01/fatpou01/compsci01/m32815hl/llama-main/llama/model.py", line 116, in __init__
(args.max_batch_size, args.max_seq_len, self.n_local_heads, self.head_dim)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 160.00 MiB (GPU 0; 15.77 GiB total capacity; 14.42 GiB already allocated; 131.38 MiB free; 14.85 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Traceback (most recent call last):
File "example.py", line 72, in <module>
fire.Fire(main)
File "/mnt/iusers01/fatpou01/compsci01/m32815hl/.conda/envs/llama/lib/python3.7/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/mnt/iusers01/fatpou01/compsci01/m32815hl/.conda/envs/llama/lib/python3.7/site-packages/fire/core.py", line 480, in _Fire
target=component.__name__)
File "/mnt/iusers01/fatpou01/compsci01/m32815hl/.conda/envs/llama/lib/python3.7/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "example.py", line 62, in main
generator = load(ckpt_dir, tokenizer_path, local_rank, world_size)
File "example.py", line 48, in load
model = Transformer(model_args)
File "/mnt/iusers01/fatpou01/compsci01/m32815hl/llama-main/llama/model.py", line 211, in __init__
self.layers.append(TransformerBlock(layer_id, params))
File "/mnt/iusers01/fatpou01/compsci01/m32815hl/llama-main/llama/model.py", line 184, in __init__
self.attention = Attention(args)
File "/mnt/iusers01/fatpou01/compsci01/m32815hl/llama-main/llama/model.py", line 116, in __init__
(args.max_batch_size, args.max_seq_len, self.n_local_heads, self.head_dim)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 160.00 MiB (GPU 1; 15.77 GiB total capacity; 14.42 GiB already allocated; 131.38 MiB free; 14.85 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 5125) of binary: /mnt/iusers01/fatpou01/compsci01/m32815hl/.conda/envs/llama/bin/python
Traceback (most recent call last):
File "/mnt/iusers01/fatpou01/compsci01/m32815hl/.conda/envs/llama/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/mnt/iusers01/fatpou01/compsci01/m32815hl/.conda/envs/llama/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/mnt/iusers01/fatpou01/compsci01/m32815hl/.conda/envs/llama/lib/python3.7/site-packages/torch/distributed/run.py", line 766, in <module>
main()
File "/mnt/iusers01/fatpou01/compsci01/m32815hl/.conda/envs/llama/lib/python3.7/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "/mnt/iusers01/fatpou01/compsci01/m32815hl/.conda/envs/llama/lib/python3.7/site-packages/torch/distributed/run.py", line 762, in main
run(args)
File "/mnt/iusers01/fatpou01/compsci01/m32815hl/.conda/envs/llama/lib/python3.7/site-packages/torch/distributed/run.py", line 756, in run
)(*cmd_args)
File "/mnt/iusers01/fatpou01/compsci01/m32815hl/.conda/envs/llama/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/mnt/iusers01/fatpou01/compsci01/m32815hl/.conda/envs/llama/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 248, in launch_agent
failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
Here is my command:
For 7B model on single GPU:
torchrun --nproc_per_node 1 example.py --ckpt_dir ./llama/7B --tokenizer_path ./tokenizer.model
For 13B model on two GPU:
python -m torch.distributed.run --nproc_per_node 2 example.py --ckpt_dir ./llama/13B --tokenizer_path ./tokenizer.model
I understand a V100 may run out of memory, but right now that does not look like the main reason to me.
Many thanks for your help!!
You can try this hacked llama: https://github.com/juncongmoo/pyllama
It seems that you don't have enough VRAM. You can try https://github.com/galatolofederico/vanilla-llama
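For a rough sense of the numbers, here is a back-of-the-envelope sketch, assuming fp16 weights and the model-parallel split used by the official example.py:

# 2 bytes per fp16 parameter, weights split evenly across model-parallel GPUs
def weights_gib(n_params, n_gpus=1):
    return n_params * 2 / 2**30 / n_gpus

print(weights_gib(7e9))       # ~13.0 GiB of weights on a single GPU for 7B
print(weights_gib(13e9, 2))   # ~12.1 GiB of weights per GPU for 13B on two GPUs
# On top of the weights, every Attention layer pre-allocates fp16 key/value
# cache tensors of shape (max_batch_size, max_seq_len, n_local_heads, head_dim)
# -- the allocation that fails in the traceback -- so a 15.77 GiB V100 runs out
# of memory even though each individual allocation is small.

So the "Tried to allocate 160.00 MiB" in the message is just the allocation that happened to push the card over the edge; the memory really is exhausted.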
You can try this simple hacked llama, which consumes about 14 GB of VRAM in fp16: https://github.com/Tongjilibo/bert4torch/blob/master/examples/basic/basic_language_model_llama.py
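Note that the stock example.py already builds the model in half precision (assuming the released load(), which switches the default tensor type before constructing Transformer), so fp16 by itself does not buy extra headroom over the traceback above. The pattern, sketched with a stand-in layer so it runs without the checkpoint:

import torch
import torch.nn as nn

# Illustrative only: the same default-tensor-type trick the official load()
# uses, applied to a stand-in nn.Linear instead of the real Transformer.
torch.set_default_tensor_type(torch.cuda.HalfTensor)
layer = nn.Linear(4096, 4096)   # parameters are created directly as fp16 on the GPU
torch.set_default_tensor_type(torch.FloatTensor)

print(layer.weight.dtype, layer.weight.device)  # torch.float16 cuda:0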
Reducing the batch size should help here. Re-open as needed.
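Concretely, the cache that fails to allocate is sized by max_batch_size and max_seq_len, which are passed to ModelArgs inside load() in example.py, so shrinking them is the usual way to squeeze under 16 GiB. A minimal sketch, assuming the import paths from the repo's example.py; the values below are illustrative, not the defaults:

import json
from pathlib import Path
from llama import ModelArgs

ckpt_dir = "./llama/7B"  # same path used in the commands above
with open(Path(ckpt_dir) / "params.json") as f:
    params = json.load(f)

# Smaller values shrink the per-layer (max_batch_size, max_seq_len,
# n_local_heads, head_dim) key/value cache tensors from the traceback.
model_args = ModelArgs(max_seq_len=512, max_batch_size=1, **params)

This only shrinks the pre-allocated cache; the fp16 weights still have to fit (roughly 13 GiB for 7B on one GPU, or about 12 GiB per GPU for 13B on two), so headroom on a 16 GiB card stays tight.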