
Running example.py with error on single or two 16G V100

Open HaoBytes opened this issue 1 year ago • 3 comments

Hi everyone,

May I ask for the correct command to run the example? I am trying to run 7B on a single 16G V100, or 13B on two 16G V100s, and it always raises the following error:

Traceback (most recent call last):
  File "example.py", line 72, in <module>
    fire.Fire(main)
  File "/mnt/iusers01/fatpou01/compsci01/m32815hl/.conda/envs/llama/lib/python3.7/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/mnt/iusers01/fatpou01/compsci01/m32815hl/.conda/envs/llama/lib/python3.7/site-packages/fire/core.py", line 480, in _Fire
    target=component.__name__)
  File "/mnt/iusers01/fatpou01/compsci01/m32815hl/.conda/envs/llama/lib/python3.7/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "example.py", line 62, in main
    generator = load(ckpt_dir, tokenizer_path, local_rank, world_size)
  File "example.py", line 48, in load
    model = Transformer(model_args)
  File "/mnt/iusers01/fatpou01/compsci01/m32815hl/llama-main/llama/model.py", line 211, in __init__
    self.layers.append(TransformerBlock(layer_id, params))
  File "/mnt/iusers01/fatpou01/compsci01/m32815hl/llama-main/llama/model.py", line 184, in __init__
    self.attention = Attention(args)
  File "/mnt/iusers01/fatpou01/compsci01/m32815hl/llama-main/llama/model.py", line 116, in __init__
    (args.max_batch_size, args.max_seq_len, self.n_local_heads, self.head_dim)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 160.00 MiB (GPU 0; 15.77 GiB total capacity; 14.42 GiB already allocated; 131.38 MiB free; 14.85 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Traceback (most recent call last):
  File "example.py", line 72, in <module>
    fire.Fire(main)
  File "/mnt/iusers01/fatpou01/compsci01/m32815hl/.conda/envs/llama/lib/python3.7/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/mnt/iusers01/fatpou01/compsci01/m32815hl/.conda/envs/llama/lib/python3.7/site-packages/fire/core.py", line 480, in _Fire
    target=component.__name__)
  File "/mnt/iusers01/fatpou01/compsci01/m32815hl/.conda/envs/llama/lib/python3.7/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "example.py", line 62, in main
    generator = load(ckpt_dir, tokenizer_path, local_rank, world_size)
  File "example.py", line 48, in load
    model = Transformer(model_args)
  File "/mnt/iusers01/fatpou01/compsci01/m32815hl/llama-main/llama/model.py", line 211, in __init__
    self.layers.append(TransformerBlock(layer_id, params))
  File "/mnt/iusers01/fatpou01/compsci01/m32815hl/llama-main/llama/model.py", line 184, in __init__
    self.attention = Attention(args)
  File "/mnt/iusers01/fatpou01/compsci01/m32815hl/llama-main/llama/model.py", line 116, in __init__
    (args.max_batch_size, args.max_seq_len, self.n_local_heads, self.head_dim)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 160.00 MiB (GPU 1; 15.77 GiB total capacity; 14.42 GiB already allocated; 131.38 MiB free; 14.85 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 5125) of binary: /mnt/iusers01/fatpou01/compsci01/m32815hl/.conda/envs/llama/bin/python
Traceback (most recent call last):
  File "/mnt/iusers01/fatpou01/compsci01/m32815hl/.conda/envs/llama/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/mnt/iusers01/fatpou01/compsci01/m32815hl/.conda/envs/llama/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/mnt/iusers01/fatpou01/compsci01/m32815hl/.conda/envs/llama/lib/python3.7/site-packages/torch/distributed/run.py", line 766, in <module>
    main()
  File "/mnt/iusers01/fatpou01/compsci01/m32815hl/.conda/envs/llama/lib/python3.7/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/mnt/iusers01/fatpou01/compsci01/m32815hl/.conda/envs/llama/lib/python3.7/site-packages/torch/distributed/run.py", line 762, in main
    run(args)
  File "/mnt/iusers01/fatpou01/compsci01/m32815hl/.conda/envs/llama/lib/python3.7/site-packages/torch/distributed/run.py", line 756, in run
    )(*cmd_args)
  File "/mnt/iusers01/fatpou01/compsci01/m32815hl/.conda/envs/llama/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/mnt/iusers01/fatpou01/compsci01/m32815hl/.conda/envs/llama/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 248, in launch_agent
    failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

Here are my commands.

For the 7B model on a single GPU:
torchrun --nproc_per_node 1 example.py --ckpt_dir ./llama/7B --tokenizer_path ./tokenizer.model

For the 13B model on two GPUs:
python -m torch.distributed.run --nproc_per_node 2 example.py --ckpt_dir ./llama/13B --tokenizer_path ./tokenizer.model

I understand the V100 may raise an "out of memory" error, but right now that does not look like the main reason to me.

Many thanks for any help!!

HaoBytes avatar Mar 10 '23 15:03 HaoBytes

You can try this hacked llama: https://github.com/juncongmoo/pyllama

shadowwalker2718 avatar Mar 10 '23 19:03 shadowwalker2718

It seems that you don't have enough VRAM. You can try https://github.com/galatolofederico/vanilla-llama

galatolofederico avatar Mar 11 '23 23:03 galatolofederico

you can try this simple hacked llama and consume 14g VRAM in fp16 : https://github.com/Tongjilibo/bert4torch/blob/master/examples/basic/basic_language_model_llama.py

Tongjilibo avatar Mar 17 '23 16:03 Tongjilibo

Reducing the batch size should help here. Re-open as needed.
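
For context: the 7B weights alone take roughly 13 GiB in fp16, which lines up with the "14.42 GiB already allocated" in the log, so on a 16 GiB V100 the only thing left to shrink is the per-layer KV cache that Attention allocates at model.py line 116 with shape (args.max_batch_size, args.max_seq_len, self.n_local_heads, self.head_dim). Below is a minimal sketch of where those two knobs live. It assumes your copy of example.py builds ModelArgs inside load() the way the traceback suggests; the values 512 and 1 are only illustrative, and the checkpoint loading and distributed setup from example.py are elided.

    import json
    from pathlib import Path

    import torch
    from llama.model import ModelArgs, Transformer
    from llama.tokenizer import Tokenizer

    # assumes the distributed / model-parallel setup from example.py has already run
    ckpt_dir = "./llama/7B"            # same path as in the torchrun command above
    tokenizer_path = "./tokenizer.model"

    with open(Path(ckpt_dir) / "params.json") as f:
        params = json.load(f)

    # These two fields size the failing allocation in model.py:116.
    # Smaller values shrink every layer's cache_k / cache_v proportionally.
    model_args = ModelArgs(
        max_seq_len=512,       # cap on prompt + generated tokens
        max_batch_size=1,      # number of prompts processed at once
        **params,
    )

    tokenizer = Tokenizer(model_path=tokenizer_path)
    model_args.vocab_size = tokenizer.n_words

    torch.set_default_tensor_type(torch.cuda.HalfTensor)
    model = Transformer(model_args)    # loading the checkpoint weights is elided here

If your example.py is a newer revision that exposes these as --max_seq_len and --max_batch_size flags on main(), you can pass them on the command line instead of editing load().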

WuhanMonkey avatar Sep 06 '23 17:09 WuhanMonkey