Error running `example_chat_completion.py` on `llama-2-7b-chat`

krsnnik opened this issue 2 years ago · 14 comments

Python 3.8 (PyPI install), running on an NVIDIA RTX 3900

torchrun --nproc_per_node 1 example_chat_completion.py \
    --ckpt_dir llama-2-7b-chat/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 512 --max_batch_size 4
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
Loaded in 9.42 seconds
Traceback (most recent call last):
  File "example_chat_completion.py", line 73, in <module>
    fire.Fire(main)
  File "/home/kliu/Workspace/llama/env/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/kliu/Workspace/llama/env/lib/python3.8/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/kliu/Workspace/llama/env/lib/python3.8/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "example_chat_completion.py", line 56, in main
    results = generator.chat_completion(
  File "/home/kliu/Workspace/llama/llama/generation.py", line 270, in chat_completion
    generation_tokens, generation_logprobs = self.generate(
  File "/home/kliu/Workspace/llama/env/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/kliu/Workspace/llama/llama/generation.py", line 146, in generate
    next_token = sample_top_p(probs, top_p)
  File "/home/kliu/Workspace/llama/llama/generation.py", line 301, in sample_top_p
    next_token = torch.multinomial(probs_sort, num_samples=1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 155743) of binary: /home/kliu/Workspace/llama/env/bin/python3
Traceback (most recent call last):
  File "/home/kliu/Workspace/llama/env/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/kliu/Workspace/llama/env/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/kliu/Workspace/llama/env/lib/python3.8/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/home/kliu/Workspace/llama/env/lib/python3.8/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/kliu/Workspace/llama/env/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/kliu/Workspace/llama/env/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
example_chat_completion.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-07-19_14:51:37
  host      : eleusis
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 155743)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
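For context, `torch.multinomial` raises this exact RuntimeError whenever the probability tensor contains `inf`, `nan`, or a negative element. One common way that happens is overflow in an unstabilized softmax (for example, when half-precision weights run on hardware that handles them poorly). The following is a minimal, stdlib-only sketch of that failure mode, not the repo's actual `sample_top_p` code:

```python
import math

def softmax(logits, stable=True):
    """Softmax over a list of floats.

    With stable=False, exp() of a large logit overflows to inf, and the
    normalization inf / inf then yields nan -- the kind of value that
    torch.multinomial rejects with "probability tensor contains either
    `inf`, `nan` or element < 0".
    """
    shift = max(logits) if stable else 0.0
    exps = []
    for x in logits:
        try:
            exps.append(math.exp(x - shift))
        except OverflowError:
            exps.append(float("inf"))  # what a fixed-width float exp() returns
    total = sum(exps)
    return [e / total for e in exps]

bad = softmax([1000.0, 1.0], stable=False)
print(any(math.isnan(p) for p in bad))   # True: inf / inf == nan

good = softmax([1000.0, 1.0], stable=True)
print(any(math.isnan(p) for p in good))  # False: shifting by max avoids overflow
```

Subtracting the maximum logit before exponentiating keeps every exponent at or below zero, which is the standard stabilization trick; the error in this thread suggests the model is producing non-finite logits upstream of sampling.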

krsnnik avatar Jul 19 '23 22:07 krsnnik

I have the same issue. I tried reducing the batch_size, but it's not helping.

zhpinkman avatar Jul 19 '23 23:07 zhpinkman

I have the same issue.

$ pip install -e .

$ torchrun --nproc_per_node 1 example_chat_completion.py \
    --ckpt_dir llama-2-7b-chat/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 512 --max_batch_size 4

$ torchrun --nproc_per_node 1 example_text_completion.py \
    --ckpt_dir llama-2-7b/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 128 --max_batch_size 4

jonsoku-dev avatar Jul 19 '23 23:07 jonsoku-dev

I fixed my issue by using a lower max_seq_len. Hope this helps.

zhpinkman avatar Jul 19 '23 23:07 zhpinkman

@zhpinkman

Thank you! What max_seq_len did you set?

It still throws the error for me:

torchrun --nproc_per_node 1 example_text_completion.py \
    --ckpt_dir llama-2-7b/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 10 --max_batch_size 4

ghost avatar Jul 19 '23 23:07 ghost

I was using 512, which was throwing the error; with 256, it's working fine. Also, note that you can limit the number of prompts in the input. If I remember correctly, the default template contains four prompts; you can reduce that to a single example if you have a smaller GPU. The root of the error is batches that cannot fit on the GPU, so playing around with these parameters can help prevent the issue.

zhpinkman avatar Jul 19 '23 23:07 zhpinkman
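The suggestion above, trimming the prompt list so it fits in one batch, can be sketched as follows. The variable names are illustrative, not the exact ones in example_chat_completion.py:

```python
# Hypothetical prompt list in the style of the chat example; the real script
# hard-codes its own dialogs.
dialogs = [
    [{"role": "user", "content": "what is the recipe of mayonnaise?"}],
    [{"role": "user", "content": "write a haiku about GPUs"}],
    [{"role": "user", "content": "explain top-p sampling briefly"}],
]

max_batch_size = 1
# Keep only as many prompts as fit in a single batch on a small GPU.
dialogs = dialogs[:max_batch_size]
print(len(dialogs))  # 1
```

The same idea applies to example_text_completion.py: fewer prompts per call means a smaller batch and less GPU memory pressure.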

Thank you, but it doesn't work for me :( There seem to be a lot of related issues, so I'm watching this one.

ghost avatar Jul 20 '23 01:07 ghost

Same error here; reducing max_seq_len to 128 did not work.

gucaslyz avatar Jul 21 '23 00:07 gucaslyz

I solved it with a CPU-only installation by installing https://github.com/krychu/llama instead of https://github.com/facebookresearch/llama. Complete installation process:

  1. Download the original version of Llama from https://github.com/facebookresearch/llama and extract it to a llama-main folder.
  2. Download the CPU version from https://github.com/krychu/llama, extract it, and replace the files in the llama-main folder.
  3. Run the download.sh script in a terminal, passing the URL provided when prompted, to start the download.
  4. Go to the llama-main folder.
  5. Create a Python 3 env: python3 -m venv env and activate it: source env/bin/activate
  6. Install the CPU version of PyTorch: python3 -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu # for the CPU version
  7. Install llama's dependencies: python3 -m pip install -e .
  8. If you downloaded llama-2-7b, run:
torchrun --nproc_per_node 1 example_text_completion.py \
    --ckpt_dir llama-2-7b/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 128 --max_batch_size 1 # (instead of 4)

pzim-devdata avatar Jul 25 '23 14:07 pzim-devdata

I tried 128 as well and it did not work. I also tried reducing max_batch_size down to 1; same RuntimeError: probability tensor contains either `inf`, `nan` or element < 0.

krsnnik avatar Jul 26 '23 22:07 krsnnik

Running into the same error. I tried changing the batch size and max_seq_len, but neither worked.

nisargjoshi10 avatar Aug 09 '23 13:08 nisargjoshi10

Increasing the max_batch_size to >4 works. I set it to 6 and it works.

torchrun --nproc_per_node 1 example_text_completion.py \
    --ckpt_dir llama-2-7b/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 128 --max_batch_size 1

sthreepi avatar Aug 30 '23 06:08 sthreepi

I solved this error by setting max_batch_size to a multiple of the number of prompts.

maowenyu-11 avatar Oct 13 '23 12:10 maowenyu-11
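A related constraint worth knowing: the generator asserts that the number of prompts in a single call does not exceed max_batch_size. If you have more prompts than fit in one batch, you can split them into chunks. This is a stdlib sketch with a hypothetical helper name, not an API from the repo:

```python
def chunk_prompts(prompts, max_batch_size):
    """Split a prompt list into batches of at most max_batch_size each.

    Hypothetical helper: the repo's example scripts instead hard-code a
    prompt list that must already fit within max_batch_size.
    """
    return [prompts[i:i + max_batch_size]
            for i in range(0, len(prompts), max_batch_size)]

batches = chunk_prompts(["p1", "p2", "p3", "p4", "p5"], max_batch_size=2)
print([len(b) for b in batches])  # [2, 2, 1]
```

Each chunk can then be passed to a separate generation call, so no single batch exceeds the size the model was initialized with.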

Same error here, nothing seems to work

XanderDevelops avatar Nov 21 '23 01:11 XanderDevelops

I'm trying to run the Llama 3 8B model and got this issue:

(llama3chatbot) C:\Users\prath\llama3-main>torchrun --nproc_per_node 1 example_chat_completion.py --ckpt_dir Meta-Llama-3-8B/ --tokenizer_path tokenizer.model --max_seq_len 128 --max_batch_size 1
failed to create process.

It shows "failed to create process." What's the issue? Help!

prathams177 avatar May 20 '24 12:05 prathams177