Error running `example_chat_completion.py` on `llama-2-7b-chat`

krsnnik opened this issue 2 years ago · 14 comments

Python 3.8 (PyPI install), running on an NVIDIA RTX 3900

torchrun --nproc_per_node 1 example_chat_completion.py \
    --ckpt_dir llama-2-7b-chat/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 512 --max_batch_size 4
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
Loaded in 9.42 seconds
Traceback (most recent call last):
  File "example_chat_completion.py", line 73, in <module>
    fire.Fire(main)
  File "/home/kliu/Workspace/llama/env/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/kliu/Workspace/llama/env/lib/python3.8/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/kliu/Workspace/llama/env/lib/python3.8/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "example_chat_completion.py", line 56, in main
    results = generator.chat_completion(
  File "/home/kliu/Workspace/llama/llama/generation.py", line 270, in chat_completion
    generation_tokens, generation_logprobs = self.generate(
  File "/home/kliu/Workspace/llama/env/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/kliu/Workspace/llama/llama/generation.py", line 146, in generate
    next_token = sample_top_p(probs, top_p)
  File "/home/kliu/Workspace/llama/llama/generation.py", line 301, in sample_top_p
    next_token = torch.multinomial(probs_sort, num_samples=1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 155743) of binary: /home/kliu/Workspace/llama/env/bin/python3
Traceback (most recent call last):
  File "/home/kliu/Workspace/llama/env/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/kliu/Workspace/llama/env/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/kliu/Workspace/llama/env/lib/python3.8/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/home/kliu/Workspace/llama/env/lib/python3.8/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/kliu/Workspace/llama/env/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/kliu/Workspace/llama/env/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
example_chat_completion.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-07-19_14:51:37
  host      : eleusis
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 155743)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
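For context, `torch.multinomial` raises this exact RuntimeError whenever the probability tensor contains `inf`, `nan`, or a negative element. One common way that happens is overflow in an unstabilized softmax (for example, when half-precision weights run on hardware that handles them poorly). The following is a minimal, stdlib-only sketch of that failure mode, not the repo's actual `sample_top_p` code:

```python
import math

def softmax(logits, stable=True):
    """Softmax over a list of floats.

    With stable=False, exp() of a large logit overflows to inf, and the
    normalization inf / inf then yields nan -- the kind of value that
    torch.multinomial rejects with "probability tensor contains either
    `inf`, `nan` or element < 0".
    """
    shift = max(logits) if stable else 0.0
    exps = []
    for x in logits:
        try:
            exps.append(math.exp(x - shift))
        except OverflowError:
            exps.append(float("inf"))  # what a fixed-width float exp() returns
    total = sum(exps)
    return [e / total for e in exps]

bad = softmax([1000.0, 1.0], stable=False)
print(any(math.isnan(p) for p in bad))   # True: inf / inf == nan

good = softmax([1000.0, 1.0], stable=True)
print(any(math.isnan(p) for p in good))  # False: shifting by max avoids overflow
```

Subtracting the maximum logit before exponentiating keeps every exponent at or below zero, which is the standard stabilization trick; the error in this thread suggests the model is producing non-finite logits upstream of sampling.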

krsnnik avatar Jul 19 '23 22:07 krsnnik

I have the same issue. I tried reducing the batch_size, but it's not helping.

zhpinkman avatar Jul 19 '23 23:07 zhpinkman

I have the same issue.

$ pip install -e .

$ torchrun --nproc_per_node 1 example_chat_completion.py \
    --ckpt_dir llama-2-7b-chat/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 512 --max_batch_size 4

$ torchrun --nproc_per_node 1 example_text_completion.py \
    --ckpt_dir llama-2-7b/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 128 --max_batch_size 4

jonsoku-dev avatar Jul 19 '23 23:07 jonsoku-dev

I fixed my issue by using a lower max_seq_len. Hope this helps.

zhpinkman avatar Jul 19 '23 23:07 zhpinkman

@zhpinkman

Thank you! What max_seq_len did you set?

It still throws the error for me:

torchrun --nproc_per_node 1 example_text_completion.py \
    --ckpt_dir llama-2-7b/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 10 --max_batch_size 4

ghost avatar Jul 19 '23 23:07 ghost

I was using 512, which was throwing the error; with 256, it's working fine. Also, note that you can limit the number of prompts in the input. If I remember correctly, the default template contains four prompts; you can reduce that to a single example if you have a smaller GPU. The root of the error is batches that cannot fit on the GPU, so playing around with these parameters can help prevent the issue.

zhpinkman avatar Jul 19 '23 23:07 zhpinkman
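The suggestion above, trimming the prompt list so it fits in one batch, can be sketched as follows. The variable names are illustrative, not the exact ones in example_chat_completion.py:

```python
# Hypothetical prompt list in the style of the chat example; the real script
# hard-codes its own dialogs.
dialogs = [
    [{"role": "user", "content": "what is the recipe of mayonnaise?"}],
    [{"role": "user", "content": "write a haiku about GPUs"}],
    [{"role": "user", "content": "explain top-p sampling briefly"}],
]

max_batch_size = 1
# Keep only as many prompts as fit in a single batch on a small GPU.
dialogs = dialogs[:max_batch_size]
print(len(dialogs))  # 1
```

The same idea applies to example_text_completion.py: fewer prompts per call means a smaller batch and less GPU memory pressure.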

Thank you, but it doesn't work for me :( There seem to be a lot of related issues, so I'm watching this one.

ghost avatar Jul 20 '23 01:07 ghost

Same error here; reducing max_seq_len to 128 did not work.

gucaslyz avatar Jul 21 '23 00:07 gucaslyz

I solved it with a CPU-only installation by installing https://github.com/krychu/llama instead of https://github.com/facebookresearch/llama. Complete installation process:

  1. Download the original version of Llama from https://github.com/facebookresearch/llama and extract it to a llama-main folder.
  2. Download the CPU version from https://github.com/krychu/llama, extract it, and replace the files in the llama-main folder.
  3. Run the download.sh script in a terminal, passing the URL provided when prompted, to start the download.
  4. Go to the llama-main folder.
  5. Create a Python 3 env: python3 -m venv env and activate it: source env/bin/activate
  6. Install the CPU version of PyTorch: python3 -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu # for the CPU version
  7. Install llama's dependencies: python3 -m pip install -e .
  8. If you downloaded llama-2-7b, run:
torchrun --nproc_per_node 1 example_text_completion.py \
    --ckpt_dir llama-2-7b/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 128 --max_batch_size 1 # (instead of 4)

pzim-devdata avatar Jul 25 '23 14:07 pzim-devdata

I tried 128 as well and it did not work. I also tried reducing max_batch_size down to 1; same RuntimeError: probability tensor contains either `inf`, `nan` or element < 0.

krsnnik avatar Jul 26 '23 22:07 krsnnik

Running into the same error. I tried changing the batch size and max_seq_len, but neither worked.

nisargjoshi10 avatar Aug 09 '23 13:08 nisargjoshi10

Increasing the max_batch_size to >4 works. I set it to 6 and it works.

torchrun --nproc_per_node 1 example_text_completion.py \
    --ckpt_dir llama-2-7b/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 128 --max_batch_size 1

sthreepi avatar Aug 30 '23 06:08 sthreepi

I solved this error by setting max_batch_size to a multiple of the number of prompts.

maowenyu-11 avatar Oct 13 '23 12:10 maowenyu-11
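A related constraint worth knowing: the generator asserts that the number of prompts in a single call does not exceed max_batch_size. If you have more prompts than fit in one batch, you can split them into chunks. This is a stdlib sketch with a hypothetical helper name, not an API from the repo:

```python
def chunk_prompts(prompts, max_batch_size):
    """Split a prompt list into batches of at most max_batch_size each.

    Hypothetical helper: the repo's example scripts instead hard-code a
    prompt list that must already fit within max_batch_size.
    """
    return [prompts[i:i + max_batch_size]
            for i in range(0, len(prompts), max_batch_size)]

batches = chunk_prompts(["p1", "p2", "p3", "p4", "p5"], max_batch_size=2)
print([len(b) for b in batches])  # [2, 2, 1]
```

Each chunk can then be passed to a separate generation call, so no single batch exceeds the size the model was initialized with.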

Same error here, nothing seems to work

XanderDevelops avatar Nov 21 '23 01:11 XanderDevelops

I'm trying to run the Llama 3 8B model and got this issue:

(llama3chatbot) C:\Users\prath\llama3-main>torchrun --nproc_per_node 1 example_chat_completion.py --ckpt_dir Meta-Llama-3-8B/ --tokenizer_path tokenizer.model --max_seq_len 128 --max_batch_size 1
failed to create process.

It shows "failed to create process." What's the issue? Help!

prathams177 avatar May 20 '24 12:05 prathams177