Error running `example_chat_completion.py` on `llama-2-7b-chat`
Python 3.8 (PyPI), running on an NVIDIA RTX 3090
torchrun --nproc_per_node 1 example_chat_completion.py --ckpt_dir llama-2-7b-chat/ --tokenizer_path tokenizer.model --max_seq_len 512 --max_batch_size 4
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
Loaded in 9.42 seconds
Traceback (most recent call last):
File "example_chat_completion.py", line 73, in <module>
fire.Fire(main)
File "/home/kliu/Workspace/llama/env/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/kliu/Workspace/llama/env/lib/python3.8/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/kliu/Workspace/llama/env/lib/python3.8/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "example_chat_completion.py", line 56, in main
results = generator.chat_completion(
File "/home/kliu/Workspace/llama/llama/generation.py", line 270, in chat_completion
generation_tokens, generation_logprobs = self.generate(
File "/home/kliu/Workspace/llama/env/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/kliu/Workspace/llama/llama/generation.py", line 146, in generate
next_token = sample_top_p(probs, top_p)
File "/home/kliu/Workspace/llama/llama/generation.py", line 301, in sample_top_p
next_token = torch.multinomial(probs_sort, num_samples=1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 155743) of binary: /home/kliu/Workspace/llama/env/bin/python3
Traceback (most recent call last):
File "/home/kliu/Workspace/llama/env/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/home/kliu/Workspace/llama/env/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "/home/kliu/Workspace/llama/env/lib/python3.8/site-packages/torch/distributed/run.py", line 794, in main
run(args)
File "/home/kliu/Workspace/llama/env/lib/python3.8/site-packages/torch/distributed/run.py", line 785, in run
elastic_launch(
File "/home/kliu/Workspace/llama/env/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/kliu/Workspace/llama/env/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
example_chat_completion.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2023-07-19_14:51:37
host : eleusis
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 155743)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
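For context, the failure comes from torch.multinomial being handed probabilities that already contain inf/nan; the nan values are produced earlier in the forward pass, and the sampler is just where the problem surfaces. The top-p sampler looks roughly like the sketch below (a paraphrase of llama/generation.py, not the exact repo source):

import torch

def sample_top_p(probs, p):
    # Sort token probabilities in descending order and keep the smallest
    # prefix whose cumulative mass reaches p (the "nucleus").
    probs_sort, probs_idx = torch.sort(probs, dim=-1, descending=True)
    probs_sum = torch.cumsum(probs_sort, dim=-1)
    mask = probs_sum - probs_sort > p
    probs_sort[mask] = 0.0
    # Renormalize the surviving probabilities.
    probs_sort.div_(probs_sort.sum(dim=-1, keepdim=True))
    # This is the call that raises if probs contained inf/nan upstream.
    next_token = torch.multinomial(probs_sort, num_samples=1)
    return torch.gather(probs_idx, -1, next_token)

A quick way to confirm is to check torch.isfinite(probs).all() right before the multinomial call.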
I have the same issue. I tried reducing the batch_size, but it's not helping.
I have the same issue.
$ pip install -e .
$ torchrun --nproc_per_node 1 example_chat_completion.py \
--ckpt_dir llama-2-7b-chat/ \
--tokenizer_path tokenizer.model \
--max_seq_len 512 --max_batch_size 4
$ torchrun --nproc_per_node 1 example_text_completion.py \
--ckpt_dir llama-2-7b/ \
--tokenizer_path tokenizer.model \
--max_seq_len 128 --max_batch_size 4
I could fix my issue by using a lower max_seq_len. Hope this helps.
Thank you! What max_seq_len did you set?
This also gives the error:
torchrun --nproc_per_node 1 example_text_completion.py \
--ckpt_dir llama-2-7b/ \
--tokenizer_path tokenizer.model \
--max_seq_len 10 --max_batch_size 4
I was using 512, which was throwing the error; with 256 it works fine. Also, note that you can limit the number of prompts in the input: if I remember correctly, the default template has four prompts, and you can reduce that to a single example if you have a smaller GPU (see the sketch below). The point of the error is batches that cannot fit on the GPU, so playing with these parameters can help prevent the issue.
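Trimming the stock prompt list looks like this (a minimal sketch against example_text_completion.py; the chat script is analogous with its dialogs list, and the max_gen_len/temperature/top_p values here are just illustrative):

prompts = [
    "I believe the meaning of life is",
    # ...the stock script ships a few more prompts here...
]
prompts = prompts[:1]  # keep a single prompt so the batch stays small

results = generator.text_completion(
    prompts,
    max_gen_len=64,
    temperature=0.6,
    top_p=0.9,
)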
Thank you, but it doesn't work for me :( There seem to be a lot of related issues, so I'm watching this one..!
Same error here, and reducing max_seq_len to 128 does not work.
I have solved it with a CPU installation by installing https://github.com/krychu/llama instead of https://github.com/facebookresearch/llama
Complete process to install:
- download the original version of Llama from https://github.com/facebookresearch/llama and extract it to a llama-main folder
- download the CPU version from https://github.com/krychu/llama, extract it, and replace the files in the llama-main folder
- run the download.sh script in a terminal, passing the URL provided when prompted, to start the download
- go to the llama-main folder
- create a Python 3 env: python3 -m venv env and activate it: source env/bin/activate
- install the CPU version of PyTorch: python3 -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu  # for the CPU version
- install llama's dependencies: python3 -m pip install -e .
- run, if you have downloaded llama-2-7b:
torchrun --nproc_per_node 1 example_text_completion.py \
--ckpt_dir llama-2-7b/ \
--tokenizer_path tokenizer.model \
--max_seq_len 128 --max_batch_size 1  # (instead of 4)
Tried 128 as well and it did not work. Also tried reducing max_batch_size down to 1; that did not work either, same RuntimeError: probability tensor contains either `inf`, `nan` or element < 0.
Running into the same error. Tried changing the batch size and max_seq_len, but neither worked.
Increasing max_batch_size above 4 works; I set it to 6.
torchrun --nproc_per_node 1 example_text_completion.py \
--ckpt_dir llama-2-7b/ \
--tokenizer_path tokenizer.model \
--max_seq_len 128 --max_batch_size 1
I've solved this error by setting max_batch_size to a multiple of the number of prompts.
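If you want to keep all the prompts while staying within max_batch_size, another option is to feed them through in chunks (a hypothetical helper, not from the repo; generator and prompts as in the example scripts):

def chunked(seq, size):
    # Yield consecutive slices of at most `size` items.
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

max_batch_size = 6  # must cover the largest chunk you pass in
for batch in chunked(prompts, max_batch_size):
    results = generator.text_completion(batch, max_gen_len=64)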
Same error here, nothing seems to work
I'm trying to run the Llama 3 8B model and got this issue:
(llama3chatbot) C:\Users\prath\llama3-main>torchrun --nproc_per_node 1 example_chat_completion.py \ --ckpt_dir Meta-Llama-3-8B/ \ --tokenizer_path tokenizer.model \ --max_seq_len 128 --max_batch_size 1
failed to create process.
It shows "failed to create process." What's the issue? Help!!
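One thing to rule out first (an assumption, since only the command and the error are shown): the trailing \ line continuations are POSIX shell syntax, and cmd.exe does not treat \ as a continuation character, so the flags may be getting mangled. Try the whole command on a single line:

torchrun --nproc_per_node 1 example_chat_completion.py --ckpt_dir Meta-Llama-3-8B/ --tokenizer_path tokenizer.model --max_seq_len 128 --max_batch_size 1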