LLaMA-LoRA-Tuner
Inference got error "probability tensor contains either `inf`, `nan` or element < 0" while using beams=2 with temperature=0.1
Using decapoda-research/llama-7b-hf, beams = 2 with temperature = 0.1:
Note: the error is not shown when Stream Output is enabled; in that case, it just outputs nothing.
beams = 2 with temperature = 0.4 also triggers this error; beams = 2 with temperature = 0.5 does not.
(unhelpful-ai-v01-3)
Same on the trained LoRA model:
```
Traceback (most recent call last):
  File "/content/llama_lora/llama_lora/lib/streaming_generation_utils.py", line 47, in gentask
    ret = self.mfunc(callback=_callback, **self.kwargs)
  File "/content/llama_lora/llama_lora/lib/inference.py", line 59, in generate_with_callback
    generation_output = model.generate(**kwargs)
  File "/usr/local/lib/python3.9/dist-packages/peft/peft_model.py", line 631, in generate
    outputs = self.base_model.generate(**kwargs)
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/transformers/generation/utils.py", line 1562, in generate
    return self.beam_sample(
  File "/usr/local/lib/python3.9/dist-packages/transformers/generation/utils.py", line 3187, in beam_sample
    next_tokens = torch.multinomial(probs, num_samples=2 * num_beams)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
```
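For context, one plausible mechanism for this error (an assumption on my part, not confirmed against the model) is numerical overflow when sharpening logits with a very low temperature: dividing logits by `temperature = 0.1` makes them 10x larger, exponentiating such large values saturates to `inf`, and `inf / inf` yields `nan` — exactly the kind of value `torch.multinomial` rejects. A minimal plain-Python sketch with hypothetical numbers:

```python
import math

def overflow_safe_exp(x):
    # Mimic IEEE float behaviour: exp of a huge value saturates to inf.
    # (math.exp raises OverflowError instead, so emulate saturation here.)
    try:
        return math.exp(x)
    except OverflowError:
        return float("inf")

def naive_softmax(logits):
    # Intentionally naive: no max-subtraction trick for numerical stability.
    exps = [overflow_safe_exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

temperature = 0.1
logits = [90.0, 85.0, 10.0]                   # hypothetical raw scores
scaled = [l / temperature for l in logits]    # [900.0, 850.0, 100.0]

probs = naive_softmax(scaled)
# probs is now [nan, nan, 0.0]: exp(900) and exp(850) overflow to inf,
# the sum is inf, and inf / inf is nan.
```

In practice libraries subtract the max logit before exponentiating, but in fp16 extreme logits can still produce `inf`/`nan` downstream, which would explain why slightly higher temperatures (0.5) avoid the error.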
```
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/gradio/blocks.py", line 1020, in postprocess_data
    if predictions[i] is components._Keywords.FINISHED_ITERATING:
IndexError: tuple index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/gradio/routes.py", line 393, in run_predict
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.9/dist-packages/gradio/blocks.py", line 1111, in process_api
    data = self.postprocess_data(fn_index, result["prediction"], state)
  File "/usr/local/lib/python3.9/dist-packages/gradio/blocks.py", line 1024, in postprocess_data
    raise ValueError(
ValueError: Number of output components does not match number of values returned from from function do_inference
```
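The second traceback is Gradio complaining that the event handler returned fewer values than the number of output components it is wired to. A simplified plain-Python sketch of that check (the function name and message here are stand-ins, not Gradio's actual internals):

```python
def check_outputs(num_output_components, returned_values):
    # Simplified stand-in for the length check Gradio performs in
    # postprocess_data: every output component must get a value.
    if len(returned_values) != num_output_components:
        raise ValueError(
            "Number of output components does not match number of "
            "values returned from function do_inference"
        )
    return returned_values

# The handler returns one value, but the UI declares two outputs:
try:
    check_outputs(2, ("generated text",))
except ValueError as e:
    print(e)
```

The fix is simply to make the handler's return tuple match the declared outputs, which is what the maintainer describes below.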
@paplorinc I think that's a separate error: at some point I forgot to update the return value of the do_inference function, which caused Gradio to fail intermittently. I noticed this a few days ago and fixed it on the main branch. Let me know if it still happens for you!