[Usage] RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
Describe the issue
Issue: I pulled the latest commits from the repo and tried to run CLI inference, but the network produces NaN probability outputs.
Command:
python -m llava.serve.cli --model-path liuhaotian/llava-v1.5-7b --image-file "https://llava-vl.github.io/static/images/view.jpg" --load-4bit
I also tried creating a fresh environment, but the bug is still there. I inspected the tokens fed into the network and noticed a strange -200 token. Not sure if that is causing the issue; maybe someone can have a look? I'm trying to debug it and will report back if I have news!
Log:
Traceback (most recent call last):
File "/home/riccardoricci/miniconda3/envs/chat_osm/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/riccardoricci/miniconda3/envs/chat_osm/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/media/data/Riccardo/chat_with_OSM/LLaVA/llava/serve/cli.py", line 125, in <module>
main(args)
File "/media/data/Riccardo/chat_with_OSM/LLaVA/llava/serve/cli.py", line 95, in main
output_ids = model.generate(
File "/home/riccardoricci/miniconda3/envs/chat_osm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/riccardoricci/miniconda3/envs/chat_osm/lib/python3.10/site-packages/transformers/generation/utils.py", line 1588, in generate
return self.sample(
File "/home/riccardoricci/miniconda3/envs/chat_osm/lib/python3.10/site-packages/transformers/generation/utils.py", line 2678, in sample
next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
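For context, a minimal plain-PyTorch sketch of what this error means mechanically: an `inf` logit (for example from a float16 overflow) turns into NaN probabilities after softmax, and that is what makes the sampling call inside generate() fail.

```python
import torch

# float16 overflows above ~65504, so one oversized logit becomes inf
logits = torch.tensor([1.0, 2.0, 70000.0], dtype=torch.float16)

# softmax over a tensor containing inf yields NaN everywhere
# (inf - inf = nan during the max-subtraction step)
probs = torch.softmax(logits.float(), dim=-1)
print(probs)  # tensor([nan, nan, nan])

# this is the sampling call inside generate(); with NaN probabilities it
# fails with a RuntimeError like the one above (exact wording may differ
# between the CPU and CUDA implementations)
next_token = torch.multinomial(probs, num_samples=1)
```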
+1, I've hit the same problem. I fine-tuned LLaVA with LoRA and want to run inference on it with:
python -m llava.serve.cli --model-path /root/code/LLaVA/checkpoints/llava-v1.5-13b-lora --image-file /root/code/LLaVA/pic.png --model-base FlagAlpha/Llama2-Chinese-7b-Chat
And I also get RuntimeError: probability tensor contains either `inf`, `nan` or element < 0.
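One thing to double-check with LoRA checkpoints: the CLI loads --model-base first and then applies the LoRA weights on top, so the base has to be the exact model the LoRA was trained from (a 13B LoRA will not fit onto a 7B base, for example). A rough sketch of that loading path, assuming the load_pretrained_model helper from llava/model/builder.py and a hypothetical base model name:

```python
from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path

model_path = "/root/code/LLaVA/checkpoints/llava-v1.5-13b-lora"
# hypothetical: must be the base the LoRA was actually fine-tuned from,
# e.g. a 13B Vicuna for a llava-v1.5-13b LoRA
model_base = "lmsys/vicuna-13b-v1.5"

# the base weights are loaded first, then the LoRA deltas are applied on top
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path, model_base, get_model_name_from_path(model_path)
)
```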
Follow-up: I still don't know the origin of the issue.
I tried doing input_ids = torch.abs(input_ids) before generate in cli.py.
Interesting fact: the size of the input embeddings (before line 90 in llava_llama.py) changes:
with torch.abs(): input_embeds -> torch.Size([1, 45, 4096])
without it: input_embeds -> torch.Size([1, 620, 4096])
Still don't know the cause.
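The two sizes are consistent with how the image placeholder is expanded. A back-of-the-envelope check, assuming the LLaVA-1.5 default vision tower (CLIP ViT-L/14 at 336px, i.e. a 24x24 patch grid):

```python
text_tokens = 45                      # prompt length, including the single -200 placeholder
patches_per_side = 336 // 14          # 24 for CLIP ViT-L/14 at 336px
image_tokens = patches_per_side ** 2  # 576 visual tokens per image

# the -200 placeholder is dropped and replaced by the 576 projected patch embeddings
print((text_tokens - 1) + image_tokens)  # 620 -> matches torch.Size([1, 620, 4096])

# torch.abs(input_ids) turns -200 into 200 (a normal vocab id), so the image is
# never spliced in and you get only the 45 text embeddings -- that "fix" just
# feeds the model a corrupted prompt
```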
Just discovered that token -200 is the image token, so my guess is that it is a tokenization problem! Just look at the size of the input_embeds with the "correct" tokens.
-200 is the image token; don't change that.
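Right: -200 is IMAGE_TOKEN_INDEX in llava/constants.py, a sentinel chosen to be negative precisely so it can never collide with a real vocabulary id. It is inserted by tokenizer_image_token in llava/mm_utils.py; a simplified sketch of the idea (not a verbatim copy of that helper):

```python
IMAGE_TOKEN_INDEX = -200  # llava.constants.IMAGE_TOKEN_INDEX

def tokenizer_image_token_sketch(prompt, tokenizer, image_token_index=IMAGE_TOKEN_INDEX):
    # tokenize the text on either side of the "<image>" placeholder separately
    chunks = [tokenizer(chunk).input_ids for chunk in prompt.split("<image>")]
    input_ids = chunks[0]
    # splice the sentinel in between, so the multimodal preparation step later
    # knows where to substitute the projected image features
    for chunk in chunks[1:]:
        input_ids = input_ids + [image_token_index] + chunk[1:]  # chunk[1:] drops the duplicate BOS
    return input_ids
```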
Is the problem with the weights? Have you tried LoRA, and does it work well?
No. The thing that makes me wonder is that I just cloned the repo, installed the packages, and tried the CLI inference.
It should be straightforward, but it isn't in my case. Is anyone else having the same problem?
My workstation has two GPUs, and I see different behavior depending on which are visible. With export CUDA_VISIBLE_DEVICES=0 I get the error above; with export CUDA_VISIBLE_DEVICES=0,1 I get:
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
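Since device-side asserts are reported asynchronously, the stack trace in the multi-GPU case may point at the wrong call. One way to narrow it down is to force synchronous kernel launches and a single visible GPU before CUDA is initialized; a small sketch (environment variables only, nothing LLaVA-specific):

```python
import os

# both must be set before the first CUDA call (simplest: before importing torch)
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"   # report kernel errors at the failing call
os.environ["CUDA_VISIBLE_DEVICES"] = "0"   # rule out multi-GPU device mapping

import torch  # imported afterwards so the settings take effect
# ...then run the same cli.py / model.generate() code path as before
```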
I still haven't found a solution.
Just discovered that the Gradio interface works fine if I do export CUDA_VISIBLE_DEVICES=0 when starting the model worker; otherwise it completely freezes the workstation.
In cli.py, on the other hand, even after export CUDA_VISIBLE_DEVICES=0 I still get the error mentioned above:
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
I had to add a bunch of arguments (missing from the README) to the arguments object, and it turns out I get this error when "num_beams" > 1. Like this:
"temperature": 0.2,
"top_p": 0.5,
"num_beams": 1,
"max_new_tokens": 300,
@haotian-liu same question. Can you help us with it?
Okay, I think I solved it by changing the dtype in cli.py: basically I changed dtype=torch.float16 to dtype=torch.bfloat16. I think you can also change it to float32. You also need to change the model dtype; I added model.to(dtype=torch.bfloat16), but there is probably a more elegant way to handle this.
The issue is probably related to float16 causing an overflow, as in the issue here.
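In code, the workaround described above amounts to something like this (a sketch, not an exact diff of cli.py; bfloat16 has the same exponent range as float32, so activations that overflow past float16's ~65504 limit stay finite):

```python
import torch
from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path

model_path = "liuhaotian/llava-v1.5-7b"
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path, None, get_model_name_from_path(model_path)
)

# cast the model to bfloat16 (or float32) instead of float16
model = model.to(dtype=torch.bfloat16)

# any tensors built for inference must match, e.g.:
# image_tensor = image_tensor.to(model.device, dtype=torch.bfloat16)
```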
I have been facing the same issue.
Update: I was able to resolve the issue by changing the base model from Hugging Face's "llava-hf/llava-1.5-7b-hf" to "liuhaotian/llava-v1.5-7b". It resolved the NaN issue, and the training performance got much better.
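Worth noting that the two checkpoint families target different code paths: "llava-hf/llava-1.5-7b-hf" is the conversion for the transformers-native LLaVA classes, while "liuhaotian/llava-v1.5-7b" is what this repo's own loader expects. A hedged sketch of the distinction:

```python
# "liuhaotian/llava-v1.5-7b" is meant for this repo's loader, roughly:
#   load_pretrained_model("liuhaotian/llava-v1.5-7b", None, "llava-v1.5-7b")
# whereas "llava-hf/llava-1.5-7b-hf" targets the transformers-native classes:
from transformers import AutoProcessor, LlavaForConditionalGeneration

processor = AutoProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf")
model = LlavaForConditionalGeneration.from_pretrained("llava-hf/llava-1.5-7b-hf")
```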