[Usage] RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
Describe the issue
Issue: I pulled the latest commits from the repo and tried to run CLI inference, but the network produces NaN probability outputs.
Command:
python -m llava.serve.cli --model-path liuhaotian/llava-v1.5-7b --image-file "https://llava-vl.github.io/static/images/view.jpg" --load-4bit
I also tried creating a fresh environment, but the bug is still there. I inspected the tokens fed into the network and noticed a strange -200 token. Not sure if that is causing the issue; maybe someone can have a look? I'm trying to debug it and will report back if I have news!
Log:
Traceback (most recent call last):
File "/home/riccardoricci/miniconda3/envs/chat_osm/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/riccardoricci/miniconda3/envs/chat_osm/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/media/data/Riccardo/chat_with_OSM/LLaVA/llava/serve/cli.py", line 125, in <module>
main(args)
File "/media/data/Riccardo/chat_with_OSM/LLaVA/llava/serve/cli.py", line 95, in main
output_ids = model.generate(
File "/home/riccardoricci/miniconda3/envs/chat_osm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/riccardoricci/miniconda3/envs/chat_osm/lib/python3.10/site-packages/transformers/generation/utils.py", line 1588, in generate
return self.sample(
File "/home/riccardoricci/miniconda3/envs/chat_osm/lib/python3.10/site-packages/transformers/generation/utils.py", line 2678, in sample
next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
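For context, a minimal plain-PyTorch sketch of what this error means mechanically: an `inf` logit (for example from a float16 overflow) turns into NaN probabilities after softmax, and that is what makes the sampling call inside generate() fail.

```python
import torch

# float16 overflows above ~65504, so one oversized logit becomes inf
logits = torch.tensor([1.0, 2.0, 70000.0], dtype=torch.float16)

# softmax over a tensor containing inf yields NaN everywhere
# (inf - inf = nan during the max-subtraction step)
probs = torch.softmax(logits.float(), dim=-1)
print(probs)  # tensor([nan, nan, nan])

# this is the sampling call inside generate(); with NaN probabilities it
# fails with a RuntimeError like the one above (exact wording may differ
# between the CPU and CUDA implementations)
next_token = torch.multinomial(probs, num_samples=1)
```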
+1, I've hit the same problem. I fine-tuned LLaVA with LoRA and want to run inference on it with:
python -m llava.serve.cli --model-path /root/code/LLaVA/checkpoints/llava-v1.5-13b-lora --image-file /root/code/LLaVA/pic.png --model-base FlagAlpha/Llama2-Chinese-7b-Chat
And I also get RuntimeError: probability tensor contains either `inf`, `nan` or element < 0.
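One thing to double-check with LoRA checkpoints: the CLI loads --model-base first and then applies the LoRA weights on top, so the base has to be the exact model the LoRA was trained from (a 13B LoRA will not fit onto a 7B base, for example). A rough sketch of that loading path, assuming the load_pretrained_model helper from llava/model/builder.py and a hypothetical base model name:

```python
from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path

model_path = "/root/code/LLaVA/checkpoints/llava-v1.5-13b-lora"
# hypothetical: must be the base the LoRA was actually fine-tuned from,
# e.g. a 13B Vicuna for a llava-v1.5-13b LoRA
model_base = "lmsys/vicuna-13b-v1.5"

# the base weights are loaded first, then the LoRA deltas are applied on top
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path, model_base, get_model_name_from_path(model_path)
)
```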
Follow-up: I still don't know the origin of the issue.
I tried doing input_ids = torch.abs(input_ids) before generate in cli.py.
Interesting fact: the size of the input embeddings (before line 90 in llava_llama.py) changes:
with torch.abs(): input_embeds -> torch.Size([1, 45, 4096])
without it: input_embeds -> torch.Size([1, 620, 4096])
Still don't know the cause.
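The two sizes are consistent with how the image placeholder is expanded. A back-of-the-envelope check, assuming the LLaVA-1.5 default vision tower (CLIP ViT-L/14 at 336px, i.e. a 24x24 patch grid):

```python
text_tokens = 45                      # prompt length, including the single -200 placeholder
patches_per_side = 336 // 14          # 24 for CLIP ViT-L/14 at 336px
image_tokens = patches_per_side ** 2  # 576 visual tokens per image

# the -200 placeholder is dropped and replaced by the 576 projected patch embeddings
print((text_tokens - 1) + image_tokens)  # 620 -> matches torch.Size([1, 620, 4096])

# torch.abs(input_ids) turns -200 into 200 (a normal vocab id), so the image is
# never spliced in and you get only the 45 text embeddings -- that "fix" just
# feeds the model a corrupted prompt
```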
Just discovered that token -200 is the image token, so my guess is that it is a tokenization problem! Just look at the size of the input_embeds with the "correct" tokens.
-200 is the image token; don't change that.
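Right: -200 is IMAGE_TOKEN_INDEX in llava/constants.py, a sentinel chosen to be negative precisely so it can never collide with a real vocabulary id. It is inserted by tokenizer_image_token in llava/mm_utils.py; a simplified sketch of the idea (not a verbatim copy of that helper):

```python
IMAGE_TOKEN_INDEX = -200  # llava.constants.IMAGE_TOKEN_INDEX

def tokenizer_image_token_sketch(prompt, tokenizer, image_token_index=IMAGE_TOKEN_INDEX):
    # tokenize the text on either side of the "<image>" placeholder separately
    chunks = [tokenizer(chunk).input_ids for chunk in prompt.split("<image>")]
    input_ids = chunks[0]
    # splice the sentinel in between, so the multimodal preparation step later
    # knows where to substitute the projected image features
    for chunk in chunks[1:]:
        input_ids = input_ids + [image_token_index] + chunk[1:]  # chunk[1:] drops the duplicate BOS
    return input_ids
```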
Is the problem with the weights? Have you tried LoRA, and does it work well?
No. The thing that makes me wonder is that I just cloned the repo, installed the packages, and tried the CLI inference.
It should be straightforward, but it isn't in my case. Is anyone else having the same problem?
My workstation has two GPUs, and I see different behavior depending on which are visible. With export CUDA_VISIBLE_DEVICES=0 I get the error above; with export CUDA_VISIBLE_DEVICES=0,1 I get:
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
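Since device-side asserts are reported asynchronously, the stack trace in the multi-GPU case may point at the wrong call. One way to narrow it down is to force synchronous kernel launches and a single visible GPU before CUDA is initialized; a small sketch (environment variables only, nothing LLaVA-specific):

```python
import os

# both must be set before the first CUDA call (simplest: before importing torch)
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"   # report kernel errors at the failing call
os.environ["CUDA_VISIBLE_DEVICES"] = "0"   # rule out multi-GPU device mapping

import torch  # imported afterwards so the settings take effect
# ...then run the same cli.py / model.generate() code path as before
```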
I still haven't found a solution.
Just discovered that the Gradio interface works fine if I do export CUDA_VISIBLE_DEVICES=0 when starting the model worker; otherwise it completely freezes the workstation.
In cli.py, on the other hand, even after export CUDA_VISIBLE_DEVICES=0 I still get the error mentioned above:
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
I had to add a bunch of arguments (missing from the README) to the arguments object, and it turns out I get this error when "num_beams" > 1. Like this:
"temperature": 0.2,
"top_p": 0.5,
"num_beams": 1,
"max_new_tokens": 300,
@haotian-liu same question. Can you help us with it?
Okay, I think I solved it by changing the dtype in cli.py: basically I changed dtype=torch.float16 to dtype=torch.bfloat16. I think you can also change it to float32. You also need to change the model dtype; I added model.to(dtype=torch.bfloat16), but there is probably a more elegant way to handle this.
The issue is probably related to float16 causing an overflow, as in the issue here.
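In code, the workaround described above amounts to something like this (a sketch, not an exact diff of cli.py; bfloat16 has the same exponent range as float32, so activations that overflow past float16's ~65504 limit stay finite):

```python
import torch
from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path

model_path = "liuhaotian/llava-v1.5-7b"
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path, None, get_model_name_from_path(model_path)
)

# cast the model to bfloat16 (or float32) instead of float16
model = model.to(dtype=torch.bfloat16)

# any tensors built for inference must match, e.g.:
# image_tensor = image_tensor.to(model.device, dtype=torch.bfloat16)
```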
I have been facing the same issue.
Update: I was able to resolve the issue by changing the base model from Hugging Face's "llava-hf/llava-1.5-7b-hf" to "liuhaotian/llava-v1.5-7b". It resolved the NaN issue, and the training performance got much better.
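Worth noting that the two checkpoint families target different code paths: "llava-hf/llava-1.5-7b-hf" is the conversion for the transformers-native LLaVA classes, while "liuhaotian/llava-v1.5-7b" is what this repo's own loader expects. A hedged sketch of the distinction:

```python
# "liuhaotian/llava-v1.5-7b" is meant for this repo's loader, roughly:
#   load_pretrained_model("liuhaotian/llava-v1.5-7b", None, "llava-v1.5-7b")
# whereas "llava-hf/llava-1.5-7b-hf" targets the transformers-native classes:
from transformers import AutoProcessor, LlavaForConditionalGeneration

processor = AutoProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf")
model = LlavaForConditionalGeneration.from_pretrained("llava-hf/llava-1.5-7b-hf")
```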