Probability tensor contains either `inf`, `nan` or element < 0

Open atxcowboy opened this issue 1 year ago • 25 comments

After uploading a sample photo and entering a first prompt it throws back this error:

Traceback (most recent call last):
  File "C:\Users\Sasch\.conda\envs\minigpt4\lib\site-packages\gradio\routes.py", line 394, in run_predict
    output = await app.get_blocks().process_api(
  File "C:\Users\Sasch\.conda\envs\minigpt4\lib\site-packages\gradio\blocks.py", line 1075, in process_api
    result = await self.call_function(
  File "C:\Users\Sasch\.conda\envs\minigpt4\lib\site-packages\gradio\blocks.py", line 884, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "C:\Users\Sasch\.conda\envs\minigpt4\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "C:\Users\Sasch\.conda\envs\minigpt4\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "C:\Users\Sasch\.conda\envs\minigpt4\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "E:\MiniGPT-4\demo.py", line 95, in gradio_answer
    llm_message = chat.answer(conv=chat_state,
  File "E:\MiniGPT-4\minigpt4\conversation\conversation.py", line 150, in answer
    outputs = self.model.llama_model.generate(
  File "C:\Users\Sasch\.conda\envs\minigpt4\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\Sasch\.conda\envs\minigpt4\lib\site-packages\transformers\generation\utils.py", line 1636, in generate
    return self.beam_sample(
  File "C:\Users\Sasch\.conda\envs\minigpt4\lib\site-packages\transformers\generation\utils.py", line 3261, in beam_sample
    next_tokens = torch.multinomial(probs, num_samples=2 * num_beams)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
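
The last frame shows that the sampling step rejected the probability tensor itself. As a rough illustration of how a single non-finite logit can poison a whole distribution, here is a plain-Python sketch. It does not use PyTorch and is not how `torch.multinomial` is implemented; the names `softmax` and `is_valid_distribution` are made up for this example.

```python
import math


def softmax(logits):
    """Softmax over a list of floats, subtracting the max for stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_valid_distribution(probs):
    """True only if no entry is inf, nan, or negative."""
    return all(math.isfinite(p) and p >= 0 for p in probs)


healthy = softmax([2.0, 1.0, 0.1])
# One +inf logit: inf - inf is nan, and the nan spreads through the sum.
broken = softmax([2.0, float("inf"), 0.1])

print(is_valid_distribution(healthy))
print(is_valid_distribution(broken))
```

If the logits coming out of the model contain `inf` or `nan`, the probabilities derived from them fail this kind of check, which matches the message in the traceback.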

atxcowboy avatar Apr 20 '23 15:04 atxcowboy

I have the same problem. Does anyone have any ideas?

WakingHours-GitHub avatar Apr 22 '23 08:04 WakingHours-GitHub

same problem

root@c5e9e1ead265:/workspace# pip list | grep torch
pytorch-quantization 2.1.2
torch 1.12.1
torch-tensorrt 1.2.0
torchaudio 0.12.1
torchtext 0.13.0a0+fae8e8c
torchvision 0.13.1

Set low_resource to False in the config file [minigpt4_eval.yaml](https://github.com/Vision-CAIR/MiniGPT-4/blob/main/eval_configs/minigpt4_eval.yaml) and use a larger beam search width.

The problem has been resolved, thanks!

GPU memory usage goes up to 18 GB; I am using a V100 32 GB.

cauherk avatar Apr 22 '23 12:04 cauherk

No, it doesn't work. I have set low_resource to True and num_beams to 1, but it still crashes when I run it.

WakingHours-GitHub avatar Apr 23 '23 01:04 WakingHours-GitHub

Setting low_resource to False on a 24 GB 3090 runs out of GPU memory. Are there any other solutions?

flowerzero avatar Apr 23 '23 02:04 flowerzero

@Ph0rk0z Hello, where do I set "use a larger beam search width"? Could you give me some guidance? Thank you!

HonestyBrave avatar Apr 24 '23 13:04 HonestyBrave

@atxcowboy Hello, have you solved this problem? I am hitting the same issue.

HonestyBrave avatar Apr 25 '23 01:04 HonestyBrave

I have hit the same problem with an RTX 8000 GPU.

Lanceliyp avatar Apr 25 '23 03:04 Lanceliyp

I also have the same issue. Can anyone help with this?

ericwang915 avatar Apr 25 '23 08:04 ericwang915

@flowerzero I have the 24G3090 as well. @HonestyBrave I have been able to sort the problem in my case by downloading fresh LLAMA-7B and LLAMA-13B weights from Huggingface. So the weights I had found elsewhere may have contributed to that error? Also, I had found a different environment.yml. I will check the differences and post them when I am back at that computer.

atxcowboy avatar Apr 25 '23 08:04 atxcowboy

can you paste the link here?

ericwang915 avatar Apr 25 '23 08:04 ericwang915

@Ph0rk0z hello, where to set "use a larger beam search width", could you give me some guidance? thank you!

As far as I can tell, there is no beam width setting in the config either, but when you run the program you can set the number of beams on the index page. That is just my humble opinion.

WakingHours-GitHub avatar Apr 25 '23 09:04 WakingHours-GitHub

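I have the 24G3090 as well. I have been able to sort the problem in my case by downloading fresh LLAMA-7B and LLAMA-13B weights from Huggingface. So the weights I had found elsewhere may have contributed to that error? Also, I had found a different environment.yml. I will check the differences and post them when I am back at that computer.

Which version? v1.1? I have the same problem.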

WakingHours-GitHub avatar Apr 25 '23 09:04 WakingHours-GitHub

This is the environment.yml that helped me. It has changes for cuda and bitsandbytes.

name: minigpt4
channels:
  - pytorch
  - nvidia/label/cuda-11.8.0
  - defaults
  - anaconda
dependencies:
  - python=3.9
  - cuda-toolkit
  - pip
  - pytorch[version=2,build=py*_cuda*]
  - pytorch-mutex=1.0=cuda
  - torchaudio
  - torchvision
  - pip:
    - accelerate==0.16.0
    - aiohttp==3.8.4
    - aiosignal==1.3.1
    - async-timeout==4.0.2
    - attrs==22.2.0
    - https://github.com/jllllll/bitsandbytes-windows-webui/raw/main/bitsandbytes-0.38.1-py3-none-any.whl
    - cchardet==2.1.7
    - chardet==5.1.0
    - contourpy==1.0.7
    - cycler==0.11.0
    - filelock==3.9.0
    - fonttools==4.38.0
    - frozenlist==1.3.3
    - huggingface-hub==0.13.4
    - importlib-resources==5.12.0
    - kiwisolver==1.4.4
    - matplotlib==3.7.0
    - multidict==6.0.4
    - openai==0.27.0
    - packaging==23.0
    - psutil==5.9.4
    - pycocotools==2.0.6
    - pyparsing==3.0.9
    - python-dateutil==2.8.2
    - pyyaml==6.0
    - regex==2022.10.31
    - tokenizers==0.13.2
    - tqdm==4.64.1
    - transformers==4.28.0
    - timm==0.6.13
    - spacy==3.5.1
    - webdataset==0.2.48
    - scikit-learn==1.2.2
    - scipy==1.10.1
    - yarl==1.8.2
    - zipp==3.14.0
    - omegaconf==2.3.0
    - opencv-python==4.7.0.72
    - iopath==0.1.10
    - decord==0.6.0
    - tenacity==8.2.2
    - peft
    - pycocoevalcap
    - sentence-transformers
    - umap-learn
    - notebook
    - gradio==3.24.1
    - gradio-client==0.0.8
    - wandb

atxcowboy avatar Apr 25 '23 10:04 atxcowboy

@atxcowboy Hi, I used your environment.yaml but still have the same problem. How many files are in your LLAMA-13B directory? My llama-13b-hf directory contains 51 files, such as: [image]

HonestyBrave avatar Apr 26 '23 02:04 HonestyBrave

I have the same error RuntimeError: probability tensor contains either `inf`, `nan` or element < 0 for 7b model, but it works well for 13b. Any idea?

lovecambi avatar May 01 '23 04:05 lovecambi

I have the same error RuntimeError: probability tensor contains either `inf`, `nan` or element < 0 for 7b model, but it works well for 13b. Any idea?

@lovecambi Hi, which files are in your 13b directory? Could you post a screenshot?

HonestyBrave avatar May 04 '23 01:05 HonestyBrave

I have the same error RuntimeError: probability tensor contains either `inf`, `nan` or element < 0 for 7b model, but it works well for 13b. Any idea?

@lovecambi Hi, which files are in your 13b directory? Could you post a screenshot?

https://huggingface.co/huggyllama/llama-13b

lovecambi avatar May 04 '23 01:05 lovecambi

@flowerzero I have the 24G3090 as well. @HonestyBrave I have been able to sort the problem in my case by downloading fresh LLAMA-7B and LLAMA-13B weights from Huggingface. So the weights I had found elsewhere may have contributed to that error? Also, I had found a different environment.yml. I will check the differences and post them when I am back at that computer.

Can you please paste the download link of LLAMA-7B and LLAMA-13B weights? thanks

birchmi avatar May 05 '23 02:05 birchmi

I used the 'wangrongsheng/MiniGPT-4-LLaMA-7B' weights in my case. I noticed that 'hidden_states' has a '-inf' value in the 30th transformer block when computing the first 'next_token'. I filtered out the bad values and it now works.

You can find the code responsible for this issue at: https://github.com/Vision-CAIR/MiniGPT-4/blob/main/minigpt4/models/modeling_llama.py#L573

Here's the snippet of code I used to address the problem (add it after the line mentioned above):

if hidden_states.isinf().sum() > 0.:
    inf_idx = torch.where(hidden_states.isinf())
    print("Warning: 'hidden_states' have 'inf' values at", inf_idx)
    hidden_states[inf_idx[0], inf_idx[1], inf_idx[2]] = 0.
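
For anyone who wants to try the idea outside the model first, the same masking step can be sketched as a standalone helper. This is only an illustration under assumptions: the function name `zero_out_inf` and the toy tensor are made up here, and it needs PyTorch installed. Like the snippet above, it hides the inf values rather than explaining where they come from, and it does not touch `nan` entries (`torch.nan_to_num` could cover those as well).

```python
import torch


def zero_out_inf(hidden_states: torch.Tensor) -> torch.Tensor:
    """Return a copy of hidden_states with +inf / -inf entries set to 0."""
    inf_mask = torch.isinf(hidden_states)
    if inf_mask.any():
        # Report how many entries were affected instead of dumping every index.
        print(f"Warning: 'hidden_states' has {int(inf_mask.sum())} inf value(s)")
        hidden_states = hidden_states.masked_fill(inf_mask, 0.0)
    return hidden_states


# Toy tensor shaped (batch, sequence, hidden) with one -inf entry.
example = torch.tensor([[[1.0, float("-inf")], [0.5, 2.0]]])
cleaned = zero_out_inf(example)
print(cleaned)
```

Because `masked_fill` is out-of-place, the input tensor is left unchanged and only the returned tensor is cleaned.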

adot08 avatar May 09 '23 03:05 adot08

Switch to v1.1 if you are currently using v0.

bigbrother001 avatar Jul 07 '23 02:07 bigbrother001

I didn't have this problem at runtime the other day, but now I do. It's so weird. Does anyone have any ideas?

double-fire-0 avatar Aug 01 '23 09:08 double-fire-0

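I used the 'wangrongsheng/MiniGPT-4-LLaMA-7B' weights in my case. I noticed that 'hidden_states' has a '-inf' value in the 30th transformer block when computing the first 'next_token'. I filtered out the bad values and it now works.

You can find the code responsible for this issue at: https://github.com/Vision-CAIR/MiniGPT-4/blob/main/minigpt4/models/modeling_llama.py#L573

Here's the snippet of code I used to address the problem (add it after the line mentioned above):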

if hidden_states.isinf().sum() > 0.:
    inf_idx = torch.where(hidden_states.isinf())
    print("Warning: 'hidden_states' have 'inf' values at", inf_idx)
    hidden_states[inf_idx[0], inf_idx[1], inf_idx[2]] = 0.

Huh, why doesn't this seem to work for me?

LHshooter avatar Aug 06 '23 08:08 LHshooter

if hidden_states.isinf().sum() > 0.:
    inf_idx = torch.where(hidden_states.isinf())
    print("Warning: 'hidden_states' have 'inf' values at", inf_idx)
    hidden_states[inf_idx[0], inf_idx[1], inf_idx[2]] = 0.

This doesn't work for me. Does it have to be added after the line 'hidden_states = outputs[0]'?

JaswaniSaniya avatar Mar 19 '24 06:03 JaswaniSaniya