MiniGPT-4
Probability tensor contains either `inf`, `nan` or element < 0
After uploading a sample photo and entering a first prompt it throws back this error:
Traceback (most recent call last):
File "C:\Users\Sasch\.conda\envs\minigpt4\lib\site-packages\gradio\routes.py", line 394, in run_predict
output = await app.get_blocks().process_api(
File "C:\Users\Sasch\.conda\envs\minigpt4\lib\site-packages\gradio\blocks.py", line 1075, in process_api
result = await self.call_function(
File "C:\Users\Sasch\.conda\envs\minigpt4\lib\site-packages\gradio\blocks.py", line 884, in call_function
prediction = await anyio.to_thread.run_sync(
File "C:\Users\Sasch\.conda\envs\minigpt4\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "C:\Users\Sasch\.conda\envs\minigpt4\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "C:\Users\Sasch\.conda\envs\minigpt4\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
result = context.run(func, *args)
File "E:\MiniGPT-4\demo.py", line 95, in gradio_answer
llm_message = chat.answer(conv=chat_state,
File "E:\MiniGPT-4\minigpt4\conversation\conversation.py", line 150, in answer
outputs = self.model.llama_model.generate(
File "C:\Users\Sasch\.conda\envs\minigpt4\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\Users\Sasch\.conda\envs\minigpt4\lib\site-packages\transformers\generation\utils.py", line 1636, in generate
return self.beam_sample(
File "C:\Users\Sasch\.conda\envs\minigpt4\lib\site-packages\transformers\generation\utils.py", line 3261, in beam_sample
next_tokens = torch.multinomial(probs, num_samples=2 * num_beams)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
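For context, here is a minimal, framework-free sketch of how a softmax can end up with the invalid probabilities named in the error. It uses plain Python floats rather than tensors. `softmax` and `is_valid_probs` are illustrative helpers, not PyTorch or transformers functions, and the validity check only mirrors the three conditions listed in the error message.

```python
import math

def softmax(logits):
    """Softmax over a list of floats, shifted by the max for stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def is_valid_probs(probs):
    """True if no entry is inf, nan, or negative (illustrative check only)."""
    return all(math.isfinite(p) and p >= 0 for p in probs)

# A single -inf logit is harmless: its probability is simply 0.
masked = softmax([1.0, float("-inf"), 2.0])

# If every logit is -inf, the shift computes -inf - (-inf) = nan,
# so every probability becomes nan.
broken = softmax([float("-inf")] * 3)

print(is_valid_probs(masked), is_valid_probs(broken))
```

The point of the sketch is that the sampling call at the bottom of the traceback is only where the bad values are detected; they may originate much earlier in the forward pass.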
I have the same problem. Does anyone have any ideas?
same problem
root@c5e9e1ead265:/workspace# pip list | grep torch
pytorch-quantization 2.1.2
torch 1.12.1
torch-tensorrt 1.2.0
torchaudio 0.12.1
torchtext 0.13.0a0+fae8e8c
torchvision 0.13.1
Set low_resource to False in the config file [minigpt4_eval.yaml](https://github.com/Vision-CAIR/MiniGPT-4/blob/main/eval_configs/minigpt4_eval.yaml) and use a larger beam search width.
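A minimal sketch of the first suggestion, assuming the config stores `low_resource` as a plain `key: value` line. The `sample` text below is a made-up stand-in, not the real file, and `set_low_resource` is a hypothetical helper; editing the YAML by hand is equivalent.

```python
import re

def set_low_resource(yaml_text: str, enabled: bool) -> str:
    """Return yaml_text with the value of `low_resource:` replaced.

    Only the value is rewritten; indentation and all other lines are kept.
    """
    pattern = re.compile(r"^([ \t]*low_resource[ \t]*:[ \t]*)\S+", flags=re.MULTILINE)
    new_text, count = pattern.subn(lambda m: m.group(1) + str(enabled), yaml_text)
    if count == 0:
        raise ValueError("no low_resource key found in the given text")
    return new_text

# Stand-in config text (assumption: the real file has more keys than this).
sample = "model:\n  arch: mini_gpt4\n  low_resource: True\n"
updated = set_low_resource(sample, False)
print(updated)
```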
The problem has been resolved, thanks!
GPU memory usage goes up to 18 GB; I use a V100 32 GB.
No, it doesn't work. I have set low_resource to True and num_beams to 1, but it still crashes when I run it.
With low_resource set to False, a 24 GB RTX 3090 does not have enough GPU memory. Are there any other solutions?
@Ph0rk0z Hello, where do I set "use a larger beam search width"? Could you give me some guidance? Thank you!
@atxcowboy Hello, have you solved this? I have the same problem.
I have the same problem with an RTX 8000 GPU.
I also have the same issue. Can anyone help with this?
@flowerzero I have the 24 GB 3090 as well. @HonestyBrave I was able to solve the problem in my case by downloading fresh LLAMA-7B and LLAMA-13B weights from Huggingface, so the weights I had found elsewhere may have contributed to the error. I had also found a different environment.yml. I will check the differences and post them when I am back at that computer.
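If weights from an unofficial source are a suspect, one way to compare them against a fresh download is to hash every file. The sketch below is only an illustration: `sha256_of_file` and `hash_weight_dir` are hypothetical helpers, and it assumes you have trusted checksums to compare against (for example, from the page you downloaded from, if it publishes them).

```python
import hashlib
from pathlib import Path

def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so multi-gigabyte weight shards fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def hash_weight_dir(weight_dir):
    """Map each file name directly inside weight_dir to its SHA-256 hex digest."""
    root = Path(weight_dir)
    return {p.name: sha256_of_file(p) for p in sorted(root.iterdir()) if p.is_file()}

def differing_files(hashes_a, hashes_b):
    """Names that are missing on one side or whose digests differ."""
    names = set(hashes_a) | set(hashes_b)
    return sorted(n for n in names if hashes_a.get(n) != hashes_b.get(n))
```

Running `hash_weight_dir` on the old and the new weight directories and passing both results to `differing_files` shows which files differ or are missing.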
Can you paste the link here?
Regarding where to set "use a larger beam search width": as far as I can tell there is no beam width option in the config either, but when you run the program you can set the number of beams on the index page. This is just my humble opinion.
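To relate the beam setting to the traceback, here is a simplified sketch of how the beam count and the sampling flag select a decoding routine. It assumes the dispatch logic of transformers releases around 4.28 and ignores other options such as beam groups and constraints. `decoding_mode` is an illustrative helper, not a library function.

```python
def decoding_mode(num_beams: int = 1, do_sample: bool = False) -> str:
    """Name the decoding routine chosen for the given settings (simplified)."""
    if num_beams < 1:
        raise ValueError("num_beams must be at least 1")
    if num_beams == 1:
        # Single hypothesis: either pick the top token or sample one.
        return "sample" if do_sample else "greedy_search"
    # Several hypotheses kept in parallel.
    return "beam_sample" if do_sample else "beam_search"

for beams in (1, 5):
    for sampling in (False, True):
        print(beams, sampling, decoding_mode(beams, sampling))
```

The `beam_sample` frame in the traceback corresponds to sampling with more than one beam. Both sampling modes draw tokens from a probability distribution, which may be why some people in this thread still see the error with a single beam.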
Which version? v1.1? I have the same problem too.
This is the environment.yml that helped me. It has changes for cuda and bitsandbytes.
name: minigpt4
channels:
  - pytorch
  - nvidia/label/cuda-11.8.0
  - defaults
  - anaconda
dependencies:
  - python=3.9
  - cuda-toolkit
  - pip
  - pytorch[version=2,build=py*_cuda*]
  - pytorch-mutex=1.0=cuda
  - torchaudio
  - torchvision
  - pip:
    - accelerate==0.16.0
    - aiohttp==3.8.4
    - aiosignal==1.3.1
    - async-timeout==4.0.2
    - attrs==22.2.0
    - https://github.com/jllllll/bitsandbytes-windows-webui/raw/main/bitsandbytes-0.38.1-py3-none-any.whl
    - cchardet==2.1.7
    - chardet==5.1.0
    - contourpy==1.0.7
    - cycler==0.11.0
    - filelock==3.9.0
    - fonttools==4.38.0
    - frozenlist==1.3.3
    - huggingface-hub==0.13.4
    - importlib-resources==5.12.0
    - kiwisolver==1.4.4
    - matplotlib==3.7.0
    - multidict==6.0.4
    - openai==0.27.0
    - packaging==23.0
    - psutil==5.9.4
    - pycocotools==2.0.6
    - pyparsing==3.0.9
    - python-dateutil==2.8.2
    - pyyaml==6.0
    - regex==2022.10.31
    - tokenizers==0.13.2
    - tqdm==4.64.1
    - transformers==4.28.0
    - timm==0.6.13
    - spacy==3.5.1
    - webdataset==0.2.48
    - scikit-learn==1.2.2
    - scipy==1.10.1
    - yarl==1.8.2
    - zipp==3.14.0
    - omegaconf==2.3.0
    - opencv-python==4.7.0.72
    - iopath==0.1.10
    - decord==0.6.0
    - tenacity==8.2.2
    - peft
    - pycocoevalcap
    - sentence-transformers
    - umap-learn
    - notebook
    - gradio==3.24.1
    - gradio-client==0.0.8
    - wandb
@atxcowboy Hi, I used your environment.yml, but I still have the same problem.
How many files are in your LLAMA-13B directory? My llama-13b-hf directory has 51 files,
such as:
I have the same error
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
with the 7B model, but the 13B model works well. Any ideas?
@lovecambi Hi, which files are in your 13B directory? Could you post a screenshot?
https://huggingface.co/huggyllama/llama-13b
Can you please paste the download links for the LLAMA-7B and LLAMA-13B weights? Thanks.
I used the 'wangrongsheng/MiniGPT-4-LLaMA-7B' weights in my case. I noticed that 'hidden_states' contains a '-inf' value in the 30th transformer block when computing the first 'next_token'. I filtered out the offending values and it now works.
You can find the code responsible for this issue at: https://github.com/Vision-CAIR/MiniGPT-4/blob/main/minigpt4/models/modeling_llama.py#L573
Here's the snippet of code I used to address the problem (add it after the mentioned line):
# Replace any +/-inf entries in hidden_states with 0 so they cannot turn into nan later.
if hidden_states.isinf().any():
    inf_idx = torch.where(hidden_states.isinf())
    print("Warning: 'hidden_states' has 'inf' values at", inf_idx)
    hidden_states[inf_idx] = 0.
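To illustrate why filtering infinite values can help, here is a small framework-free sketch using plain Python floats in place of tensors. The vectors are made-up numbers, and `dot` and `zero_out_inf` are illustrative helpers that do not come from the MiniGPT-4 code. It shows a single `-inf` turning a later dot product into `nan`, and the same product staying finite once the value is zeroed.

```python
import math

def dot(u, v):
    """Plain dot product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))

def zero_out_inf(vec):
    """Replace +inf and -inf entries with 0.0, leaving other values alone."""
    return [0.0 if math.isinf(x) else x for x in vec]

hidden = [0.5, float("-inf"), 1.2]   # one corrupted activation (made up)
weight_row = [0.3, 0.0, -0.7]        # made-up weights; note the 0.0

# -inf * 0.0 is nan under IEEE 754, and nan poisons the whole sum.
logit_bad = dot(hidden, weight_row)

# After filtering, the same product is an ordinary finite number.
logit_ok = dot(zero_out_inf(hidden), weight_row)

print(math.isnan(logit_bad), math.isfinite(logit_ok))
```

Note that zeroing the values suppresses the symptom; it does not explain why the infinite value appeared, so output quality may still be affected.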
Switch to v1.1 if you are using v0.
I didn't have this problem at runtime the other day, but now I do. It's so weird. Does anyone have any ideas?
Huh, why doesn't it seem to work for me?
This doesn't work for me. Does the snippet have to be added after the line 'hidden_states = outputs[0]'?