
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

Open xiamaozi11 opened this issue 1 year ago • 31 comments

xiamaozi11 avatar Mar 15 '23 02:03 xiamaozi11

Can you provide a reproducible example, including model precision, query, and history?

duzx16 avatar Mar 15 '23 08:03 duzx16

Demo example:

from modeling_chatglm import ChatGLMForConditionalGeneration
from tokenization_chatglm import ChatGLMTokenizer
import gradio as gr

model_path = r'chat-GLM'
# tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
# model = AutoModel.from_pretrained(model_path, trust_remote_code=True).half().cuda()
tokenizer = ChatGLMTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = ChatGLMForConditionalGeneration.from_pretrained(model_path, trust_remote_code=True).half().cuda()
model = model.eval()
response, history = model.chat(tokenizer, "你好", history=[])

print(response)

File /home/miniconda3/lib/python3.8/site-packages/transformers/generation/utils.py:2479, in GenerationMixin.sample(self, input_ids, logits_processor, stopping_criteria, logits_warper, max_length, pad_token_id, eos_token_id, output_attentions, output_hidden_states, output_scores, return_dict_in_generate, synced_gpus, **model_kwargs)
   2477 # sample
   2478 probs = nn.functional.softmax(next_token_scores, dim=-1)
-> 2479 next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
   2481 # finished sentences should have their next token be a padding token
   2482 if eos_token_id is not None:

RuntimeError: probability tensor contains either inf, nan or element < 0

xiamaozi11 avatar Mar 15 '23 08:03 xiamaozi11

I've updated the code on the Hugging Face Hub. Please try again now.

duzx16 avatar Mar 15 '23 13:03 duzx16

/data1/data/data/xia/2023/chat_glm/modeling_chatglm.py in __call__(self, input_ids, scores)
     52 if torch.isnan(scores).any() or torch.isinf(scores).any():
     53     scores.zero_()
---> 54     scores[..., 20005] = 1e5
     55 return scores
     56

RuntimeError: value cannot be converted to type at::Half without overflow
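(For context, this overflow is inherent to fp16: the largest finite half-precision value is 65504, so assigning 1e5 to a half tensor must fail. A minimal reproduction in plain PyTorch, nothing ChatGLM-specific:)

import torch

t = torch.zeros(4, dtype=torch.half)
t[0] = 65504.0  # ok: the largest finite fp16 value
try:
    t[0] = 1e5  # exceeds the fp16 range
except RuntimeError as e:
    print(e)  # value cannot be converted to type at::Half without overflow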

xiamaozi11 avatar Mar 16 '23 01:03 xiamaozi11

/data1/data/data/xia/2023/chat_glm/modeling_chatglm.py in __call__(self, input_ids, scores)
     52 if torch.isnan(scores).any() or torch.isinf(scores).any():
     53     scores.zero_()
---> 54     scores[..., 20005] = 1e5
     55 return scores
     56

RuntimeError: value cannot be converted to type at::Half without overflow

Sorry, the fix wasn't quite right. Please try again now.

duzx16 avatar Mar 16 '23 01:03 duzx16

The return value is empty.

xiamaozi11 avatar Mar 16 '23 03:03 xiamaozi11

After the update, there is no return value anymore.

xiamaozi11 avatar Mar 17 '23 09:03 xiamaozi11

response = ''

xiamaozi11 avatar Mar 17 '23 09:03 xiamaozi11

response = ''

@xiamaozi11 Can you share your hardware, CUDA version, and PyTorch version? This problem appears because, in your environment, the model's outputs are all NaN.
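A quick way to check is to run a single forward pass and inspect the logits before any sampling happens (a hypothetical probe; it assumes the model and tokenizer are loaded as in the demo above):

import torch

inputs = tokenizer("你好", return_tensors="pt").to(model.device)
with torch.no_grad():
    logits = model(**inputs).logits
print("nan:", torch.isnan(logits).any().item(), "inf:", torch.isinf(logits).any().item())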

duzx16 avatar Mar 17 '23 09:03 duzx16

3090, CUDA 11.1, torch 1.13.1. Both of my 3090 machines seem to behave this way.

xiamaozi11 avatar Mar 17 '23 09:03 xiamaozi11

It doesn't seem to be a hardware problem; I switched to an A40 on AutoDL and hit the same issue.

xiamaozi11 avatar Mar 20 '23 02:03 xiamaozi11

It doesn't seem to be a hardware problem; I switched to an A40 on AutoDL and hit the same issue.

Same problem here, the response is blank. Have you solved it? @xiamaozi11

Jessense avatar Mar 27 '23 14:03 Jessense

You can get past this error by removing do_sample=True in model.generate.
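That is, force greedy decoding so torch.multinomial is never reached. A minimal sketch (standard transformers generate API; the prompt is a placeholder):

inputs = tokenizer("你好", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, do_sample=False, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))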

mrtrieuphong avatar Apr 03 '23 07:04 mrtrieuphong

Has this problem been solved?

AnddyWang avatar Apr 24 '23 02:04 AnddyWang

/data1/data/data/xia/2023/chat_glm/modeling_chatglm.py in __call__(self, input_ids, scores)
     52 if torch.isnan(scores).any() or torch.isinf(scores).any():
     53     scores.zero_()
---> 54     scores[..., 20005] = 1e5
     55 return scores
     56

RuntimeError: value cannot be converted to type at::Half without overflow

Sorry, the fix wasn't quite right. Please try again now.

How can this be solved?

lovelucymuch avatar May 10 '23 06:05 lovelucymuch

Loading the model with text-generation-webui gives the same error message:

Traceback (most recent call last):
  File "/home/ecs-user/src/fnlp/text-generation-webui/modules/callbacks.py", line 73, in gentask
    ret = self.mfunc(callback=_callback, **self.kwargs)
  File "/home/ecs-user/src/fnlp/text-generation-webui/modules/text_generation.py", line 259, in generate_with_callback
    shared.model.generate(**kwargs)
  File "/home/c/envs/fnlp/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/c/envs/fnlp/lib/python3.10/site-packages/transformers/generation/utils.py", line 1485, in generate
    return self.sample(
  File "/home/c/envs/fnlp/lib/python3.10/site-packages/transformers/generation/utils.py", line 2560, in sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

This happens with load-in-8bit enabled:

CUDA_VISIBLE_DEVICES=0 \
NUMEXPR_MAX_THREADS=1 \
python server.py --chat \
--model THUDM/chatglm-6b \
--trust-remote-code \
--load-in-8bit  \
--api

Without load-in-8bit, it works normally.

The GPU is a V100, and the torch version is 2.0.1.

accelerate               0.19.0
aiofiles                 23.1.0
aiohttp                  3.8.4
aiosignal                1.3.1
altair                   5.0.0
anyio                    3.6.2
async-timeout            4.0.2
attrs                    23.1.0
bitsandbytes             0.38.1
certifi                  2023.5.7
charset-normalizer       3.1.0
click                    8.1.3
cmake                    3.26.3
colorama                 0.4.6
contourpy                1.0.7
cycler                   0.11.0
datasets                 2.12.0
dill                     0.3.6
fastapi                  0.95.1
ffmpy                    0.3.0
filelock                 3.12.0
flexgen                  0.1.7
fonttools                4.39.4
frozenlist               1.3.3
fsspec                   2023.5.0
gradio                   3.25.0
gradio_client            0.1.4
h11                      0.14.0
httpcore                 0.17.0
httpx                    0.24.0
huggingface-hub          0.14.1
idna                     3.4
Jinja2                   3.1.2
jsonschema               4.17.3
kiwisolver               1.4.4
linkify-it-py            2.0.2
lit                      16.0.3
llama-cpp-python         0.1.45
Markdown                 3.4.3
markdown-it-py           2.2.0
MarkupSafe               2.1.2
matplotlib               3.7.1
mdit-py-plugins          0.3.3
mdurl                    0.1.2
mpmath                   1.3.0
multidict                6.0.4
multiprocess             0.70.14
networkx                 3.1
numpy                    1.24.3
nvidia-cublas-cu11       11.10.3.66
nvidia-cuda-cupti-cu11   11.7.101
nvidia-cuda-nvrtc-cu11   11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11        8.5.0.96
nvidia-cufft-cu11        10.9.0.58
nvidia-curand-cu11       10.2.10.91
nvidia-cusolver-cu11     11.4.0.1
nvidia-cusparse-cu11     11.7.4.91
nvidia-nccl-cu11         2.14.3
nvidia-nvtx-cu11         11.7.91
orjson                   3.8.12
packaging                23.1
pandas                   2.0.1
peft                     0.4.0.dev0
Pillow                   9.5.0
pip                      23.0.1
psutil                   5.9.5
PuLP                     2.7.0
pyarrow                  12.0.0
pydantic                 1.10.7
pydub                    0.25.1
pyparsing                3.0.9
pyrsistent               0.19.3
python-dateutil          2.8.2
python-multipart         0.0.6
pytz                     2023.3
PyYAML                   6.0
regex                    2023.5.5
requests                 2.30.0
responses                0.18.0
rwkv                     0.7.3
safetensors              0.3.1
semantic-version         2.10.0
sentencepiece            0.1.99
setuptools               66.0.0
six                      1.16.0
sniffio                  1.3.0
starlette                0.26.1
sympy                    1.12
tokenizers               0.13.3
toolz                    0.12.0
torch                    2.0.1
tqdm                     4.65.0
transformers             4.28.1
triton                   2.0.0
typing_extensions        4.5.0
tzdata                   2023.3
uc-micro-py              1.0.2
urllib3                  2.0.2
uvicorn                  0.22.0
websockets               11.0.3
wheel                    0.38.4
xxhash                   3.2.0

is avatar May 11 '23 05:05 is

huggingface-metadata

url: https://huggingface.co/THUDM/chatglm-6b
branch: main
download date: 2023-05-11 13:19:36
sha256sum:
    5e974d9a69c242ce014c88c2b26089270f6198f3c0b700a887666cd3e816f17e ice_text.model
    be79e2b22d99b3d76184f83f266cc764275220b66da6c4d0217176c8f8f6af27 pytorch_model-00001-of-00008.bin
    a80198fb714f7363d7e541125bb70b9cb6b1d1ef5988d32a7a25a852a374cbc3 pytorch_model-00002-of-00008.bin
    aaba0ae53b3ea30559575c8528dab52ca291a26ac847c5601fcf874db401198f pytorch_model-00003-of-00008.bin
    968d134dd9b11e393d160144f097d6bff8c559413e3f75e9e0b6d35618eba669 pytorch_model-00004-of-00008.bin
    fc628ce0dcd5c38783e63fc81dd1b609fe01670ec3b855b358aa0d1d7ea48bf3 pytorch_model-00005-of-00008.bin
    511ec23b7907b7a26461671775a2ac08c08fb3695285bbe7d91fc534d7cbfd7e pytorch_model-00006-of-00008.bin
    245d64e05cebeb214d696bccc87c1dbdf16c67c366e7f54af452ec5748c2186e pytorch_model-00007-of-00008.bin
    607d08dd09074840c5f4603d4959a5c6789790955181c7253a2c14d38c1801d2 pytorch_model-00008-of-00008.bin

is avatar May 11 '23 05:05 is

pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 still doesn't help. Same error, so it probably has no direct relation to the torch version.

Package                  Version
------------------------ ------------
accelerate               0.19.0
aiofiles                 23.1.0
aiohttp                  3.8.4
aiosignal                1.3.1
altair                   5.0.0
anyio                    3.6.2
async-timeout            4.0.2
attrs                    23.1.0
bitsandbytes             0.38.1
certifi                  2023.5.7
charset-normalizer       3.1.0
click                    8.1.3
cmake                    3.26.3
colorama                 0.4.6
contourpy                1.0.7
cycler                   0.11.0
datasets                 2.12.0
dill                     0.3.6
fastapi                  0.95.1
ffmpy                    0.3.0
filelock                 3.12.0
flexgen                  0.1.7
fonttools                4.39.4
frozenlist               1.3.3
fsspec                   2023.5.0
gradio                   3.25.0
gradio_client            0.1.4
h11                      0.14.0
httpcore                 0.17.0
httpx                    0.24.0
huggingface-hub          0.14.1
idna                     3.4
Jinja2                   3.1.2
jsonschema               4.17.3
kiwisolver               1.4.4
linkify-it-py            2.0.2
lit                      16.0.3
llama-cpp-python         0.1.45
Markdown                 3.4.3
markdown-it-py           2.2.0
MarkupSafe               2.1.2
matplotlib               3.7.1
mdit-py-plugins          0.3.3
mdurl                    0.1.2
mpmath                   1.3.0
multidict                6.0.4
multiprocess             0.70.14
networkx                 3.1
numpy                    1.24.3
nvidia-cublas-cu11       11.10.3.66
nvidia-cuda-cupti-cu11   11.7.101
nvidia-cuda-nvrtc-cu11   11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11        8.5.0.96
nvidia-cufft-cu11        10.9.0.58
nvidia-curand-cu11       10.2.10.91
nvidia-cusolver-cu11     11.4.0.1
nvidia-cusparse-cu11     11.7.4.91
nvidia-nccl-cu11         2.14.3
nvidia-nvtx-cu11         11.7.91
orjson                   3.8.12
packaging                23.1
pandas                   2.0.1
peft                     0.3.0
Pillow                   9.5.0
pip                      23.0.1
psutil                   5.9.5
PuLP                     2.7.0
pyarrow                  12.0.0
pydantic                 1.10.7
pydub                    0.25.1
pyparsing                3.0.9
pyrsistent               0.19.3
python-dateutil          2.8.2
python-multipart         0.0.6
pytz                     2023.3
PyYAML                   6.0
regex                    2023.5.5
requests                 2.30.0
responses                0.18.0
rwkv                     0.7.3
safetensors              0.3.1
semantic-version         2.10.0
sentencepiece            0.1.99
setuptools               66.0.0
six                      1.16.0
sniffio                  1.3.0
starlette                0.26.1
sympy                    1.12
tokenizers               0.13.3
toolz                    0.12.0
torch                    1.13.1+cu117
torchaudio               0.13.1+cu117
torchvision              0.14.1+cu117
tqdm                     4.65.0
transformers             4.28.1
triton                   2.0.0
typing_extensions        4.5.0
tzdata                   2023.3
uc-micro-py              1.0.2
urllib3                  2.0.2
uvicorn                  0.22.0
websockets               11.0.3
wheel                    0.38.4
xxhash                   3.2.0
yarl                     1.9.2

is avatar May 11 '23 15:05 is

Is there a new update on this? I tried all the solutions like do_sample=True, but it's so slow...

KLGR123 avatar May 16 '23 06:05 KLGR123

Same problem here.

EmilyLong0721 avatar May 29 '23 02:05 EmilyLong0721

+1, I'm hitting this problem too.

king21guns avatar Jun 02 '23 06:06 king21guns

Disable do_sample, and set 'per_device_eval_batch_size' to a larger number.

TMACchen1995 avatar Jun 07 '23 07:06 TMACchen1995

You probably forgot to call model.half(). I encountered the same error; after I added the model.half() invocation, the error disappeared. The example code in README.md contains the model.half() invocation, though I don't understand why this invocation is always necessary.
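For reference, a loading sketch with the .half() call included, following the README pattern (the Hub model id is used here; substitute a local path if needed):

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
model = model.eval()
response, history = model.chat(tokenizer, "你好", history=[])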

codingfun2022 avatar Jun 13 '23 03:06 codingfun2022

Demo example:

from modeling_chatglm import ChatGLMForConditionalGeneration
from tokenization_chatglm import ChatGLMTokenizer
import gradio as gr

model_path = r'chat-GLM'
# tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
# model = AutoModel.from_pretrained(model_path, trust_remote_code=True).half().cuda()
tokenizer = ChatGLMTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = ChatGLMForConditionalGeneration.from_pretrained(model_path, trust_remote_code=True).half().cuda()
model = model.eval()
response, history = model.chat(tokenizer, "你好", history=[])

print(response)

File /home/miniconda3/lib/python3.8/site-packages/transformers/generation/utils.py:2479, in GenerationMixin.sample(self, input_ids, logits_processor, stopping_criteria, logits_warper, max_length, pad_token_id, eos_token_id, output_attentions, output_hidden_states, output_scores, return_dict_in_generate, synced_gpus, **model_kwargs)
   2477 # sample
   2478 probs = nn.functional.softmax(next_token_scores, dim=-1)
-> 2479 next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
   2481 # finished sentences should have their next token be a padding token
   2482 if eos_token_id is not None:

RuntimeError: probability tensor contains either inf, nan or element < 0

Mine is solved now. I had downloaded the model from the public internet onto a Windows machine, copied it to an intranet machine via a USB drive, and then uploaded it to the server. Something went wrong along the way: checking the files' md5sums showed a mismatch, and re-downloading and re-uploading fixed it. It may be a problem with the model files; I suggest checking yours.
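If you suspect corrupted files, you can verify the shards against the sha256 list posted in the huggingface-metadata comment above. A small stdlib helper sketch (the file name is an example):

import hashlib

def sha256sum(path, chunk_size=1 << 20):
    # stream the file so multi-GB .bin shards are not loaded into memory at once
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

print(sha256sum("pytorch_model-00001-of-00008.bin"))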

rufeng-h avatar Aug 16 '23 00:08 rufeng-h

Without load-in-8bit, it works normally.

It may be a problem caused by model quantization: https://huggingface.co/databricks/dolly-v2-12b/discussions/77
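If quantization is the trigger, the model's own int8 path may be worth trying instead of bitsandbytes' load-in-8bit. A sketch, assuming the quantize() method that modeling_chatglm.py exposes per the repo README:

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
# quantize(8) is ChatGLM's built-in quantization (assumption: signature as in the repo README)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().quantize(8).cuda()
model = model.eval()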

SCZwangxiao avatar Sep 07 '23 01:09 SCZwangxiao

My solution was to adjust the temperature parameter (in the chat() or generate() interface). It raised this error with temperature=0.1, but adjusting it to temperature=0.95 resolved the issue.

My guess is that the temperature being too low causes the logits values to overflow.
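For example (assuming the chat() signature in modeling_chatglm.py, which accepts sampling kwargs such as do_sample and temperature):

response, history = model.chat(tokenizer, "你好", history=[], do_sample=True, temperature=0.95)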

SCZwangxiao avatar Sep 07 '23 01:09 SCZwangxiao

mark

datalee avatar Sep 19 '23 03:09 datalee

mark

liugs0213 avatar Sep 25 '23 04:09 liugs0213

Disable do_sample, and set 'per_device_eval_batch_size' to a larger number.

It works.

ljhOfGithub avatar Nov 07 '23 07:11 ljhOfGithub

The error occurs here: python3.10/site-packages/transformers/generation/utils.py

probs = nn.functional.softmax(next_token_scores, dim=-1)
next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)

probs contains NaN, most likely because next_token_scores, the input to the softmax, contains inf. Tracing back to

            next_token_scores = logits_processor(input_ids, next_token_logits)
            next_token_scores = logits_warper(input_ids, next_token_scores)

we can see that next_token_scores is the result of processing by logits_processor. model.generate() has a logits_processor parameter; define one as follows and pass it in:

from transformers.generation.logits_process import LogitsProcessor, LogitsProcessorList
from transformers.generation.logits_process import InfNanRemoveLogitsProcessor, MinLengthLogitsProcessor

logits_processor = LogitsProcessorList()
# the default minimum-length processor; note it expects the eos token *id*, not the token string
logits_processor.append(MinLengthLogitsProcessor(15, eos_token_id=tokenizer.eos_token_id))
# converts nan in the scores to 0 and clamps inf to a finite maximum
logits_processor.append(InfNanRemoveLogitsProcessor())

Here MinLengthLogitsProcessor is one of the default logits processors, while the newly added InfNanRemoveLogitsProcessor converts the NaN values in next_token_logits to 0 and inf to the largest value that can be handled. See: https://huggingface.co/transformers/v4.9.2/_modules/transformers/generation_logits_process.html
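A usage sketch for passing it in (the prompt and max_new_tokens are placeholders; generate() merges this list with its default processors):

inputs = tokenizer("你好", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, do_sample=True, max_new_tokens=128, logits_processor=logits_processor)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))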

bexby avatar Mar 10 '24 15:03 bexby