ChatGLM-6B
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
Can you provide a reproducible example, including model precision, query, and history?
Demo example:

from modeling_chatglm import ChatGLMForConditionalGeneration
from tokenization_chatglm import ChatGLMTokenizer
import gradio as gr

model_path = r'chat-GLM'
#tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
#model = AutoModel.from_pretrained(model_path, trust_remote_code=True).half().cuda()
tokenizer = ChatGLMTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = ChatGLMForConditionalGeneration.from_pretrained(model_path, trust_remote_code=True).half().cuda()
model = model.eval()
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
File /home/miniconda3/lib/python3.8/site-packages/transformers/generation/utils.py:2479, in GenerationMixin.sample(self, input_ids, logits_processor, stopping_criteria, logits_warper, max_length, pad_token_id, eos_token_id, output_attentions, output_hidden_states, output_scores, return_dict_in_generate, synced_gpus, **model_kwargs)
   2477     # sample
   2478     probs = nn.functional.softmax(next_token_scores, dim=-1)
-> 2479     next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
   2481     # finished sentences should have their next token be a padding token
   2482     if eos_token_id is not None:

RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
I've updated the code on the Hugging Face Hub. Please try again now.
/data1/data/data/xia/2023/chat_glm/modeling_chatglm.py in __call__(self, input_ids, scores)
     52     if torch.isnan(scores).any() or torch.isinf(scores).any():
     53         scores.zero_()
---> 54         scores[..., 20005] = 1e5
     55     return scores
     56
RuntimeError: value cannot be converted to type at::Half without overflow
Sorry, the fix wasn't quite right. Please try it again now.
The return value is empty. After the update, there is no return value anymore.
response = ''
@xiamaozi11 Could you tell us your hardware, CUDA version, and PyTorch version? This problem shows up because, in your environment, everything the model computes comes out as NaN.
RTX 3090, CUDA 11.1, torch 1.13.1. It seems to be the same on both of my 3090 machines.
It doesn't seem to be a hardware problem; I switched to an A40 on AutoDL and hit the same issue.
Same problem here, the response is blank. Have you solved it? @xiamaozi11
You can get past this error by removing do_sample=True in model.generate.
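For example, a minimal sketch of a greedy-decoding call with the Hugging Face generate API (the prompt and max_new_tokens are only illustrative); with do_sample=False the sampling step that calls torch.multinomial is skipped entirely:

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
model = model.eval()

inputs = tokenizer("你好", return_tensors="pt").to(model.device)
# Greedy decoding: tokens are picked by argmax, so the nan/inf probabilities
# never reach torch.multinomial
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))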
Has this problem been solved?
/data1/data/data/xia/2023/chat_glm/modeling_chatglm.py in __call__(self, input_ids, scores)
     52     if torch.isnan(scores).any() or torch.isinf(scores).any():
     53         scores.zero_()
---> 54         scores[..., 20005] = 1e5
     55     return scores
     56

RuntimeError: value cannot be converted to type at::Half without overflow
Sorry, the fix wasn't quite right. Please try it again now.
How can this be solved?
Loading the model with text-generation-webui gives the same error message:
Traceback (most recent call last):
File "/home/ecs-user/src/fnlp/text-generation-webui/modules/callbacks.py", line 73, in gentask
ret = self.mfunc(callback=_callback, **self.kwargs)
File "/home/ecs-user/src/fnlp/text-generation-webui/modules/text_generation.py", line 259, in generate_with_callback
shared.model.generate(**kwargs)
File "/home/c/envs/fnlp/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/c/envs/fnlp/lib/python3.10/site-packages/transformers/generation/utils.py", line 1485, in generate
return self.sample(
File "/home/c/envs/fnlp/lib/python3.10/site-packages/transformers/generation/utils.py", line 2560, in sample
next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
This is with load-in-8bit enabled:
CUDA_VISIBLE_DEVICES=0 \
NUMEXPR_MAX_THREADS=1 \
python server.py --chat \
--model THUDM/chatglm-6b \
--trust-remote-code \
--load-in-8bit \
--api
It works fine without load-in-8bit.
The GPU is a V100 and the torch version is 2.0.1.
accelerate 0.19.0
aiofiles 23.1.0
aiohttp 3.8.4
aiosignal 1.3.1
altair 5.0.0
anyio 3.6.2
async-timeout 4.0.2
attrs 23.1.0
bitsandbytes 0.38.1
certifi 2023.5.7
charset-normalizer 3.1.0
click 8.1.3
cmake 3.26.3
colorama 0.4.6
contourpy 1.0.7
cycler 0.11.0
datasets 2.12.0
dill 0.3.6
fastapi 0.95.1
ffmpy 0.3.0
filelock 3.12.0
flexgen 0.1.7
fonttools 4.39.4
frozenlist 1.3.3
fsspec 2023.5.0
gradio 3.25.0
gradio_client 0.1.4
h11 0.14.0
httpcore 0.17.0
httpx 0.24.0
huggingface-hub 0.14.1
idna 3.4
Jinja2 3.1.2
jsonschema 4.17.3
kiwisolver 1.4.4
linkify-it-py 2.0.2
lit 16.0.3
llama-cpp-python 0.1.45
Markdown 3.4.3
markdown-it-py 2.2.0
MarkupSafe 2.1.2
matplotlib 3.7.1
mdit-py-plugins 0.3.3
mdurl 0.1.2
mpmath 1.3.0
multidict 6.0.4
multiprocess 0.70.14
networkx 3.1
numpy 1.24.3
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-cupti-cu11 11.7.101
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
nvidia-cufft-cu11 10.9.0.58
nvidia-curand-cu11 10.2.10.91
nvidia-cusolver-cu11 11.4.0.1
nvidia-cusparse-cu11 11.7.4.91
nvidia-nccl-cu11 2.14.3
nvidia-nvtx-cu11 11.7.91
orjson 3.8.12
packaging 23.1
pandas 2.0.1
peft 0.4.0.dev0
Pillow 9.5.0
pip 23.0.1
psutil 5.9.5
PuLP 2.7.0
pyarrow 12.0.0
pydantic 1.10.7
pydub 0.25.1
pyparsing 3.0.9
pyrsistent 0.19.3
python-dateutil 2.8.2
python-multipart 0.0.6
pytz 2023.3
PyYAML 6.0
regex 2023.5.5
requests 2.30.0
responses 0.18.0
rwkv 0.7.3
safetensors 0.3.1
semantic-version 2.10.0
sentencepiece 0.1.99
setuptools 66.0.0
six 1.16.0
sniffio 1.3.0
starlette 0.26.1
sympy 1.12
tokenizers 0.13.3
toolz 0.12.0
torch 2.0.1
tqdm 4.65.0
transformers 4.28.1
triton 2.0.0
typing_extensions 4.5.0
tzdata 2023.3
uc-micro-py 1.0.2
urllib3 2.0.2
uvicorn 0.22.0
websockets 11.0.3
wheel 0.38.4
xxhash 3.2.0
huggingface-metadata
url: https://huggingface.co/THUDM/chatglm-6b
branch: main
download date: 2023-05-11 13:19:36
sha256sum:
5e974d9a69c242ce014c88c2b26089270f6198f3c0b700a887666cd3e816f17e ice_text.model
be79e2b22d99b3d76184f83f266cc764275220b66da6c4d0217176c8f8f6af27 pytorch_model-00001-of-00008.bin
a80198fb714f7363d7e541125bb70b9cb6b1d1ef5988d32a7a25a852a374cbc3 pytorch_model-00002-of-00008.bin
aaba0ae53b3ea30559575c8528dab52ca291a26ac847c5601fcf874db401198f pytorch_model-00003-of-00008.bin
968d134dd9b11e393d160144f097d6bff8c559413e3f75e9e0b6d35618eba669 pytorch_model-00004-of-00008.bin
fc628ce0dcd5c38783e63fc81dd1b609fe01670ec3b855b358aa0d1d7ea48bf3 pytorch_model-00005-of-00008.bin
511ec23b7907b7a26461671775a2ac08c08fb3695285bbe7d91fc534d7cbfd7e pytorch_model-00006-of-00008.bin
245d64e05cebeb214d696bccc87c1dbdf16c67c366e7f54af452ec5748c2186e pytorch_model-00007-of-00008.bin
607d08dd09074840c5f4603d4959a5c6789790955181c7253a2c14d38c1801d2 pytorch_model-00008-of-00008.bin
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 doesn't help either; same error, so it probably isn't directly related to the torch version.
Package Version
------------------------ ------------
accelerate 0.19.0
aiofiles 23.1.0
aiohttp 3.8.4
aiosignal 1.3.1
altair 5.0.0
anyio 3.6.2
async-timeout 4.0.2
attrs 23.1.0
bitsandbytes 0.38.1
certifi 2023.5.7
charset-normalizer 3.1.0
click 8.1.3
cmake 3.26.3
colorama 0.4.6
contourpy 1.0.7
cycler 0.11.0
datasets 2.12.0
dill 0.3.6
fastapi 0.95.1
ffmpy 0.3.0
filelock 3.12.0
flexgen 0.1.7
fonttools 4.39.4
frozenlist 1.3.3
fsspec 2023.5.0
gradio 3.25.0
gradio_client 0.1.4
h11 0.14.0
httpcore 0.17.0
httpx 0.24.0
huggingface-hub 0.14.1
idna 3.4
Jinja2 3.1.2
jsonschema 4.17.3
kiwisolver 1.4.4
linkify-it-py 2.0.2
lit 16.0.3
llama-cpp-python 0.1.45
Markdown 3.4.3
markdown-it-py 2.2.0
MarkupSafe 2.1.2
matplotlib 3.7.1
mdit-py-plugins 0.3.3
mdurl 0.1.2
mpmath 1.3.0
multidict 6.0.4
multiprocess 0.70.14
networkx 3.1
numpy 1.24.3
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-cupti-cu11 11.7.101
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
nvidia-cufft-cu11 10.9.0.58
nvidia-curand-cu11 10.2.10.91
nvidia-cusolver-cu11 11.4.0.1
nvidia-cusparse-cu11 11.7.4.91
nvidia-nccl-cu11 2.14.3
nvidia-nvtx-cu11 11.7.91
orjson 3.8.12
packaging 23.1
pandas 2.0.1
peft 0.3.0
Pillow 9.5.0
pip 23.0.1
psutil 5.9.5
PuLP 2.7.0
pyarrow 12.0.0
pydantic 1.10.7
pydub 0.25.1
pyparsing 3.0.9
pyrsistent 0.19.3
python-dateutil 2.8.2
python-multipart 0.0.6
pytz 2023.3
PyYAML 6.0
regex 2023.5.5
requests 2.30.0
responses 0.18.0
rwkv 0.7.3
safetensors 0.3.1
semantic-version 2.10.0
sentencepiece 0.1.99
setuptools 66.0.0
six 1.16.0
sniffio 1.3.0
starlette 0.26.1
sympy 1.12
tokenizers 0.13.3
toolz 0.12.0
torch 1.13.1+cu117
torchaudio 0.13.1+cu117
torchvision 0.14.1+cu117
tqdm 4.65.0
transformers 4.28.1
triton 2.0.0
typing_extensions 4.5.0
tzdata 2023.3
uc-micro-py 1.0.2
urllib3 2.0.2
uvicorn 0.22.0
websockets 11.0.3
wheel 0.38.4
xxhash 3.2.0
yarl 1.9.2
Is there a new update on this?
I tried all the suggested solutions, like the do_sample=True fix, but it's so slow...
Same problem here.
+1, running into this problem too.
do_sample
Set 'per_device_eval_batch_size' to a larger number.
You probably forgot to call model.half(). I encountered the same error, and after I added the model.half() call, the error disappeared. The example code in README.md includes the model.half() call, though I don't understand why it is always necessary.
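For reference, a minimal loading sketch with the half() call in place, following the README's AutoModel example (adjust the model path to wherever your checkpoint lives):

from transformers import AutoTokenizer, AutoModel

model_path = "THUDM/chatglm-6b"  # or a local checkpoint directory
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
# .half() casts the weights to fp16 before moving them to the GPU, as in the README example
model = AutoModel.from_pretrained(model_path, trust_remote_code=True).half().cuda()
model = model.eval()

response, history = model.chat(tokenizer, "你好", history=[])
print(response)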
Mine is solved. I downloaded the model from the public internet onto a Windows machine, copied it to the intranet on a USB drive, and then uploaded it to the server. Something went wrong along the way: the files' md5sums didn't match. After re-downloading and re-uploading, it worked. It may be a problem with the model files, so I suggest checking them.
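If you want to verify a local copy against the sha256 values listed in huggingface-metadata above, a small sketch like this works (model_path below is an assumed local checkpoint directory):

import hashlib
from pathlib import Path

model_path = Path("chatglm-6b")  # assumed local directory holding the downloaded files

def sha256_of(path, chunk_size=1 << 20):
    # Hash the file in 1 MiB chunks so multi-GB shards don't have to fit in memory
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

for f in sorted(model_path.glob("*.bin")) + [model_path / "ice_text.model"]:
    print(sha256_of(f), f.name)  # compare against the published sha256sum list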
It works fine without load-in-8bit.
It may be an issue introduced by model quantization: https://huggingface.co/databricks/dolly-v2-12b/discussions/77
My solution was to adjust the temperature parameter (in the chat() or generate() interface). It raised the error with temperature=0.1, but adjusting it to temperature=0.95 resolved the issue. My guess is that a temperature that is too low makes the logits' values overflow.
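A minimal sketch of the adjusted call, assuming model and tokenizer are already loaded as in the demo at the top of the thread:

# temperature=0.1 triggered the error here; the default 0.95 did not
response, history = model.chat(
    tokenizer,
    "你好",
    history=[],
    temperature=0.95,
)
print(response)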
mark
do_sample
Set 'per_device_eval_batch_size' to a larger number.
It works.
The error happens here, in python3.10/site-packages/transformers/generation/utils.py:

probs = nn.functional.softmax(next_token_scores, dim=-1)
next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)

probs contains nan, most likely because the next_token_scores fed into softmax contains inf. Tracing back to

next_token_scores = logits_processor(input_ids, next_token_logits)
next_token_scores = logits_warper(input_ids, next_token_scores)

shows that next_token_scores is the output of logits_processor. model.generate() has a logits_processor parameter; define one as follows and pass it in:
from transformers.generation.logits_process import LogitsProcessor, LogitsProcessorList
from transformers.generation.logits_process import InfNanRemoveLogitsProcessor, MinLengthLogitsProcessor

logits_processor = LogitsProcessorList()
# MinLengthLogitsProcessor expects a token id (an int), not the token string
logits_processor.append(MinLengthLogitsProcessor(15, eos_token_id=tokenizer.eos_token_id))
logits_processor.append(InfNanRemoveLogitsProcessor())
Here MinLengthLogitsProcessor is the default logits_processor, and the newly added InfNanRemoveLogitsProcessor converts any nan in next_token_logits to 0 and any inf to the largest finite value the dtype can represent.
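A usage sketch for passing the list into generation (assuming model and tokenizer are loaded as in the earlier demo; the prompt and max_new_tokens are only illustrative):

inputs = tokenizer("你好", return_tensors="pt").to(model.device)
# The appended processors clean up the raw logits before sampling, so any
# nan/inf scores are replaced before torch.multinomial runs.
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    logits_processor=logits_processor,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))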
For reference, see:
https://huggingface.co/transformers/v4.9.2/_modules/transformers/generation_logits_process.html