
num_beams=2, RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

jzsbioinfo opened this issue · 5 comments

  • Model inference issue (🤗 transformers)

Loading the model in float16 on a V100 GPU, the example runs fine. But after changing num_beams=1 to num_beams=2, I get RuntimeError: probability tensor contains either `inf`, `nan` or element < 0.

The problem can be reproduced by changing num_beams=1 to num_beams=2 in https://github.com/ymcui/Chinese-LLaMA-Alpaca/blob/main/scripts/inference_hf.py.
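
For reference, a minimal standalone sketch of the failing setup (the model path is a placeholder; the generation parameters mirror the script's defaults as reported later in this thread):

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

# Placeholder path; substitute your own merged HF-format model directory.
model_path = "path_to_merged_model_hf"

tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("hi", return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    temperature=0.2,
    top_k=40,
    top_p=0.9,
    do_sample=True,
    num_beams=2,  # changing this from 1 to 2 triggers the RuntimeError
    repetition_penalty=1.3,
    max_new_tokens=400,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```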

jzsbioinfo avatar May 05 '23 03:05 jzsbioinfo

It may be an issue with transformers or with the model weights; it is not clear yet and we are investigating. See also https://github.com/oobabooga/text-generation-webui/issues/199.

airaria avatar May 05 '23 10:05 airaria

Same problem. My generation config: do_sample=True, num_beams>1 (which uses beam_sample). It returns to normal when I set num_beams=1 (which uses sample); see the dispatch sketch after the note below. I haven't tried setting do_sample to False, but it seems to work according to other issues.

Note: this error appears both with the merged weights and after my own fine-tuning.
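
For context, this matches how `generate()` chose the decoding mode in transformers around v4.28. A simplified sketch of the dispatch (not the actual source):

```python
# Simplified sketch of the decoding-mode dispatch in
# transformers GenerationMixin.generate() (circa v4.28).
def pick_decoding_mode(do_sample: bool, num_beams: int) -> str:
    if num_beams == 1:
        return "sample" if do_sample else "greedy_search"
    # num_beams > 1:
    #   do_sample=False -> beam_search  (reported to work in this thread)
    #   do_sample=True  -> beam_sample  (the failing path in this issue)
    return "beam_sample" if do_sample else "beam_search"
```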

qiguanqiang avatar May 05 '23 13:05 qiguanqiang

@ZenBuilds @jzsbioinfo

Did you encounter the same problem when inferring with the original LLaMA with beam size set to 2? With a command like the following:

python inference_hf.py --base_model path_to_llama_7b_hf --interactive

airaria avatar May 05 '23 14:05 airaria

@ZenBuilds @jzsbioinfo

Did you encounter the same problem when inferring with the original LLaMA with beam size set to 2? With a command like the following:

python inference_hf.py --base_model path_to_llama_7b_hf --interactive

I tried your command with llama-7b-hf and set num_beams=2; the error happens again.

Generation parameters:

generation_config = dict(
    temperature=0.2,
    top_k=40,
    top_p=0.9,
    do_sample=True,
    num_beams=2,
    repetition_penalty=1.3,
    max_new_tokens=400,
)

The error looks like:

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /home/tiger/.local/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda120.so
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 120
CUDA SETUP: Loading binary /home/tiger/.local/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda120.so...
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 33/33 [00:47<00:00,  1.43s/it]
Vocab of the base model: 32000
Vocab of the tokenizer: 32000
Input:hi
/home/tiger/.local/lib/python3.9/site-packages/transformers/generation/utils.py:1219: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
  warnings.warn(
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
Traceback (most recent call last):
  File "/opt/tiger/startbash/Chinese-LLaMA-Alpaca/scripts/inference_hf.py", line 104, in <module>
    generation_output = model.generate(
  File "/home/tiger/.local/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/tiger/.local/lib/python3.9/site-packages/transformers/generation/utils.py", line 1562, in generate
    return self.beam_sample(
  File "/home/tiger/.local/lib/python3.9/site-packages/transformers/generation/utils.py", line 3187, in beam_sample
    next_tokens = torch.multinomial(probs, num_samples=2 * num_beams)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
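
The failure comes from `torch.multinomial`, which rejects any distribution containing `inf`, `nan`, or negative entries; if the float16 arithmetic produces a NaN anywhere in `probs`, sampling aborts with exactly this message. A hypothetical standalone illustration:

```python
import torch

# torch.multinomial validates the distribution before sampling and
# raises this exact RuntimeError when it contains NaN/Inf.
probs = torch.tensor([[0.5, float("nan"), 0.5]])
try:
    torch.multinomial(probs, num_samples=2)
except RuntimeError as e:
    print(e)  # probability tensor contains either `inf`, `nan` or element < 0
```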

qiguanqiang avatar May 05 '23 14:05 qiguanqiang

I retested on an A100:

do_sample=False, num_beams=2: no problem
do_sample=True, num_beams=1: no problem
do_sample=True, num_beams=2: problem

Per https://huggingface.co/docs/transformers/main_classes/text_generation, that means the error occurs exactly when multinomial sampling is combined with beam search. I don't know how to fix it yet; a possible workaround is sketched below.
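
Until the root cause is identified, a workaround consistent with the matrix above is simply to avoid the beam-search multinomial sampling path. A hedged sketch (parameter values taken from the config posted earlier in this thread):

```python
# Avoid the beam_sample path (do_sample=True with num_beams > 1),
# which is the only combination reported to fail above.
generation_config = dict(
    temperature=0.2,
    top_k=40,
    top_p=0.9,
    do_sample=True,
    num_beams=1,  # plain multinomial sampling: reported to work
    repetition_penalty=1.3,
    max_new_tokens=400,
)
# Alternative: keep num_beams=2 but set do_sample=False (pure beam search),
# also reported to work. If the NaNs come from half-precision overflow,
# loading the model in float32 instead of float16 may help as well, at the
# cost of roughly double the memory.
```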

jzsbioinfo avatar May 06 '23 03:05 jzsbioinfo

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your consideration.

github-actions[bot] avatar May 13 '23 22:05 github-actions[bot]

Closing the issue, since no updates observed. Feel free to re-open if you need any further assistance.

github-actions[bot] avatar May 17 '23 22:05 github-actions[bot]