
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

Open • candowu opened this issue on Jun 13, 2023 • 6 comments

System Info

flagai 1.7.1 (pypi_0, pypi), CentOS 7

Information

  • [X] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [X] An officially supported task in the examples folder (such as T5/AltCLIP, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

python generate_code.py

Expected behavior

[2023-06-13 09:49:39,944] [INFO] [logger.py:85:log_dist] [Rank -1] Unsupported bmtrain
building model...
******************** lm aquilacode-7b-nv
model checkpoint_path=../checkpoints_in/aquilacode-7b-nv/pytorch_model.bin are loaded successfully...
All special tokens: [('pad', '<|endoftext|>', 0), ('eos', '<|endoftext|>', 0), ('sop', '<|startofpiece|>', 100000), ('eop', '<|endofpiece|>', 100001), ('cls', '[CLS]', 100006), ('MASK', '[MASK]', 100003), ('sep', '', 100007), ('unk', '[UNK]', 0), ('gMASK', '[gMASK]', 100004), ('sMASK', '[sMASK]', 100005)]
/miniconda3/envs/flagai/lib/python3.9/site-packages/flagai/model/predictor/aquila.py:32: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  tokens[k, : len(t)] = torch.tensor(t).long()
Traceback (most recent call last):
  File "/data1/candowu/FlagAI-master/examples/Aquila/Aquila-code/generate_code.py", line 45, in <module>
    res = predictor.predict_generate_randomsample(text,
  File "/miniconda3/envs/flagai/lib/python3.9/site-packages/flagai/model/predictor/predictor.py", line 352, in predict_generate_randomsample
    return aquila_generate(self.tokenizer, self.model,
  File "/miniconda3/envs/flagai/lib/python3.9/site-packages/flagai/model/predictor/aquila.py", line 41, in aquila_generate
    next_token = sample_top_p(probs, top_p)
  File "/miniconda3/envs/flagai/lib/python3.9/site-packages/flagai/model/predictor/aquila.py", line 81, in sample_top_p
    next_token = torch.multinomial(probs_sort, num_samples=1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

candowu • Jun 13 '23
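The traceback ends in `sample_top_p`, which passes `probs_sort` to `torch.multinomial`. That call rejects any row containing `nan`, `inf`, or a negative entry. The framework-free sketch below shows the same pipeline in plain Python. It assumes FlagAI's `sample_top_p` follows the common LLaMA-style nucleus sampler (sort, cut at cumulative mass `top_p`, renormalize, draw). The `check_probs` helper is hypothetical and only mirrors the condition named in the error.

```python
import math
import random


def check_probs(probs):
    """Reject a probability row the way torch.multinomial does in the log."""
    for p in probs:
        if math.isnan(p) or math.isinf(p) or p < 0:
            raise RuntimeError(
                "probability tensor contains either `inf`, `nan` or element < 0")
    # torch.multinomial also refuses rows whose probabilities sum to zero.
    if sum(probs) <= 0:
        raise RuntimeError("probabilities sum to zero")


def sample_top_p(probs, top_p, rng=random):
    """Nucleus (top-p) sampling over one row of probabilities.

    Keeps the highest-probability tokens until the mass seen so far exceeds
    top_p, then draws one token index in proportion to the kept mass.
    """
    check_probs(probs)
    # Token indices sorted by probability, highest first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass_before = [], 0.0
    for i in order:
        # Same rule as the LLaMA-style mask `cumsum - p > top_p`: drop a
        # token once the mass of the tokens ranked above it exceeds top_p.
        if mass_before > top_p:
            break
        kept.append(i)
        mass_before += probs[i]
    # Draw from the kept tokens; scaling by their total renormalizes them.
    r = rng.random() * sum(probs[i] for i in kept)
    acc = 0.0
    for i in kept:
        acc += probs[i]
        if r < acc:
            return i
    return kept[-1]  # guards against floating-point round-off


print(sample_top_p([0.1, 0.7, 0.2], top_p=0.5))
```

The check fires at the very last step, so the error only reports the symptom: the `nan` or `inf` entered the row earlier, in the logits or in the softmax that produced `probs`.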

https://github.com/FlagAI-Open/FlagAI/pull/379

Please try whether this fixes it.

ftgreat • Jun 13 '23

Same problem here.

This PR (https://github.com/FlagAI-Open/FlagAI/pull/379) does not seem to work for me 😭

HermitSun • Jun 13 '23

Same problem here; it happens sporadically.

next_token = torch.multinomial(probs_sort, num_samples=1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

yongzhuo • Jun 13 '23

flagai 1.7.2 has been released. This issue should be largely fixed, though it may still happen occasionally.

BAAI-OpenPlatform • Jun 13 '23

So how can I predict whether this problem will happen?

In other words, what kinds of input are more likely to trigger it?

HermitSun • Jun 13 '23
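One practical way to find the inputs that trigger the error is to check the logits at every decoding step and record the prompt and step at which the first non-finite value appears, instead of waiting for `torch.multinomial` to fail. A minimal sketch in plain Python; `step_fn`, `find_non_finite`, `generate_with_diagnostics`, and `fake_step` are hypothetical stand-ins, not FlagAI functions. With PyTorch, the per-step check would be something like `torch.isfinite(logits).all()`.

```python
import math


def find_non_finite(logits):
    """Return (index, value) of the first nan/inf entry in a logit row, or None."""
    for i, v in enumerate(logits):
        if not math.isfinite(v):
            return i, v
    return None


def generate_with_diagnostics(step_fn, prompt, max_steps):
    """Greedy decode loop that stops at the first non-finite logit row.

    step_fn(prompt, generated) -> list[float] is a hypothetical stand-in for
    one model forward pass; it is not a FlagAI API. Returns a report dict for
    the failing step, or None if every step produced finite logits.
    """
    generated = []
    for step in range(max_steps):
        logits = step_fn(prompt, generated)
        bad = find_non_finite(logits)
        if bad is not None:
            index, value = bad
            return {
                "prompt": prompt,
                "step": step,
                "token_index": index,
                "value": value,
                "generated": list(generated),
            }
        # Greedy choice keeps the sketch deterministic and easy to replay.
        generated.append(max(range(len(logits)), key=logits.__getitem__))
    return None


def fake_step(prompt, generated):
    # Hypothetical model that overflows on its third decoding step.
    return [0.0, 1.0] if len(generated) < 2 else [0.0, float("inf")]


report = generate_with_diagnostics(fake_step, "def add(a, b):", max_steps=5)
print(report["step"], report["token_index"])
```

Because the failures are sporadic, logging these reports over many prompts is more informative than a single reproduction.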

Could this be fixed by passing the values through a softmax?

safehumeng • Jun 13 '23
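On whether an extra softmax would help: softmax cannot remove a `nan` that is already in the logits, and it produces `nan` itself when every logit in a row is `-inf` or when any logit is `+inf`. A plain-Python sketch of a standard max-subtracted softmax (not FlagAI's code) shows these cases:

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the row maximum before exp()."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# One -inf entry is harmless: it simply gets probability 0.
print(softmax([2.0, 1.0, float("-inf")]))
# A row that is entirely -inf gives -inf - (-inf) = nan everywhere.
print(softmax([float("-inf"), float("-inf")]))
# A nan or +inf logit also turns the whole row into nan.
print(softmax([0.0, float("nan")]))
print(softmax([float("inf"), 0.0]))
```

So a softmax only yields a valid distribution when its inputs are finite or `-inf` with at least one finite entry; the bad values have to be prevented upstream, wherever the logits are produced.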

I tested version 1.7.2 and still have this problem.

Zhang-star-master • Jun 14 '23

Version 1.7.3 has been released; please update. If you still run into problems, please let us know promptly. Thanks!

ftgreat • Jun 14 '23

The same problem still occurs with 1.7.3:
File "/opt/conda/lib/python3.8/site-packages/flagai/model/predictor/gpt.py", line 53, in gpt_random_sample_use_cache

Zhang-l-i-n • Jul 25 '23

The problem is probably in this line: logit_score[:, tokenizer.get_command_id('unk')] = -float('Inf')

Zhang-l-i-n • Jul 25 '23
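The code around that line in `gpt.py` is not shown, so the following is only one plausible mechanism, sketched in plain Python. Setting a single logit to `-inf` is harmless to softmax on its own. It becomes a problem when other filtering has already removed every other candidate, or when the `-inf` passes through arithmetic such as `0 * -inf`. Also, if the tokenizer matches the one in the original log, 'unk' has id 0, the same id as 'pad' and 'eos', so masking 'unk' also masks the end-of-text token. `FINITE_MASK`, `ban_tokens`, and `is_valid` are hypothetical names. Masking with a large finite negative value instead of `-inf` is a common workaround, not FlagAI's confirmed fix.

```python
import math

# Hypothetical finite mask value. In PyTorch, torch.finfo(logits.dtype).min
# is a common choice, because -1e9 itself overflows to -inf in float16.
FINITE_MASK = -1e9


def softmax(logits):
    """Numerically stable softmax: subtract the row maximum before exp()."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ban_tokens(logits, token_ids, value):
    """Return a copy of the logits with the given token ids set to `value`."""
    out = list(logits)
    for t in token_ids:
        out[t] = value
    return out


def is_valid(probs):
    """True if the row would pass torch.multinomial's inf/nan/negative check."""
    return all(math.isfinite(p) and p >= 0 for p in probs)


logits = [2.0, 1.0]
# Suppose some other filter has already removed token 1; banning 'unk'
# (id 0 in the original log) then removes the last remaining candidate.
inf_masked = ban_tokens(logits, [0, 1], float("-inf"))
finite_masked = ban_tokens(logits, [0, 1], FINITE_MASK)

print(is_valid(softmax(inf_masked)))     # every logit is -inf here
print(is_valid(softmax(finite_masked)))  # every logit is finite here
# -inf also turns into nan under some ordinary arithmetic:
print(0.0 * float("-inf"), float("-inf") - float("-inf"))
```

With a finite mask value, a fully masked row degrades to a uniform distribution instead of `nan`, so sampling continues; whether that is acceptable output is a separate question.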