RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
System Info
flagai 1.7.1 (installed from PyPI), CentOS 7
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [X] An officially supported task in the `examples` folder (such as T5/AltCLIP, ...)
- [ ] My own task or dataset (give details below)
Reproduction
python generate_code.py
Expected behavior
[2023-06-13 09:49:39,944] [INFO] [logger.py:85:log_dist] [Rank -1] Unsupported bmtrain
building model...
******************** lm aquilacode-7b-nv
model checkpoint_path=../checkpoints_in/aquilacode-7b-nv/pytorch_model.bin are loaded successfully...
All special tokens: [('pad', '<|endoftext|>', 0), ('eos', '<|endoftext|>', 0), ('sop', '<|startofpiece|>', 100000), ('eop', '<|endofpiece|>', 100001), ('cls', '[CLS]', 100006), ('MASK', '[MASK]', 100003), ('sep', '', 100007), ('unk', '[UNK]', 0), ('gMASK', '[gMASK]', 100004), ('sMASK', '[sMASK]', 100005)]
/miniconda3/envs/flagai/lib/python3.9/site-packages/flagai/model/predictor/aquila.py:32: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
tokens[k, : len(t)] = torch.tensor(t).long()
Traceback (most recent call last):
File "/data1/candowu/FlagAI-master/examples/Aquila/Aquila-code/generate_code.py", line 45, in inf, nan or element < 0
https://github.com/FlagAI-Open/FlagAI/pull/379
You can try this and see whether it fixes the problem.
The same problem.
This PR (https://github.com/FlagAI-Open/FlagAI/pull/379) doesn't seem to work for me 😭
Same problem here; it occurs sporadically.
next_token = torch.multinomial(probs_sort, num_samples=1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
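For anyone who needs to keep a job running while this is being fixed, one workaround is to guard the sampling call and fall back to greedy selection when the distribution is degenerate. This is only a sketch around the failing `torch.multinomial(probs_sort, ...)` call, not FlagAI's own code; `safe_multinomial` is a hypothetical helper name:

```python
import torch

def safe_multinomial(probs_sort: torch.Tensor, num_samples: int = 1) -> torch.Tensor:
    """Sample like torch.multinomial, but fall back to greedy argmax when the
    probability tensor contains inf/nan/negatives or sums to zero."""
    bad = torch.isnan(probs_sort) | torch.isinf(probs_sort) | (probs_sort < 0)
    if bad.any() or probs_sort.sum(dim=-1).le(0).any():
        # Degenerate distribution: pick the largest valid probability instead of crashing.
        cleaned = probs_sort.nan_to_num(nan=0.0, posinf=0.0, neginf=0.0)
        return cleaned.argmax(dim=-1, keepdim=True)
    return torch.multinomial(probs_sort, num_samples=num_samples)
```

This only prevents the crash; the root cause (the inf/nan values produced earlier in the logits) is untouched.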
flagai 1.7.2 has been released; this issue should be largely fixed, though it still occasionally happens.
So how could I predict whether this problem will happen?
Or in other words, what kind of input is more likely to cause this problem?
Can this be fixed by running the logits through a softmax?
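Probably not by itself: a softmax does not clean up bad logits, since NaN propagates through it, and a row that is entirely -inf also normalizes to NaN. A quick PyTorch check (not FlagAI code):

```python
import torch

logits = torch.tensor([[1.0, float('nan'), 2.0],   # one NaN logit
                       [float('-inf')] * 3])        # every logit masked to -inf
probs = torch.softmax(logits, dim=-1)
print(probs)  # both rows come out as NaN, so torch.multinomial(probs, 1) still raises
```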
Tested with version 1.7.2; the problem is still there.
1.7.3 has been released; please update. If the problem persists, report it promptly. Thanks.
The same problem still occurs with 1.7.3:
File "/opt/conda/lib/python3.8/site-packages/flagai/model/predictor/gpt.py", line 53, in gpt_random_sample_use_cache
The problem is probably in `logit_score[:, tokenizer.get_command_id('unk')] = -float('Inf')`.
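That could explain the sporadic failures: if every other logit in a row has already been pushed to -inf (or has overflowed in fp16), writing one more -inf leaves nothing finite, so the softmax of that row is all-NaN, which is exactly what `torch.multinomial` rejects. Below is a small sketch of the failure mode and one common mitigation (masking with the dtype's minimum finite value instead of -inf); this is generic sampling code, not the actual `flagai/model/predictor/gpt.py`:

```python
import torch

unk_id = 2  # stand-in for tokenizer.get_command_id('unk')

# Failure mode: the rest of the row is already -inf, so one more -inf
# makes the whole row -inf and softmax returns NaN everywhere.
logits = torch.full((1, 4), float('-inf'))
logits[:, unk_id] = -float('inf')
print(torch.softmax(logits, dim=-1))          # tensor([[nan, nan, nan, nan]])

# Mitigation: mask with the smallest finite value of the dtype instead.
logits = torch.full((1, 4), float('-inf'))
logits[:, unk_id] = torch.finfo(logits.dtype).min
print(torch.softmax(logits, dim=-1))          # finite distribution; multinomial no longer crashes
```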