
Running the pretraining training code fails with the following error: The loss.shape should be (1L,), but the current loss.shape is (-1,)

Open abnormall opened this issue 4 years ago • 12 comments

WARNING: 08-13 14:03:09: io.py:712 * 139843275020096 paddle.fluid.layers.py_reader() may be deprecated in the near future. Please use paddle.fluid.io.DataLoader.from_generator() instead.
/home/lbj/anaconda3/lib/python3.7/site-packages/paddle/fluid/clip.py:779: UserWarning: Caution! 'set_gradient_clip' is not recommended and may be deprecated in future! We recommend a new strategy: set 'grad_clip' when initializing the 'optimizer'. This method can reduce the mistakes, please refer to documention of 'optimizer'.
  warnings.warn("Caution! 'set_gradient_clip' is not recommended "
Traceback (most recent call last):
  File "pretraining.py", line 359, in <module>
    main(args)
  File "pretraining.py", line 351, in main
    trainer = trainer_class(params, readers, model)
  File "pretraining.py", line 152, in __init__
    BaseTrainer.__init__(self, params, data_set_reader, model_class)
  File "/home/lbj/projects/Senta-master/senta/training/base_trainer.py", line 48, in __init__
    self.init_net()
  File "/home/lbj/projects/Senta-master/senta/training/base_trainer.py", line 137, in init_net
    self.init_train_net()
  File "/home/lbj/projects/Senta-master/senta/training/base_trainer.py", line 161, in init_train_net
    **opt_args)
  File "/home/lbj/projects/Senta-master/senta/training/base_trainer.py", line 776, in optimization
    _, param_grads = optimizer.minimize(loss)
  File "", line 2, in minimize
  File "/home/lbj/anaconda3/lib/python3.7/site-packages/paddle/fluid/dygraph/base.py", line 277, in impl
    return func(*args, **kwargs)
  File "/home/lbj/anaconda3/lib/python3.7/site-packages/paddle/fluid/optimizer.py", line 835, in minimize
    no_grad_set=no_grad_set)
  File "/home/lbj/anaconda3/lib/python3.7/site-packages/paddle/fluid/optimizer.py", line 673, in backward
    loss.shape)
AssertionError: The loss.shape should be (1L,), but the current loss.shape is (-1,). Maybe that you should call fluid.layers.mean to process the current loss.

Environment: CUDA 10.1, cuDNN 7.5.0, NCCL 2.7.8, paddle-gpu 1.6.3.so107

abnormall avatar Aug 13 '20 06:08 abnormall

sh ./script/run_pretrain_ernie_1.0_skep_large_ch.sh

abnormall avatar Aug 13 '20 06:08 abnormall

Running sh ./script/run_train.sh ./config/ernie_1.0_skep_large_ch.Chnsenticorp.cls.json instead prints the following message to the terminal and then hangs:
INFO: 08-13 14:23:14: lanch.py:77 * 140039797012288 nranks: 1

abnormall avatar Aug 13 '20 06:08 abnormall

The problem is solved. It was a version issue.

abnormall avatar Aug 17 '20 02:08 abnormall

The problem is solved. It was a version issue.

I ran into the same problem. Could you tell me how you solved it?

igfuns avatar Oct 15 '20 03:10 igfuns

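This probably has to be configured to match your own Ubuntu environment, but I am not sure whether I configured it incorrectly, and I still do not know how to fix it. How did you solve it?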


Edward-Joker avatar Oct 15 '20 04:10 Edward-Joker

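The problem is solved. It was a version issue.

I ran into the same problem. Could you tell me how you solved it?

I ran into this problem as well. I tried to fix it through the environment configuration file, but I got a message saying that my cmd terminal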

Is the problem you hit the same as mine?
Traceback (most recent call last):
  File "./lanch.py", line 137, in <module>
    main(lanch_args)
  File "./lanch.py", line 130, in main
    start_procs(args)
  File "./lanch.py", line 121, in start_procs
    cmd=cmds[i])
subprocess.CalledProcessError: Command '['/home/edward/anaconda3/envs/Paddle-Pytorch/bin/python', '-u', './train.py', '--param_path', './config/ernie_1.0_skep_large_ch.Chnsenticorp.cls.json', '--log_dir', './log']' died with <Signals.SIGABRT: 6>.

I looked at the lanch.py file and configured env.sh for my own machine, but I still get this error.

Edward-Joker avatar Oct 15 '20 04:10 Edward-Joker

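Let me answer this. The versions are CUDA 10.1, cuDNN 7.6.0, NCCL 2.7.8, and paddle-gpu 1.6.3.post107. Fill in the corresponding paths in env.sh, and you should then be able to run everything by following the README. Multi-GPU runs still seem to have a bug; single-GPU runs work fine.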
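Before rerunning, it can help to confirm that the paths entered in env.sh really contain the libraries. The following is a minimal sketch, assuming env.sh exposes the CUDA, cuDNN, and NCCL directories through LD_LIBRARY_PATH (an assumption, not confirmed in this thread); the helper name find_shared_libs and the library prefixes are illustrative only.

```python
import glob
import os


def find_shared_libs(search_path, prefixes=("libcudart", "libcudnn", "libnccl")):
    """Map each library prefix to the matching files found on a search path.

    search_path is a path list such as the value of LD_LIBRARY_PATH
    (entries separated by os.pathsep, which is ":" on Linux).
    """
    found = {prefix: [] for prefix in prefixes}
    for directory in search_path.split(os.pathsep):
        if not directory or not os.path.isdir(directory):
            continue  # skip empty entries and directories that do not exist
        for prefix in prefixes:
            # matches names such as libcudnn.so, libcudnn.so.7, libcudnn.so.7.6.0
            pattern = os.path.join(directory, prefix + "*.so*")
            found[prefix].extend(sorted(glob.glob(pattern)))
    return found


# Hypothetical usage, run in the same shell where env.sh has been sourced:
# for name, paths in find_shared_libs(os.environ.get("LD_LIBRARY_PATH", "")).items():
#     print(name, paths if paths else "NOT FOUND")
```

An empty list for a prefix suggests the corresponding path in env.sh is wrong or missing. The version suffix in each file name can also be compared against the versions listed above.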

abnormall avatar Dec 14 '20 12:12 abnormall

Traceback (most recent call last):
  File "./lanch.py", line 137, in <module>
    main(lanch_args)
  File "./lanch.py", line 130, in main
    start_procs(args)
  File "./lanch.py", line 121, in start_procs
    cmd=cmds[i])
subprocess.CalledProcessError: Command '['/home/edward/anaconda3/envs/Paddle-Pytorch/bin/python', '-u', './train.py', '--param_path', './config/ernie_1.0_skep_large_ch.Chnsenticorp.cls.json', '--log_dir', './log']' died with <Signals.SIGABRT: 6>.

This does not explicitly show what the underlying error is. Try running the full command directly.
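One way to run the full command directly is sketched below. It is a minimal sketch, assuming a POSIX system and Python 3.7; the helper name run_directly is hypothetical. It runs the child command without the launcher, captures its stderr, and reports whether the child exited with a status code or was killed by a signal such as SIGABRT.

```python
import signal
import subprocess
import sys


def run_directly(cmd):
    """Run one command without the launcher; return (status, stderr text)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode < 0:
        # On POSIX, a negative return code -N means the child died from signal N.
        status = "killed by " + signal.Signals(-result.returncode).name
    elif result.returncode == 0:
        status = "ok"
    else:
        status = "exit status {}".format(result.returncode)
    return status, result.stderr


# Hypothetical usage with the command taken from the traceback above:
# status, err = run_directly([
#     sys.executable, "-u", "./train.py",
#     "--param_path", "./config/ernie_1.0_skep_large_ch.Chnsenticorp.cls.json",
#     "--log_dir", "./log",
# ])
# print(status)
# print(err[-2000:])  # the end of stderr usually holds the real error
```

The CalledProcessError raised in lanch.py only reports how the child process ended. The child's own stderr is where the actual exception or native crash message would appear.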

abnormall avatar Dec 14 '20 12:12 abnormall

Hello, I am getting a similar error. How can I fix it?
2020-12-15 09:27:32,855-INFO: proc 0 run failed
INFO: 12-15 09:27:32: lanch.py:112 * 139696711771968 proc 0 run failed
Traceback (most recent call last):
  File "./lanch.py", line 130, in <module>
    main(lanch_args)
  File "./lanch.py", line 123, in main
    start_procs(args)
  File "./lanch.py", line 114, in start_procs
    cmd=cmds[i])
subprocess.CalledProcessError: Command '['/home/ant/.conda/envs/yt_37/bin/python', '-u', './infer.py', '--param_path', './ config/ernie_1.0_skep_large_ch.Chnsenticorp.infer.json', '--log_dir', './log']' returned non-zero exit status 1.

xiaosheng123XIAO avatar Dec 15 '20 01:12 xiaosheng123XIAO

Hello, I am getting a similar error. How can I fix it?
2020-12-15 09:27:32,855-INFO: proc 0 run failed
INFO: 12-15 09:27:32: lanch.py:112 * 139696711771968 proc 0 run failed
Traceback (most recent call last):
  File "./lanch.py", line 130, in <module>
    main(lanch_args)
  File "./lanch.py", line 123, in main
    start_procs(args)
  File "./lanch.py", line 114, in start_procs
    cmd=cmds[i])
subprocess.CalledProcessError: Command '['/home/ant/.conda/envs/yt_37/bin/python', '-u', './infer.py', '--param_path', './ config/ernie_1.0_skep_large_ch.Chnsenticorp.infer.json', '--log_dir', './log']' returned non-zero exit status 1.

I ran into the same problem. Have you solved it?

Pengjm777 avatar Dec 24 '20 08:12 Pengjm777

Hello, I am getting a similar error. How can I fix it?
2020-12-15 09:27:32,855-INFO: proc 0 run failed
INFO: 12-15 09:27:32: lanch.py:112 * 139696711771968 proc 0 run failed
Traceback (most recent call last):
  File "./lanch.py", line 130, in <module>
    main(lanch_args)
  File "./lanch.py", line 123, in main
    start_procs(args)
  File "./lanch.py", line 114, in start_procs
    cmd=cmds[i])
subprocess.CalledProcessError: Command '['/home/ant/.conda/envs/yt_37/bin/python', '-u', './infer.py', '--param_path', './ config/ernie_1.0_skep_large_ch.Chnsenticorp.infer.json', '--log_dir', './log']' returned non-zero exit status 1.

I ran into the same problem. Have you solved it?

I ran into this problem too. Have you solved it?

lhz211 avatar Jan 24 '21 08:01 lhz211

Hello, I am getting a similar error. How can I fix it?
2020-12-15 09:27:32,855-INFO: proc 0 run failed
INFO: 12-15 09:27:32: lanch.py:112 * 139696711771968 proc 0 run failed
Traceback (most recent call last):
  File "./lanch.py", line 130, in <module>
    main(lanch_args)
  File "./lanch.py", line 123, in main
    start_procs(args)
  File "./lanch.py", line 114, in start_procs
    cmd=cmds[i])
subprocess.CalledProcessError: Command '['/home/ant/.conda/envs/yt_37/bin/python', '-u', './infer.py', '--param_path', './ config/ernie_1.0_skep_large_ch.Chnsenticorp.infer.json', '--log_dir', './log']' returned non-zero exit status 1.

I ran into the same problem. Have you solved it?

I ran into this problem too. Have you solved it?

Same problem here. Is there a solution yet?

ppd118 avatar Sep 06 '21 16:09 ppd118