Senta: running the pre-trained model training code fails with: The loss.shape should be (1L,), but the current loss.shape is (-1,)

WARNING: 08-13 14:03:09: io.py:712 * 139843275020096 paddle.fluid.layers.py_reader() may be deprecated in the near future. Please use paddle.fluid.io.DataLoader.from_generator() instead.
/home/lbj/anaconda3/lib/python3.7/site-packages/paddle/fluid/clip.py:779: UserWarning: Caution! 'set_gradient_clip' is not recommended and may be deprecated in future! We recommend a new strategy: set 'grad_clip' when initializing the 'optimizer'. This method can reduce the mistakes, please refer to documention of 'optimizer'.
  warnings.warn("Caution! 'set_gradient_clip' is not recommended "
Traceback (most recent call last):
  File "pretraining.py", line 359, in <module>
    main(args)
  File "pretraining.py", line 351, in main
    trainer = trainer_class(params, readers, model)
  File "pretraining.py", line 152, in __init__
    BaseTrainer.__init__(self, params, data_set_reader, model_class)
  File "/home/lbj/projects/Senta-master/senta/training/base_trainer.py", line 48, in __init__
    self.init_net()
  File "/home/lbj/projects/Senta-master/senta/training/base_trainer.py", line 137, in init_net
    self.init_train_net()
  File "/home/lbj/projects/Senta-master/senta/training/base_trainer.py", line 161, in init_train_net
    **opt_args)
  File "/home/lbj/projects/Senta-master/senta/training/base_trainer.py", line 776, in optimization
    _, param_grads = optimizer.minimize(loss)
  File "", line 2, in minimize
  File "/home/lbj/anaconda3/lib/python3.7/site-packages/paddle/fluid/dygraph/base.py", line 277, in impl
    return func(*args, **kwargs)
  File "/home/lbj/anaconda3/lib/python3.7/site-packages/paddle/fluid/optimizer.py", line 835, in minimize
    no_grad_set=no_grad_set)
  File "/home/lbj/anaconda3/lib/python3.7/site-packages/paddle/fluid/optimizer.py", line 673, in backward
    loss.shape)
AssertionError: The loss.shape should be (1L,), but the current loss.shape is (-1,). Maybe that you should call fluid.layers.mean to process the current loss.

Environment: CUDA 10.1, cuDNN 7.5.0, NCCL 2.7.8, paddle-gpu 1.6.3.so107

Aug 13 '20 06:08 abnormall
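The assertion in the first post comes from a precondition in Paddle 1.x static-graph training: the loss handed to optimizer.minimize() must have shape (1,), whereas a shape of (-1,) means the loss still carries a batch dimension of unknown size. The framework-free sketch below only illustrates that shape rule and the averaging that fluid.layers.mean (the function named in the error message) performs. It is not Paddle code, the function names are hypothetical, and in this thread the reported fix was aligning library versions rather than editing the model.

```python
# Framework-free illustration (NOT Paddle code) of the shape rule behind the
# AssertionError: the loss passed to minimize() must have shape (1,), while an
# unreduced per-example loss has shape (-1,), i.e. (batch_size,).


def check_loss_shape(shape):
    """Raise AssertionError unless shape is exactly (1,), like the check in the traceback."""
    shape = tuple(shape)
    if shape != (1,):
        raise AssertionError(
            "The loss.shape should be (1,), but the current loss.shape is %s" % (shape,)
        )


def mean_loss(per_example_losses):
    """Average a batch of per-example losses into a one-element list.

    This mirrors the role fluid.layers.mean plays in the error message:
    it collapses the batch dimension so the result has shape (1,).
    """
    values = [float(v) for v in per_example_losses]
    if not values:
        raise ValueError("cannot average an empty batch")
    return [sum(values) / len(values)]


if __name__ == "__main__":
    batch_losses = [0.25, 0.75, 0.5, 0.5]  # hypothetical per-example losses
    try:
        check_loss_shape((-1,))  # unreduced: batch dimension still present
    except AssertionError as err:
        print("rejected:", err)
    reduced = mean_loss(batch_losses)
    check_loss_shape((len(reduced),))  # reduced: shape (1,) passes
    print("accepted:", reduced)
```

Only reach for a mean reduction if your own model code returns a per-example loss; if it already returns a scalar, a (-1,) shape points back at the environment.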

sh ./script/run_pretrain_ernie_1.0_skep_large_ch.sh

Aug 13 '20 06:08 abnormall

When I run sh ./script/run_train.sh ./config/ernie_1.0_skep_large_ch.Chnsenticorp.cls.json instead, the terminal prints the following message and then hangs: INFO: 08-13 14:23:14: lanch.py:77 * 140039797012288 nranks: 1

Aug 13 '20 06:08 abnormall

The problem is solved; it was a version issue.

Aug 17 '20 02:08 abnormall

The problem is solved; it was a version issue.

I'm running into the same problem. Could you tell me how you solved it?

Oct 15 '20 03:10 igfuns

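This probably has to be configured for your own Ubuntu environment, but I'm not sure whether I configured it wrong, and I still don't know how to fix it. How did you solve it?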


Oct 15 '20 04:10 Edward-Joker


I ran into this problem as well. I tried to fix it through the environment configuration file, but it tells me that my cmd terminal


Is this the same problem as mine?
Traceback (most recent call last):
  File "./lanch.py", line 137, in <module>
    main(lanch_args)
  File "./lanch.py", line 130, in main
    start_procs(args)
  File "./lanch.py", line 121, in start_procs
    cmd=cmds[i])
subprocess.CalledProcessError: Command '['/home/edward/anaconda3/envs/Paddle-Pytorch/bin/python', '-u', './train.py', '--param_path', './config/ernie_1.0_skep_large_ch.Chnsenticorp.cls.json', '--log_dir', './log']' died with <Signals.SIGABRT: 6>.

I looked at lanch.py and configured env.sh for my own machine, but I still get this error...

Oct 15 '20 04:10 Edward-Joker

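To answer this: with CUDA 10.1, cuDNN 7.6.0, NCCL 2.7.8 and paddle-gpu 1.6.3.post107, and the corresponding paths filled in env.sh, you should be able to run everything by following the README. Multi-GPU runs still seem to have bugs, but a single GPU works fine.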

Dec 14 '20 12:12 abnormall
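Since the thread traces the failures to version mismatches, a quick pre-flight check of the installed Paddle release can save a debugging round. A minimal sketch, assuming Paddle exposes paddle.__version__ and treating 1.6.3 from the answer above as the target; the helper names are hypothetical, the .post107 build suffix is deliberately ignored, and CUDA, cuDNN and NCCL versions are not checked here.

```python
import importlib
import re

TARGET_RELEASE = "1.6.3"  # from the answer above: paddle-gpu 1.6.3.post107


def release_tuple(version, parts=3):
    """Leading numeric components of a version string.

    '1.6.3.post107' -> (1, 6, 3); build suffixes such as post107 are ignored.
    """
    numbers = []
    for piece in version.split(".")[:parts]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def matches_target(version, target=TARGET_RELEASE):
    """True when the major.minor.patch numbers agree."""
    return release_tuple(version) == release_tuple(target)


def installed_paddle_version():
    """paddle.__version__ if Paddle is importable in this interpreter, else None."""
    try:
        paddle = importlib.import_module("paddle")
    except ImportError:
        return None
    return getattr(paddle, "__version__", None)


if __name__ == "__main__":
    version = installed_paddle_version()
    if version is None:
        print("Paddle is not importable from this interpreter")
    elif matches_target(version):
        print("Paddle %s matches the %s release reported to work" % (version, TARGET_RELEASE))
    else:
        print("Paddle %s differs from the %s release reported to work" % (version, TARGET_RELEASE))
```

Run it with the same interpreter that launches training (for example the conda environment's python), otherwise it inspects the wrong installation.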


The lanch.py traceback posted above (SIGABRT) does not show what the actual error is; try running the full command directly.

Dec 14 '20 12:12 abnormall
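Following the suggestion to run the full command directly: lanch.py only reports that its child died, while running the same command in the foreground lets the child's own error text reach the terminal. A minimal sketch, with the worker arguments copied from the CalledProcessError in this thread and sys.executable standing in for the environment's python; the helper names are hypothetical, and any environment variables lanch.py may set for its workers are not reproduced, so source env.sh first. On POSIX, subprocess reports a child killed by signal N as return code -N, which is why the traceback says died with <Signals.SIGABRT: 6>.

```python
import shlex
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Readable reason for a child's return code.

    On POSIX, a child killed by signal N has returncode -N; CalledProcessError
    prints that case as "died with <Signals.SIGABRT: 6>".
    """
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        try:
            return "killed by " + signal.Signals(-returncode).name
        except ValueError:
            return "killed by signal %d" % -returncode
    return "exited with status %d" % returncode


def run_in_foreground(cmd):
    """Run one worker command with stdout/stderr attached to this terminal."""
    print("running:", " ".join(shlex.quote(part) for part in cmd), flush=True)
    completed = subprocess.run(cmd)  # no capture: the child's own error text stays visible
    return describe_exit(completed.returncode)


if __name__ == "__main__":
    # Same arguments lanch.py passed to its worker in the traceback above.
    worker_cmd = [
        sys.executable, "-u", "./train.py",
        "--param_path", "./config/ernie_1.0_skep_large_ch.Chnsenticorp.cls.json",
        "--log_dir", "./log",
    ]
    print(run_in_foreground(worker_cmd))
```

The same approach works for the infer.py command reported below by swapping in its script name and config path.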

Hi, I get a similar error. Could you tell me how to fix it?
2020-12-15 09:27:32,855-INFO: proc 0 run failed
INFO: 12-15 09:27:32: lanch.py:112 * 139696711771968 proc 0 run failed
Traceback (most recent call last):
  File "./lanch.py", line 130, in <module>
    main(lanch_args)
  File "./lanch.py", line 123, in main
    start_procs(args)
  File "./lanch.py", line 114, in start_procs
    cmd=cmds[i])
subprocess.CalledProcessError: Command '['/home/ant/.conda/envs/yt_37/bin/python', '-u', './infer.py', '--param_path', './ config/ernie_1.0_skep_large_ch.Chnsenticorp.infer.json', '--log_dir', './log']' returned non-zero exit status 1.

Dec 15 '20 01:12 xiaosheng123XIAO
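For the infer.py failure above (exit status 1 rather than a signal), the worker's real error may already be on disk: lanch.py passes --log_dir ./log to the worker, so its output presumably lands in that directory. A minimal sketch that prints the tail of the most recently modified files there; the file names Senta writes are not shown in this thread, so the code simply takes the newest files, and the helper names are hypothetical.

```python
import os
from collections import deque


def newest_files(log_dir, limit=3):
    """Most recently modified regular files in log_dir, newest first ([] if the dir is missing)."""
    if not os.path.isdir(log_dir):
        return []
    paths = (os.path.join(log_dir, name) for name in os.listdir(log_dir))
    files = [p for p in paths if os.path.isfile(p)]
    files.sort(key=os.path.getmtime, reverse=True)
    return files[:limit]


def tail_lines(lines, n=20):
    """Last n items of any line iterable, read in a single pass."""
    return list(deque(lines, maxlen=n))


def show_recent_logs(log_dir="./log", n=20):
    """Print the last n lines of each of the newest files in log_dir."""
    for path in newest_files(log_dir):
        print("==> %s <==" % path)
        with open(path, "r", errors="replace") as handle:
            for line in tail_lines(handle, n):
                print(line, end="")


if __name__ == "__main__":
    show_recent_logs()
```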


I ran into the same problem too. Have you solved it?

Dec 24 '20 08:12 Pengjm777


I've hit this problem as well. Have you solved it?

Jan 24 '21 08:01 lhz211


Same problem here. Is there a solution yet?

Sep 06 '21 16:09 ppd118
