iFLYTEK2021
Loss becomes NaN
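Many thanks to the author for such a great project. After preparing my data, the training loss suddenly became NaN. What could be causing this?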
2022-03-31 23:34:18,384 - mmdet - INFO - Epoch [1][250/3977] lr: 9.970e-03, eta: 12:19:21, time: 0.923, data_time: 0.010, memory: 4583, loss_rpn_cls: 0.0869, loss_rpn_bbox: 0.0403, s0.loss_cls: 0.2751, s0.acc: 88.4609, s0.loss_bbox: 0.1713, s0.loss_mask: 0.3097, s1.loss_cls: 0.1384, s1.acc: 87.4832, s1.loss_bbox: 0.1449, s1.loss_mask: 0.1456, s2.loss_cls: 0.0683, s2.acc: 88.0404, s2.loss_bbox: 0.0737, s2.loss_mask: 0.0646, loss: 1.5188
2022-03-31 23:34:56,077 - mmdet - INFO - Epoch [1][300/3977] lr: 1.197e-02, eta: 11:54:52, time: 0.754, data_time: 0.010, memory: 4583, loss_rpn_cls: nan, loss_rpn_bbox: nan, s0.loss_cls: nan, s0.acc: 92.0874, s0.loss_bbox: nan, s0.loss_mask: nan, s1.loss_cls: nan, s1.acc: 95.4942, s1.loss_bbox: nan, s1.loss_mask: nan, s2.loss_cls: nan, s2.acc: 94.7648, s2.loss_bbox: nan, s2.loss_mask: nan, loss: nan
2022-03-31 23:35:31,704 - mmdet - INFO - Epoch [1][350/3977] lr: 1.397e-02, eta: 11:32:28, time: 0.713, data_time: 0.010, memory: 4583, loss_rpn_cls: nan, loss_rpn_bbox: nan, s0.loss_cls: nan, s0.acc: 100.0000, s0.loss_bbox: nan, s0.loss_mask: nan, s1.loss_cls: nan, s1.acc: 100.0000, s1.loss_bbox: nan, s1.loss_mask: nan, s2.loss_cls: nan, s2.acc: 100.0000, s2.loss_bbox: nan, s2.loss_mask: nan, loss: nan
2022-03-31 23:36:09,657 - mmdet - INFO - Epoch [1][400/3977] lr: 1.596e-02, eta: 11:20:03, time: 0.759, data_time: 0.010, memory: 4583, loss_rpn_cls: nan, loss_rpn_bbox: nan, s0.loss_cls: nan, s0.acc: 100.0000, s0.loss_bbox: nan, s0.loss_mask: nan, s1.loss_cls: nan, s1.acc: 100.0000, s1.loss_bbox: nan, s1.loss_mask: nan, s2.loss_cls: nan, s2.acc: 100.0000, s2.loss_bbox: nan, s2.loss_mask: nan, loss: nan
2022-03-31 23:36:46,872 - mmdet - INFO - Epoch [1][450/3977] lr: 1.796e-02, eta: 11:08:58, time: 0.744, data_time: 0.011, memory: 4583, loss_rpn_cls: nan, loss_rpn_bbox: nan, s0.loss_cls: nan, s0.acc: 100.0000, s0.loss_bbox: nan, s0.loss_mask: nan, s1.loss_cls: nan, s1.acc: 100.0000, s1.loss_bbox: nan, s1.loss_mask: nan, s2.loss_cls: nan, s2.acc: 100.0000, s2.loss_bbox: nan, s2.loss_mask: nan, loss: nan
2022-03-31 23:37:23,557 - mmdet - INFO - Epoch [1][500/3977] lr: 1.996e-02, eta: 10:59:34, time: 0.739, data_time: 0.011, memory: 4583, loss_rpn_cls: nan, loss_rpn_bbox: nan, s0.loss_cls: nan, s0.acc: 100.0000, s0.loss_bbox: nan, s0.loss_mask: nan, s1.loss_cls: nan, s1.acc: 100.0000, s1.loss_bbox: nan, s1.loss_mask: nan, s2.loss_cls: nan, s2.acc: 100.0000, s2.loss_bbox: nan, s2.loss_mask: nan, loss: nan
2022-03-31 23:38:01,270 - mmdet - INFO - Epoch [1][550/3977] lr: 2.000e-02, eta: 10:52:32, time: 0.750, data_time: 0.007, memory: 4583, loss_rpn_cls: nan, loss_rpn_bbox: nan, s0.loss_cls: nan, s0.acc: 100.0000, s0.loss_bbox: nan, s0.loss_mask: nan, s1.loss_cls: nan, s1.acc: 100.0000, s1.loss_bbox: nan, s1.loss_mask: nan, s2.loss_cls: nan, s2.acc: 100.0000, s2.loss_bbox: nan, s2.loss_mask: nan, loss: nan
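In this log the total loss is finite at iteration 250 and NaN from iteration 300 onward, while the learning rate is still warming up toward 2.000e-02. A small helper can locate that first NaN entry in an mmdet 2.x text log. This is a hypothetical sketch, not part of mmdet or this repository: `first_nan` and `ENTRY_RE` are illustrative names, it assumes one log entry per line in the format shown above, and the sample below is trimmed to the fields the parser reads.

```python
import math
import re

# Matches epoch, iteration, learning rate and the *total* loss of one
# mmdet 2.x text-log entry, e.g. "Epoch [1][300/3977] lr: 1.197e-02, ... loss: nan".
# The total loss is the first ", loss: " field; per-head fields look like "loss_rpn_cls:".
ENTRY_RE = re.compile(
    r"Epoch \[(?P<epoch>\d+)\]\[(?P<it>\d+)/\d+\]\s+lr: (?P<lr>[0-9.e+-]+),"
    r".*?, loss: (?P<loss>nan|[0-9.]+)"
)


def first_nan(log_text):
    """Return (epoch, iteration, lr) of the first entry whose total loss is NaN, else None."""
    for m in ENTRY_RE.finditer(log_text):
        if math.isnan(float(m.group("loss"))):
            return int(m.group("epoch")), int(m.group("it")), float(m.group("lr"))
    return None


# Three entries from the log above, trimmed to a few fields.
SAMPLE_LOG = """\
2022-03-31 23:34:18,384 - mmdet - INFO - Epoch [1][250/3977] lr: 9.970e-03, eta: 12:19:21, loss_rpn_cls: 0.0869, loss: 1.5188
2022-03-31 23:34:56,077 - mmdet - INFO - Epoch [1][300/3977] lr: 1.197e-02, eta: 11:54:52, loss_rpn_cls: nan, loss: nan
2022-03-31 23:35:31,704 - mmdet - INFO - Epoch [1][350/3977] lr: 1.397e-02, eta: 11:32:28, loss_rpn_cls: nan, loss: nan
"""

print(first_nan(SAMPLE_LOG))  # (1, 300, 0.01197)
```

Because mmdet only logs every `interval` iterations, the divergence happened somewhere between the last finite entry and the reported one, not necessarily at iteration 300 itself.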
I ran into this problem too.
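I think it comes from the HTC model itself. When training on a single GPU or only a few GPUs, HTC sometimes runs into this problem; with more GPUs, e.g. 8, it does not.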
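One possible mechanism behind the GPU-count observation (an interpretation, not something confirmed in this thread): mmdet's reference configs set the learning rate for 8 GPUs × 2 images per GPU, a total batch of 16, and the mmdet documentation recommends scaling the learning rate linearly with the total batch size (e.g. lr=0.01 for 4 GPUs × 2 images per GPU). Training on fewer GPUs with the unchanged lr takes steps sized for a much larger batch, which makes divergence more likely. The sketch below computes the scaled value; `scaled_lr` is a hypothetical helper, and the defaults 0.02 and 16 are assumptions matching the lr the log above warms up to.

```python
def scaled_lr(num_gpus, imgs_per_gpu, base_lr=0.02, base_batch=16):
    """Linear scaling rule: the learning rate grows in proportion to the total batch size.

    base_lr and base_batch are assumed reference values (lr=0.02 for 8 GPUs x 2 images).
    """
    if num_gpus < 1 or imgs_per_gpu < 1:
        raise ValueError("num_gpus and imgs_per_gpu must be positive")
    total_batch = num_gpus * imgs_per_gpu
    return base_lr * total_batch / base_batch


# Single GPU with 2 images per GPU -> 1/8 of the reference learning rate.
print(scaled_lr(num_gpus=1, imgs_per_gpu=2))  # 0.0025
```

The rule is a heuristic starting point rather than a guarantee; it just explains why the same config can be stable on 8 GPUs and diverge on 1.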
Hello, how do I start training to reproduce the results?
In my experience, lowering the learning rate fixes it. Hope that helps.
Hi, could we talk? I just saw the question you asked on the MAP Cup, but I had no way to reach you. My email is [email protected]...
Has anyone found a fix for this? My training loss also suddenly became NaN.