tf-faster-rcnn

loss is nan

Open henbucuoshanghai opened this issue 6 years ago • 7 comments

iter: 20 / 70000, total loss: nan
rpn_loss_cls: 0.648382 rpn_loss_box: 0.045217 loss_cls: 2.149889 loss_box: 0.199051 lr: 0.010000 speed: 0.794s / iter
iter: 40 / 70000, total loss: nan
rpn_loss_cls: 0.574709 rpn_loss_box: 0.317064 loss_cls: 0.000009 loss_box: 0.000000 lr: 0.010000 speed: 0.524s / iter
iter: 60 / 70000, total loss: nan
rpn_loss_cls: 0.480793 rpn_loss_box: 0.024617 loss_cls: 0.667423 loss_box: 0.492639 lr: 0.010000 speed: 0.436s / iter
iter: 80 / 70000, total loss: nan
rpn_loss_cls: 0.406555 rpn_loss_box: 0.017055 loss_cls: 0.085850 loss_box: 0.044566 lr: 0.010000

henbucuoshanghai, Jun 13 '19 01:06

On the next run I tried a smaller learning rate (lr: 0.0001):

iter: 20 / 70000, total loss: 1.053862
rpn_loss_cls: 0.369174 rpn_loss_box: 0.111182 loss_cls: 0.190974 loss_box: 0.000000 lr: 0.000100 speed: 0.822s / iter
iter: 40 / 70000, total loss: 0.851714
rpn_loss_cls: 0.277755 rpn_loss_box: 0.089165 loss_cls: 0.102262 loss_box: 0.000000 lr: 0.000100 speed: 0.554s / iter
iter: 60 / 70000, total loss: 0.883730
rpn_loss_cls: 0.278216 rpn_loss_box: 0.113697 loss_cls: 0.109286 loss_box: 0.000000 lr: 0.000100 speed: 0.459s / iter
iter: 80 / 70000, total loss: 0.671090
rpn_loss_cls: 0.118846 rpn_loss_box: 0.009400 loss_cls: 0.160315 loss_box: 0.000000 lr: 0.000100 speed: 0.410s / iter
iter: 100 / 70000, total loss: 0.936539
rpn_loss_cls: 0.082754 rpn_loss_box: 0.006793 loss_cls: 0.320606 loss_box: 0.143858 lr: 0.000100

henbucuoshanghai, Jun 13 '19 01:06

Why is loss_box 0.000000?

henbucuoshanghai, Jun 13 '19 01:06

Big brother, I ran into the same problem. How did you solve it? Did you just adjust the learning rate? My loss_box and loss_cls are both 0.00000....

jamessmith123456, Oct 21 '19 11:10

I have the same problem and have no idea how to solve it.

zxz-cc, Dec 13 '19 07:12

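Ah, this happens because the dataset was built incorrectly. Some ground-truth box coordinates fall outside the image boundary (for example, a value of -1), or the x-coordinate of the bottom-right corner is less than or equal to that of the top-left corner, when it should be strictly greater. For example, x2 must be larger than x1: if x2 == x1, then width = (x2 - x1) = 0, and a / (x2 - x1) = NaN. I remember there is a place in the source code you can modify to avoid this error; have a look for it.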
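
The coordinate checks described above can be sketched in a few lines of Python. This is only an illustration, not code from tf-faster-rcnn: the function name `find_bad_boxes`, the (x1, y1, x2, y2) tuple layout, and the 0-based pixel convention are all assumptions, so adjust them to match how your annotations are stored.

```python
def find_bad_boxes(boxes, img_width, img_height):
    """Return (index, reason) pairs for boxes that look invalid.

    boxes: iterable of (x1, y1, x2, y2) tuples in 0-based pixel
    coordinates (an assumed convention, not taken from the repository).
    """
    problems = []
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        if x1 < 0 or y1 < 0:
            # e.g. a coordinate of -1, as mentioned in the reply
            problems.append((i, "negative coordinate"))
        elif x2 >= img_width or y2 >= img_height:
            problems.append((i, "outside image bounds"))
        elif x2 <= x1 or y2 <= y1:
            # x2 == x1 gives width 0, the case that leads to NaN
            problems.append((i, "zero or negative width/height"))
    return problems


if __name__ == "__main__":
    sample_boxes = [
        (10, 20, 50, 80),   # fine
        (-1, 5, 30, 40),    # x1 is outside the image
        (15, 15, 15, 60),   # x2 == x1, so width is 0
        (5, 5, 120, 40),    # x2 is past the right edge of a 100 px wide image
    ]
    for index, reason in find_bad_boxes(sample_boxes, img_width=100, img_height=100):
        print(index, reason)
```

Running a check like this over every annotation before training makes it easy to list the offending images and fix or drop them.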

jamessmith123456, Dec 13 '19 08:12

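Thanks for your reply! But I have run into another problem: loss_cls and loss_box stay at 0 all the way to the end of training. Why is that? Is my number of iterations too low? I only have 990 training images, and I trained for 13000 iterations in total.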

zxz-cc, Dec 13 '19 08:12

I think 990 images really is a bit too few...

jamessmith123456, Dec 13 '19 08:12