tensorflow-yolov3 loss NaN on test set

loss NaN on test set

Open justanotherYO opened this issue 5 years ago • 8 comments

I am testing on the VOC2007 dataset. Training went fine and the training loss keeps dropping nicely (after 3 epochs it was ~30). However, every time an epoch finishes, the test loss is NaN. Has anybody faced a similar problem? PS: I am training from scratch.

May 26 '19 10:05 justanotherYO
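
A possible way to narrow this down, assuming the reported test loss is an average over per-batch losses (an assumption, not something confirmed in this thread): a single NaN batch makes the whole average NaN, so it can help to log which test batches are non-finite. The sketch below uses only the Python standard library; the function names and the sample loss values are hypothetical and are not part of the tensorflow-yolov3 code.

```python
import math


def find_non_finite_batches(batch_losses):
    """Return the indices of batches whose loss is NaN or infinite."""
    return [i for i, loss in enumerate(batch_losses) if not math.isfinite(loss)]


def mean_finite_loss(batch_losses):
    """Average only the finite losses; return NaN if none are finite."""
    finite = [loss for loss in batch_losses if math.isfinite(loss)]
    return sum(finite) / len(finite) if finite else float("nan")


# Hypothetical per-batch test losses; the third batch is NaN.
test_losses = [31.2, 29.8, float("nan"), 30.5]

bad_batches = find_non_finite_batches(test_losses)

# A plain average is poisoned by the single NaN batch.
naive_mean = sum(test_losses) / len(test_losses)

print("non-finite batches:", bad_batches)
print("naive mean is NaN:", math.isnan(naive_mean))
print("mean over finite batches:", mean_finite_loss(test_losses))
```

If only a few batch indices show up, inspecting the images and annotations in those batches is a reasonable next step. The finite-only mean is for diagnosis, not a fix for the underlying NaN.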

Never mind. I got it fixed. The problem is mine, and has nothing to do with the code...

May 26 '19 23:05 justanotherYO

> Never mind. I got it fixed. The problem is mine, and has nothing to do with the code...

I have the same problem as you. How did you solve it? Did you reduce the learning rate?

May 29 '19 08:05 ZH-Lee

Same issue here. Anyone?

May 29 '19 17:05 Sahaj09

I changed my __C.TRAIN.BATCH_SIZE to 3, which gave me the loss=nan issue. Changing it to 2 fixed the issue. Originally I had it at 6, but I ran into an OOM exception. I'm running on a crappy AM8 2-core CPU with only 8 GB of RAM and a new RTX 2080 with 8 GB.

May 30 '19 07:05 andydion

> I changed my __C.TRAIN.BATCH_SIZE to 3, which gave me the loss=nan issue. Changing it to 2 fixed the issue. Originally I had it at 6, but I ran into an OOM exception. I'm running on a crappy AM8 2-core CPU with only 8 GB of RAM and a new RTX 2080 with 8 GB.

So does that mean we can't set batch_size too large? Or does it depend on our GPU memory?

May 30 '19 08:05 ZH-Lee

I'm not sure. The batch_size was probably too large, possibly in combination with a larger dataset of 8000 images. How did you resolve your "nan" issue?

May 30 '19 13:05 andydion

Did you wait for a few epochs? I had NaN on the test set for the first few epochs (training on the XISRay dataset with default settings), and then it went back to normal.

PS: I didn't have the GPU memory issue.

May 30 '19 16:05 FangliangBai

> Did you wait for a few epochs? I had NaN on the test set for the first few epochs (training on the XISRay dataset with default settings), and then it went back to normal.
>
> PS: I didn't have the GPU memory issue.

@FangliangBai After how many iterations did it become normal?

Aug 07 '20 08:08 MuhammadAsadJaved
