tensorflow-yolov3
loss nan on test set
I am testing on the VOC2007 dataset. Training went fine and the training loss kept dropping steadily (after 3 epochs it was ~30). However, every time an epoch finishes, the test loss is NaN. Has anybody faced a similar problem? PS: I am training from scratch.
Never mind. I got it fixed. The problem is mine, and has nothing to do with the code...
I have the same problem as you. So, how did you solve it? By reducing the learning rate?
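Lowering the learning rate is a common mitigation for NaN losses. As a framework-agnostic sketch (the `guard_step` helper below is hypothetical, not part of this repo), you can also guard each training step: if the loss comes back NaN or inf, skip the update and decay the learning rate instead of letting the NaN propagate into the weights.

```python
import math

def guard_step(loss, lr, decay=0.5, min_lr=1e-6):
    """Hypothetical training-loop guard, not from tensorflow-yolov3.

    Returns (apply_update, new_lr): if the loss is NaN/inf, signal the
    caller to skip the optimizer step and shrink the learning rate;
    otherwise keep the current learning rate.
    """
    if not math.isfinite(loss):
        # Skip this update and decay the LR (never below min_lr).
        return False, max(lr * decay, min_lr)
    return True, lr
```

In a training loop you would call this after computing the batch loss and only run the optimizer step when the first return value is True.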
same issue? anyone?
I changed my __C.TRAIN.BATCH_SIZE to 3, which caused the loss=nan issue. I changed it to 2, which fixed it. Originally I had it at 6, but ran into an OOM exception. Running on a crappy AM8 2-core CPU with only 8 GB of RAM, and a new RTX 2080 with 8 GB.
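For reference, a sketch of where that setting lives, assuming the repo's easydict-based `core/config.py` (adjust the path and surrounding values to your checkout):

```python
# Sketch of the relevant lines in core/config.py of tensorflow-yolov3,
# which builds its config from easydict (third-party dependency).
from easydict import EasyDict as edict

__C = edict()
__C.TRAIN = edict()

# 6 hit OOM on an 8 GB GPU, 3 produced loss=nan, 2 trained cleanly here;
# tune this to your own GPU memory budget.
__C.TRAIN.BATCH_SIZE = 2
```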
So, does that mean we can't set the batch_size too large? Or does it depend on our GPU memory?
I'm not sure. Probably the batch_size was too large, possibly in combination with a larger dataset of 8000 images. How did you resolve your "nan" issue?
Did you wait for a few epochs? I had NaN on the test set for the first few epochs (training on the XISRay dataset with default settings), and then it went back to normal.
P.S. I didn't have the GPU memory issue.
@FangliangBai After how many iterations did it become normal?