Yolo_mark Getting Nan Value in class and object while training

Open kpreeti096 opened this issue 6 years ago • 10 comments

When I try to train on my dataset with YOLO, I get NaN values. My dataset has 60 images, and I want to train the network so that I can get weights for my model.

45: 14.083967, 18.795122 avg, 0.000000 rate, 177.142862 seconds, 360 images
Loaded: 0.000055 seconds
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.500224, Avg Recall: -nan, count: 0
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.498702, Avg Recall: -nan, count: 0
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.501422, Avg Recall: -nan, count: 0
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.501599, Avg Recall: -nan, count: 0
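One possible reading of these lines, offered as an assumption rather than something confirmed at this point in the thread: `count: 0` means no ground-truth boxes were found for that batch, so the per-object averages have nothing to average and print as `-nan`. A minimal Python sketch for pulling the `count` values out of a saved training log (the function names are hypothetical, not part of darknet):

```python
import re

# Matches the trailing "count: N" field of each "Region ..." log line.
COUNT_PATTERN = re.compile(r"count:\s*(\d+)")


def region_counts(log_text: str) -> list[int]:
    """Return every 'count' value found in the log text, in order."""
    return [int(match.group(1)) for match in COUNT_PATTERN.finditer(log_text)]


def all_counts_zero(log_text: str) -> bool:
    """True if the log has at least one count field and all of them are 0."""
    counts = region_counts(log_text)
    return bool(counts) and all(count == 0 for count in counts)
```

If `all_counts_zero` returns True for a whole run, the labels are probably not being loaded at all, which points at file locations or label format rather than at the learning rate.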

I am also getting the same issue.

1. I have used https://github.com/AlexeyAB/darknet
2. yolo-obj.cfg
3. bbox label tool
4. I don't have values of x, y, w, h <= 0 or >= 1 in my label txt files

Please suggest how to deal with this issue if anyone has an idea.
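The range check mentioned in item 4 can be automated. The following is a minimal Python sketch, assuming each label line has the form `class x y w h` with normalized values; `num_classes` and the function names are hypothetical and not part of darknet or Yolo_mark:

```python
from pathlib import Path


def check_label_line(line: str, num_classes: int) -> list[str]:
    """Return a list of problems found in one label line (empty if none)."""
    parts = line.split()
    if len(parts) != 5:
        return [f"expected 5 fields, got {len(parts)}"]
    try:
        class_id = int(parts[0])
        x, y, w, h = (float(p) for p in parts[1:])
    except ValueError:
        return ["non-numeric field or non-integer class id"]
    problems = []
    if not 0 <= class_id < num_classes:
        problems.append(f"class id {class_id} outside 0..{num_classes - 1}")
    # Same rule as stated in the post: no value <= 0 or >= 1.
    for name, value in (("x", x), ("y", y), ("w", w), ("h", h)):
        if not 0.0 < value < 1.0:
            problems.append(f"{name}={value} not strictly between 0 and 1")
    return problems


def check_label_file(path: Path, num_classes: int) -> dict[int, list[str]]:
    """Map 1-based line numbers to the problems found in one label file."""
    report = {}
    for number, line in enumerate(path.read_text().splitlines(), start=1):
        if line.strip():
            problems = check_label_line(line, num_classes)
            if problems:
                report[number] = problems
    return report
```

Running `check_label_file` over every txt file in the dataset folder gives a quick way to rule out malformed labels before looking elsewhere.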

Jun 06 '18 14:06 kpreeti096

Are you using the latest code from https://github.com/AlexeyAB/darknet ?
Can you show a screenshot of what you see when you open your dataset in Yolo_mark?

Jun 06 '18 15:06 AlexeyAB

Hi, yes, I am using the latest code from AlexeyAB/darknet. [Screenshots: selection_014, selection_013]

./darknet detector train build/darknet/x64/data/obj.data build/darknet/x64/data/yolo-obj.cfg build/darknet/x64/data/yolo3.weights
yolo-obj
layer filters size input output
0 conv 32 3 x 3 / 1 416 x 416 x 3 -> 416 x 416 x 32 0.299 BF
1 max 2 x 2 / 2 416 x 416 x 32 -> 208 x 208 x 32 0.006 BF
2 conv 64 3 x 3 / 1 208 x 208 x 32 -> 208 x 208 x 64 1.595 BF
3 max 2 x 2 / 2 208 x 208 x 64 -> 104 x 104 x 64 0.003 BF
4 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128 1.595 BF
5 conv 64 1 x 1 / 1 104 x 104 x 128 -> 104 x 104 x 64 0.177 BF
6 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128 1.595 BF
7 max 2 x 2 / 2 104 x 104 x 128 -> 52 x 52 x 128 0.001 BF
8 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
9 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
10 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
11 max 2 x 2 / 2 52 x 52 x 256 -> 26 x 26 x 256 0.001 BF
12 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
13 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
14 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
15 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
16 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
17 max 2 x 2 / 2 26 x 26 x 512 -> 13 x 13 x 512 0.000 BF
18 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
19 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
20 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
21 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
22 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
23 conv 1024 3 x 3 / 1 13 x 13 x1024 -> 13 x 13 x1024 3.190 BF
24 conv 1024 3 x 3 / 1 13 x 13 x1024 -> 13 x 13 x1024 3.190 BF
25 route 16
26 conv 64 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 64 0.044 BF
27 reorg / 2 26 x 26 x 64 -> 13 x 13 x 256
28 route 27 24
29 conv 1024 3 x 3 / 1 13 x 13 x1280 -> 13 x 13 x1024 3.987 BF
30 conv 35 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 35 0.012 BF
31 detection
mask_scale: Using default '1.000000'
Total BFLOPS 29.340
Loading weights from build/darknet/x64/data/yolo3.weights... seen 64 Done!
Learning Rate: 0.001, Momentum: 0.9, Decay: 0.0005
Resizing 352 x 352
Loaded: 0.285117 seconds
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.527893, Avg Recall: -nan, count: 0
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.530538, Avg Recall: -nan, count: 0
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.525535, Avg Recall: -nan, count: 0
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.529839, Avg Recall: -nan, count: 0

In Yolo Mark, I first need to load 1 to open folder 1; after that I can open other folders by entering the folder name (for example, 4 for 004) and clicking Load. Once a folder is loaded, the txt file contains the number of boxes drawn and their x, y, w, h values.

After conversion, my txt file looks like this:

0 0.2703296703296703 0.4149214659685864 0.421978021978022 0.08115183246073299
0 0.7208791208791209 0.4149214659685864 0.421978021978022 0.08638743455497383
0 0.4989010989010989 0.5575916230366492 0.8747252747252747 0.08900523560209425
0 0.5021978021978022 0.6989528795811518 0.8769230769230769 0.07853403141361257
1 0.4989010989010989 0.8704188481675393 0.8747252747252747 0.13350785340314136
1 0.2824175824175824 0.12041884816753927 0.43736263736263736 0.09424083769633508
1 0.7241758241758242 0.11910994764397906 0.432967032967033 0.10209424083769635

The leading 0 or 1 on each line is the class, followed by the box values converted to YOLO format.
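For reference, the conversion can be sketched as follows. This is a minimal Python example that assumes the labeling tool stores each box as pixel corners `(x_min, y_min, x_max, y_max)`; that input layout, the function names, and the six-decimal output precision are assumptions, not details given in the thread. The YOLO line itself is `class x_center y_center width height`, each value divided by the image width or height.

```python
def to_yolo(box, image_size):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to normalized
    YOLO values (x_center, y_center, width, height)."""
    x_min, y_min, x_max, y_max = box
    img_w, img_h = image_size
    x_center = (x_min + x_max) / 2.0 / img_w
    y_center = (y_min + y_max) / 2.0 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return x_center, y_center, width, height


def yolo_line(class_id, box, image_size):
    """Format one label line: class id followed by the four YOLO values."""
    values = to_yolo(box, image_size)
    return f"{class_id} " + " ".join(f"{v:.6f}" for v in values)
```

All four values should land strictly between 0 and 1 when the box lies inside the image; anything outside that range usually means the image size passed in was wrong.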

Jun 06 '18 18:06 kpreeti096

What avg loss do you get at iteration 2000?

Jun 06 '18 18:06 AlexeyAB

It is still running on CPU. After 6 iterations, the output looks like this:

6: 9.334825, 9.320776 avg, 0.000000 rate, 943.138674 seconds, 384 images
Loaded: 0.000048 seconds
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.528263, Avg Recall: -nan, count: 0
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.530658, Avg Recall: -nan, count: 0
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.528865, Avg Recall: -nan, count: 0
Region Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.526204, Avg Recall: -nan, count: 0

Jun 07 '18 06:06 kpreeti096

I am facing a similar issue: 'nan' appears even before iteration 62.

62: nan, nan avg, 0.000000 rate, 1225.356435 seconds, 3968 images
Loaded: 0.000000 seconds
Region 82 Avg IOU: -nan, Class: nan, Obj: -nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 7
Region 94 Avg IOU: nan, Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R:0.000000, count: 2
Region 106 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: -nan, .5R: -nan(ind), .75R: -nan(ind), count: 0
Region 82 Avg IOU: -nan, Class: nan, Obj: -nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 10
Region 94 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: nan, .5R: -nan(ind), .75R: -nan(ind), count: 0
Region 106 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: -nan, .5R: -nan(ind), .75R: -nan(ind), count: 0

I am using the yolov3 config on a face detection dataset, with the pretrained weights darknet53.conv.74.

Jun 08 '18 05:06 pallpb

Hi, I am still facing the same issue, even after 23 iterations. I used around 60 images to train my model.

Jun 08 '18 07:06 kpreeti096

@kpreeti096 please use a GPU if you want to work with images.

Jun 22 '18 03:06 abdulkalam1233

@kpreeti096 Hi, I faced the same issue as you. I used https://github.com/AlexeyAB/darknet (on Windows 10, compiled by following the instructions in the tutorial with MSVS 2015, CUDA 10.0, cuDNN 7.4 and OpenCV 3.3.0). I followed the instructions in that GitHub document and in https://github.com/AlexeyAB/darknet/issues/1455#issuecomment-416030423 (I added the files given there in the properties of darknet.sln). I am using https://github.com/AlexeyAB/Yolo_mark to annotate my own dataset, and I had the same problem as you.

I have now solved it. I checked the directory that contains darknet.exe and copied the annotated dataset into the folder D:/data/img. Earlier that folder did not contain the txt files, which is why it was giving this error. After I added those files there
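The cause described in this post (label txt files missing from the image folder) can be checked before training. The following is a minimal Python sketch, assuming each image is expected to have a txt file with the same base name in the same folder; the function name and the set of image extensions are hypothetical:

```python
from pathlib import Path

# Image extensions to look for; adjust to match the dataset (assumption).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def images_missing_labels(image_dir) -> list[str]:
    """Return names of images in image_dir that have no matching .txt file."""
    image_dir = Path(image_dir)
    missing = []
    for image in sorted(image_dir.iterdir()):
        is_image = image.suffix.lower() in IMAGE_SUFFIXES
        if is_image and not image.with_suffix(".txt").is_file():
            missing.append(image.name)
    return missing
```

For example, `images_missing_labels("D:/data/img")` would list every image in that folder that still lacks an annotation file; an empty list means each image has one.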
