No output boxes after training!!

Open khorchefB opened this issue 8 years ago • 76 comments

Hello,

I am currently trying to train yolo.cfg (version 2) with 2 labels (I want to recognise Dark Vador and Yoda in my test images). I changed the number of classes in yolo.cfg and renamed it yolo-5C.cfg. Then I put the 2 labels in labels.txt, created the annotation files, and finally started training on the CPU with this command:

./flow --model cfg/yolo-5C.cfg --load bin/yolo.weights --dataset pascal/VOCdevkit/IMG --annotation pascal/VOCdevkit/ANN --train --trainer adam

I changed the following parameters in the file flow.py:

  • epochs = 100
  • batch = 16
  • learning rate = 1e-5

There are 120 images (40 with only Dark Vador, 40 with only Yoda, and 40 with both) and 120 annotations.

My problem is that after 12 hours of training on the CPU, when I run the test with the --test argument, NO BOXES are displayed in the output images. But when I decrease the threshold to 0.00001, many boxes are displayed. I want to understand how I can improve my training to get correct object detections. Could you please give me some advice?

Thanks.

khorchefB, Mar 09 '17 16:03

I see you are doing YOLOv2. How much is the loss? I suspect yours has not converged.

thtrieu, Mar 11 '17 04:03

@thtrieu Hi! I am also training a two-class YOLOv2 on my dataset, which has around 50000 images. I use the same settings as @kamelbouyacoub, and I started from the pre-trained ImageNet weights downloaded from the darknet website. At first the loss decreased rapidly over around 10 epochs; then it stayed around 1.8 ~ 2 and didn't decrease any more, with the learning rate at 1e-6 for those epochs. How long does it usually take to converge, and what loss is normally low enough to give meaningful output? Could you kindly suggest some reasons or improvements? Thx!

hyzcn, Mar 11 '17 14:03

I retrained my graph a second time, and here is what it displays after 800 iterations with a learning rate of 1e-2:

[screenshot of the training output]

Please, I need help. Could you give me some advice on training?

Thanks

khorchefB, Mar 13 '17 04:03

Currently I am trying to train the yolo.cfg (version 2) with 2 labels. (I want to recognise Dark Vador and Yoda in my test images.) I changed the number of classes in yolo.cfg, and I renamed it yolo-5C.cfg. So I put 2 labels in labels.txt, I created the annotation files, and finally I started the training using CPU with this command ./flow --model cfg/yolo-5C.cfg --load bin/yolo.weights --dataset pascal/VOCdevkit/IMG --annotation pascal/VOCdevkit/ANN --train --trainer adam

If you want to work with 2 labels, then two modifications have to be made in the .cfg: [region].classes = 2, and the number of filters in the last convolutional layer (which should be 35 instead of 425).
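Where 35 and 425 come from: in a YOLOv2 [region] layer, each of the `num` anchors per grid cell predicts 4 box coordinates, 1 objectness score and one score per class, so the convolution feeding it needs num × (classes + 5) filters. A minimal sketch, assuming the default `num=5` and `coords=4` of yolo.cfg; `last_conv_filters` is a hypothetical helper, not part of darkflow:

```python
def last_conv_filters(classes: int, num_anchors: int = 5, coords: int = 4) -> int:
    """Filters needed in the conv layer right before [region] in a YOLOv2 cfg.

    Each anchor predicts `coords` box values, 1 objectness score and
    `classes` class scores.
    """
    return num_anchors * (coords + 1 + classes)


if __name__ == "__main__":
    for c in (1, 2, 6, 20, 80):
        print(c, "classes ->", last_conv_filters(c), "filters")
```

With 2 classes this gives 35, and with COCO's 80 classes it gives the 425 found in yolo.cfg.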

Make sure you did the above, then please avoid training on the full dataset right away. First, train on a very small dataset (3~5 images) containing both classes. Only when you successfully overfit this small dataset (an inexpensive end-to-end test of the whole system) should you move on to training on your whole dataset.
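The small overfit set should contain every class. A minimal sketch of picking such a subset from PASCAL VOC XML annotations, assuming one .xml file per image in a single directory; `pick_overfit_subset` and `classes_in` are hypothetical helpers, not part of darkflow:

```python
import glob
import os
import xml.etree.ElementTree as ET


def classes_in(xml_path):
    """Return the set of <object><name> values in one PASCAL VOC annotation."""
    root = ET.parse(xml_path).getroot()
    return {obj.findtext("name").strip() for obj in root.iter("object")}


def pick_overfit_subset(ann_dir, n=5):
    """Greedily pick up to n annotation files that together cover every class.

    Files adding the most not-yet-covered classes are chosen first; once all
    classes are covered, remaining slots are filled in sorted filename order.
    """
    files = sorted(glob.glob(os.path.join(ann_dir, "*.xml")))
    labels = {f: classes_in(f) for f in files}
    chosen, covered = [], set()
    while len(chosen) < n:
        remaining = [f for f in files if f not in chosen]
        if not remaining:
            break
        # max() keeps the first file on ties, so ties resolve in sorted order.
        best = max(remaining, key=lambda f: len(labels[f] - covered))
        chosen.append(best)
        covered |= labels[best]
    return [os.path.basename(f) for f in chosen]
```

The returned names can then be copied, together with their images, into a separate directory passed to --annotation and --dataset for the overfit run.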

If overfitting fails, I'll help you look into the details.

thtrieu, Mar 13 '17 04:03

@thtrieu Hi! I'm another poster with issues similar to those in the previous posts. I already changed my class number to 2 classes and tried to overfit the net with around 8 images; the loss converges a bit lower but then still gets stuck around 1.6. Are the 3-5 images you mentioned randomly drawn, or are there any guidelines? Moreover, is the loss of successful overfitting around 0, or is there some magnitude that indicates successful overfitting? I have been stuck for a few days; thanks in advance for your reply!!

hyzcn, Mar 13 '17 05:03

In my experiments, the overfitting loss can be around or below 0.1. With noise augmentation disabled, it can get very close to a perfect 0.0.

The 3-5 images can be anything (randomly drawing them from the training set is fine), but they should preferably contain all of your classes (e.g. with cars and dogs, the 3-5 images should contain both rather than only one). Not being able to overfit such a small training set means the learning rate is too big, or there is a bug in the code.

I recommend disabling noise augmentation during this overfit step by setting the argument allobj = None in https://github.com/thtrieu/darkflow/blob/master/net/yolo/data.py#L69, setting a smaller learning rate (say 1e-5), and trying to overfit again.

thtrieu, Mar 13 '17 05:03

@thtrieu thanks for the information, I'll try on that! :+1:

hyzcn, Mar 13 '17 07:03

I am retraining YOLOv2 on VOC 2012 with 20 classes and did not change any parameters. The loss is now at 0.01 and I still cannot see any bounding boxes after 7000 steps. Should I just keep training, or is this a sign that there is an issue?

andreapiso, Mar 14 '17 11:03

Have you looked at postprocess in net/yolo/test? There is a _tresh dict that may disrupt your output. I had to remove it to make it work.

Dref360, Mar 15 '17 13:03

@Dref360 That dict has been removed in newer versions; please update your code.

@AndreaPisoni Please give the steps to reproduce your error.

thtrieu, Mar 16 '17 02:03

Hi, I am trying to train YOLOv2 on a different dataset. I have created annotation files in PASCAL VOC format. I am trying to identify shoes and bags in the images. As suggested by users (@ryansun1900, @y22ma, @thtrieu) on this repo, I used 3-5 images and their annotations to train.

I used tiny-yolo-voc.weights and tiny-yolo-voc.cfg. I changed tiny-yolo-voc.cfg for the number of classes and the filters in the last convo layer, as 2 and 35 respectively.

I used a learning rate of 1e-3.

This is the command I used to train:

./flow --train --trainer momentum --model cfg/tiny-yolo-voc-2c.cfg --load bin/tiny-yolo-voc.weights --annotation <path/to/annotation> --dataset <path/to/sampledata> --gpu 0.4

After 200 epochs I got NaN for both the loss and the moving ave loss. I printed out the output matrices during training using:

fetches = [self.train_op, loss_op, self.top.out, self.top.inp.out, self.top.inp.inp.out, self.top.inp.inp.inp.out]

I looked for the matrices that still had valid values in them and found some around step 176, so I loaded that checkpoint and reran the training with a smaller learning rate (1e-6). I finally managed to reduce the loss to 4.600135803222656 (moving ave loss 4.5986261185381885). I tried to test using the checkpoint with the following command:

./flow --test <path/to/test/> --model cfg/tiny-yolo-voc-2c.cfg --load 890

But the images do not have bounding boxes.

Can you please guide me? I am not sure whether I have missed a step in between.

hemavakade, Mar 22 '17 06:03
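hemavakade's workaround was to go back to a checkpoint from before the loss became NaN. A minimal sketch that finds the last step with a finite loss in a saved training log, assuming the `step N - loss X - moving ave loss Y` line format shown in this thread. Checkpoints are only written periodically, so you would then --load the nearest saved checkpoint at or before that step. `last_finite_step` is a hypothetical helper:

```python
import math
import re

# Assumed log format, as seen in this thread:
# "step 176 - loss 5.12 - moving ave loss 5.30"
STEP_RE = re.compile(r"step (\d+) - loss (\S+) - moving ave loss (\S+)")


def last_finite_step(log_lines):
    """Return the last step whose loss is finite before the first NaN/inf.

    Lines that do not match the step format are ignored. Returns None if the
    first parsed step is already non-finite (or nothing was parsed).
    """
    last_ok = None
    for line in log_lines:
        m = STEP_RE.search(line)
        if not m:
            continue
        step, loss = int(m.group(1)), float(m.group(2))  # float("nan") is valid
        if not math.isfinite(loss):
            break
        last_ok = step
    return last_ok
```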

I think you are doing fine; the model just has not converged. A trained VOC model with 20 classes has a loss of around 4.5, so with two classes it should be significantly smaller than that.

And since you are training on only 3-5 images, I would expect it to overfit, i.e. loss << 1.0.

thtrieu, Mar 22 '17 06:03

@thtrieu, what do you suggest in that case?

I have also disabled noise augmentation during the overfitting.

Update: I managed to bring the loss down to almost 0.01. I had to use a different optimizer; RMSProp works better. But when I test, there are still no bounding boxes. This is the command I am using:

./flow --test <path/to/test/> --model cfg/tiny-yolo-voc-2c.cfg --load -1 --gpu 0.4

I checked the output box probabilities and they are very low, on the order of < 1e-3. To check whether I am doing everything right, I used the same images I trained on as my test data, and then it did draw the bounding boxes, with probabilities around 0.9. Do you suggest training on a larger dataset starting from the overfit model?

hemavakade, Mar 22 '17 13:03

I have the same problem training a 2-class model on my own toy dataset. Training converges according to the decreasing loss, but nothing is drawn during testing. What am I doing wrong?

eugtanchik, Mar 23 '17 14:03

Update: I got it working! I have bounding boxes. I used yolo.weights and yolo.cfg. I think these are trained on the COCO dataset, which is much better suited to the dataset and classes I am using.

hemavakade, Mar 23 '17 20:03

@hemavakade, obviously I get boxes with yolo.weights and yolo.cfg too. But I want it to work with my own dataset under darkflow, so that I can fine-tune the model further.

eugtanchik, Mar 23 '17 20:03

@eugtanchik I am not sure I understood you. I loaded yolo.weights but used it to overfit my dataset. Do you mean that yolo.cfg and yolo.weights are not YOLOv2?

hemavakade, Mar 23 '17 20:03

@hemavakade, I mean that yolo.weights were trained with the darknet framework, or am I wrong? Sure, this is YOLOv2, but what about the number of classes in your case? It is not clear to me what to do if my classes are not included in the COCO dataset. As far as I know, darkflow has no way to produce a .weights file, only the TensorFlow model format or protobuf.

eugtanchik, Mar 23 '17 20:03

@eugtanchik Well, I have more classes. I was trying to get it to work with a small number of classes first.

To train further on other classes, I will try the following options:

  • I will take this model checkpoint and train it again with the other classes.
  • If that does not work, there is a section on the YOLO website about using the pre-trained ImageNet weights; I will try that.

hemavakade, Mar 23 '17 20:03

@hemavakade, maybe this is a good idea. But there must be a way to train any model from scratch without pre-training. It seems to me that there is some bug in the code; I have not found it yet.

eugtanchik, Mar 23 '17 20:03

My problem was fixed simply by running more steps, after which I saw some detections. It works fine!

eugtanchik, Mar 25 '17 19:03

The solutions suggested here didn't solve my problem. I used pre-trained weights to train my model on a different dataset with fewer classes. During training the loss decreased and converged at some point. Afterwards I tested the output model on both the test and the train datasets, and in both cases there are no bounding boxes.

Please advise!

dkarmon, Apr 08 '17 18:04

I am facing a similar issue. I trained on my own dataset with 3 classes using the pre-trained ImageNet model, i.e. darknet19_448.23, for YOLOv2. I do not see any bounding boxes. I am using the default settings, but do the anchor box parameters need to be updated depending on the data? Any help in this context would be very useful!
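On the anchor question: the YOLOv2 paper chooses its anchors by running k-means on the training boxes' widths and heights with the distance d = 1 - IoU instead of Euclidean distance. A minimal pure-Python sketch of that idea, assuming box sizes normalized by image size; to turn the result into the anchors= line of a YOLOv2 cfg you would, as far as I know, multiply by the output grid size (13 for a 416×416 input), since those anchors are expressed in grid cells. `kmeans_anchors` is hypothetical and uses the cluster mean as the update step:

```python
import random


def iou_wh(a, b):
    """IoU of two boxes given only as (w, h), i.e. assuming a shared center."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union


def kmeans_anchors(boxes, k=5, iters=100, seed=0):
    """Cluster (w, h) pairs using the distance d = 1 - IoU.

    boxes: list of at least k (w, h) tuples, e.g. normalized by image size.
    Returns k (w, h) centroids sorted by area.
    """
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)
    for _ in range(iters):
        # Assign each box to its nearest centroid, i.e. the one with the highest IoU.
        clusters = [[] for _ in range(k)]
        for box in boxes:
            nearest = max(range(k), key=lambda i: iou_wh(box, centroids[i]))
            clusters[nearest].append(box)
        # Move each centroid to the mean (w, h) of its cluster; keep empty ones as-is.
        updated = []
        for i, members in enumerate(clusters):
            if members:
                updated.append((sum(w for w, _ in members) / len(members),
                                sum(h for _, h in members) / len(members)))
            else:
                updated.append(centroids[i])
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids, key=lambda wh: wh[0] * wh[1])


if __name__ == "__main__":
    grid = 13  # assumed output grid for a 416x416 input
    sizes = [(0.05, 0.06), (0.06, 0.05), (0.5, 0.6), (0.55, 0.6)]
    anchors = kmeans_anchors(sizes, k=2)
    print(", ".join(f"{w * grid:.2f},{h * grid:.2f}" for w, h in anchors))
```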

nattari, Apr 27 '17 10:04

Same issue.

Here are the steps I took: I copied the tiny-yolo-voc.cfg file to yolo-new.cfg. Although I am really looking for 6 classes, I am training for 20, since I could not figure out how to change the number of classes without causing tensors to be sized inappropriately. I was training from scratch and reached a loss of 0.6 or 0.7.

When testing with both the training set and testing set, there were no bounding boxes.

If someone could advise me on how to change from 20 classes to 6 classes, that would be appreciated as well.

denisli, Apr 30 '17 16:04

It worked for me. In YOLOv2 it is relatively easy to change the config file to fit your data (no additional changes are needed). You need to train for more iterations. Initially I wasn't detecting any bounding boxes, but after training for 40k iterations I could finally see detections, though the results were poor (you need to tune the anchors). I used pre-trained ImageNet weights.
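For the question of going from 20 to 6 classes, the two edits thtrieu described earlier (classes in [region], filters in the last convolution) can be scripted. A minimal sketch, assuming the layout of tiny-yolo-voc.cfg, where the [convolutional] section immediately before [region] feeds it and [region] has a num= line; `set_num_classes` is hypothetical and needs Python 3.7+ (for re.split on a zero-width match):

```python
import re


def set_num_classes(cfg_text: str, classes: int) -> str:
    """Update [region] classes= and the filters= of the conv layer feeding it.

    Assumes a YOLOv2-style cfg: the last [convolutional] section before the
    last [region] section produces the region input, and [region] has num=.
    """
    # Split before every line that starts a section, e.g. "[convolutional]".
    sections = re.split(r"(?m)^(?=\[)", cfg_text)
    region = max(i for i, s in enumerate(sections) if s.startswith("[region]"))
    conv = max(i for i in range(region) if sections[i].startswith("[convolutional]"))
    num = int(re.search(r"(?m)^num\s*=\s*(\d+)", sections[region]).group(1))

    sections[region] = re.sub(r"(?m)^classes\s*=\s*\d+",
                              f"classes={classes}", sections[region])
    # Each anchor predicts 4 coords + 1 objectness + `classes` scores.
    sections[conv] = re.sub(r"(?m)^filters\s*=\s*\d+",
                            f"filters={num * (classes + 5)}", sections[conv])
    return "".join(sections)


if __name__ == "__main__":
    import sys
    src, dst, n = sys.argv[1], sys.argv[2], int(sys.argv[3])
    with open(src) as fh:
        text = fh.read()
    with open(dst, "w") as fh:
        fh.write(set_num_classes(text, n))
```

For example, `python set_classes.py cfg/tiny-yolo-voc.cfg cfg/tiny-yolo-voc-6c.cfg 6` (hypothetical script and file names) would write classes=6 and filters=55; labels.txt then needs the 6 class names.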

nattari, Apr 30 '17 19:04

I'll reopen this issue since a lot of people are complaining about it. However, it is worth noting that different users have had different experiences while training. Some succeeded, some did not. Please share your experience here.

For me, the absolute must before training is to overfit the network on 3-5 images from random weights. Only when you are able to obtain reasonable detections should you proceed to train until convergence. Convergence is a tricky concept: in many cases, the loss no longer decreasing does not mean convergence.

In the YOLOv2 output, there are x, y, w, h representing the coordinates, c for objectness, and a probability distribution over the classes, all repeated for B boxes in each of the S x S grid cells. Take all of these into account when calculating your expected loss at convergence. I may develop a feature to help evaluate this value.

Only when you have a rough estimate of that value can you declare convergence with confidence. Otherwise, it is highly likely that your net has simply got stuck.
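To put numbers on that description: the YOLOv2 head predicts 4 coordinates, 1 objectness score and C class scores for each of B boxes in each of S × S cells. A minimal sketch counting those values, assuming a 416×416 input as in tiny-yolo-voc.cfg (S = 13 after the network's 32× downsampling) and B = 5 anchors. It does not model the loss weighting, which the cfg controls through the [region] keys object_scale, noobject_scale, class_scale and coord_scale. The helper names are hypothetical:

```python
def yolo_v2_output_shape(S: int, B: int, C: int):
    """(rows, cols, channels) of a YOLOv2 output: S x S cells, B boxes per cell.

    Each box carries x, y, w, h, an objectness score c, and C class scores.
    """
    return (S, S, B * (4 + 1 + C))


def predictions_per_image(S: int, B: int, C: int) -> int:
    rows, cols, channels = yolo_v2_output_shape(S, B, C)
    return rows * cols * channels


if __name__ == "__main__":
    for C in (2, 20):
        print(C, yolo_v2_output_shape(13, 5, C), predictions_per_image(13, 5, C))
```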

thtrieu, May 25 '17 06:05

I did not mention this, but after running for 40,000 iterations I got bounding boxes as well, although this time I ran it for only a single class. It seems that it just takes more training to get bounding boxes.

denisli, May 25 '17 16:05

@denisli And did you notice any change in the error before you started getting bounding boxes? Or did you let it train for many iterations without any sign that it was working?

jasag, May 27 '17 10:05

@jasag Yes, good question. It now oscillates slowly between 1.0 and 1.4. I think it was already like this at around 10,000 iterations as well.

I might have jumped the gun when I said that you need more iterations to see bounding boxes. I was training with 6 classes before; now I am training on just a single class, which might have made a difference. Unfortunately, I didn't check at 10,000 iterations this time and had already deleted those checkpoints, so I do not know whether it would have shown bounding boxes then.

I encourage you to try more iterations anyway. I will run with my original 6 classes and let everyone know how it goes in about a week or so.

denisli, May 27 '17 14:05

Here it is:

The task is to detect traffic lights in 6 classes: green, yellow, red, green left, yellow left, and red left. The bounding boxes are all quite small. I used basically the same configuration as yolo-new.cfg, but changed it to handle 6 classes instead of 20. The results are shown below.

step 1001 - loss 95.0399169922 - moving ave loss 95.0399169922
step 2001 - loss 69.532623291 - moving ave loss 69.532623291
step 3001 - loss 39.0838623047 - moving ave loss 39.0838623047
step 4001 - loss 17.3892593384 - moving ave loss 17.3892593384
step 5001 - loss 6.25845241547 - moving ave loss 6.25845241547
step 6001 - loss 2.90494060516 - moving ave loss 2.90494060516
step 7001 - loss 1.37821388245 - moving ave loss 1.37821388245
step 8001 - loss 1.00944340229 - moving ave loss 1.00944340229
step 9001 - loss 2.86516594887 - moving ave loss 2.86516594887
step 10001 - loss 6.4756526947 - moving ave loss 6.4756526947
step 11001 - loss 2.20858240128 - moving ave loss 2.20858240128
step 12001 - loss 2.07890105247 - moving ave loss 2.07890105247
step 13001 - loss 2.71276283264 - moving ave loss 2.71276283264
step 14001 - loss 3.06041097641 - moving ave loss 3.06041097641
step 15001 - loss 2.01983118057 - moving ave loss 2.01983118057
step 16001 - loss 3.11811351776 - moving ave loss 3.11811351776
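To make the point of this log concrete, one can parse it and take the step with the lowest logged loss. A minimal sketch, with an excerpt of the log above pasted in and a regex that assumes the `step N - loss X - moving ave loss Y` format:

```python
import re

# Excerpt of the traffic-light training log above (6 classes).
LOG = """\
step 6001 - loss 2.90494060516 - moving ave loss 2.90494060516
step 7001 - loss 1.37821388245 - moving ave loss 1.37821388245
step 8001 - loss 1.00944340229 - moving ave loss 1.00944340229
step 9001 - loss 2.86516594887 - moving ave loss 2.86516594887
step 10001 - loss 6.4756526947 - moving ave loss 6.4756526947
step 11001 - loss 2.20858240128 - moving ave loss 2.20858240128
step 12001 - loss 2.07890105247 - moving ave loss 2.07890105247
step 13001 - loss 2.71276283264 - moving ave loss 2.71276283264
step 14001 - loss 3.06041097641 - moving ave loss 3.06041097641
"""

STEP_RE = re.compile(r"step (\d+) - loss (\S+) - moving ave loss (\S+)")


def parse_log(text):
    """Return [(step, loss, moving_ave_loss), ...] from log lines."""
    return [(int(s), float(l), float(m)) for s, l, m in STEP_RE.findall(text)]


rows = parse_log(LOG)
best_step, best_loss, _ = min(rows, key=lambda r: r[1])
print(f"lowest logged loss: {best_loss} at step {best_step}")
```

This picks step 8001, yet according to the checkpoint tests below, the 8000 checkpoint produced no boxes while 13000 and 14000 did. Running candidate checkpoints on a few held-out images is a safer way to choose one than the minimum logged loss.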

I tested on some of these checkpoints:

  • 8000 lowest loss but no boxes!
  • 11000 no boxes
  • 12000 sees boxes but not very well
  • 13000 outputs boxes consistently even if it doesn't classify super well
  • 14000 more consistent box output and better classification

And it probably gets better from here, up to a certain point.

The conclusion is that a lower loss does not necessarily mean that you will get bounding boxes. The more iterations you run, the more likely you are to get bounding boxes.

denisli, May 31 '17 16:05

I don't understand why the result is sometimes not good at a lower loss. It's strange!!

khorchefB, Jun 01 '17 07:06

A few weeks ago I commented on another issue that I was not able to get bounding boxes. And I am realizing that I never questioned the size of the images, which are high definition, in training and in prediction. How should I parameterize my model for object recognition in images of this type? I suppose the size may have some kind of influence, or am I wrong?

jasag, Jun 01 '17 16:06

@jasag My dataset uses 640x480 .png files.

denisli, Jun 01 '17 17:06

I got this issue too. I trained on a whole dataset of 60000+ images with 2 classes, using both fine-tuning and transfer learning. The loss went down very fast, to ~0, after a few hundred iterations. The batch size is 8.

step 17795 - loss 3.470731542165595e-07 - moving ave loss 3.4845578592865537e-07
step 17796 - loss 3.454373427302926e-07 - moving ave loss 3.4815394160881907e-07
step 17797 - loss 3.467374085630581e-07 - moving ave loss 3.48012288304243e-07
step 17798 - loss 3.46648846516473e-07 - moving ave loss 3.47875944125466e-07

However, when I test the model on both the train and test data, there are no bounding boxes. Please advise. The command I used to run the test is:

flow --model cfg/yolo-idot.cfg --load -1 --gpu 0.0 --imgdir /home/nhat/IDOT-convert/IDOT_dataset/train/frames --labels idot-labels.txt

Another thing that may be relevant is that my training set comes from a video, so there are many duplicates in the training set.

Edit: It seems people are having a similar issue in #142 too.

minhnhat93, Jul 10 '17 05:07

We have to set the minimum score threshold in order to see the bounding boxes.
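A quick diagnostic in the spirit of the very low thresholds tried in this thread: predict with a tiny threshold and count how many boxes clear each cut-off. darkflow's TFNet.return_predict() returns a list of dicts that include 'label' and 'confidence' keys; the sketch assumes only those two keys and works on plain data, so it runs without darkflow installed. `count_above` and the sample predictions are hypothetical:

```python
def count_above(predictions, thresholds=(0.5, 0.1, 0.01, 0.001)):
    """Count predicted boxes whose confidence is at or above each threshold.

    `predictions` is assumed to be shaped like darkflow's return_predict()
    output: a list of dicts that include 'label' and 'confidence'.
    """
    return {t: sum(p["confidence"] >= t for p in predictions) for t in thresholds}


if __name__ == "__main__":
    # Hypothetical predictions from a run with a very low threshold.
    preds = [
        {"label": "yoda", "confidence": 0.0008},
        {"label": "dark_vador", "confidence": 0.004},
        {"label": "yoda", "confidence": 0.2},
    ]
    print(count_above(preds))
```

With darkflow installed, the predictions would come from something like `TFNet({"model": "cfg/tiny-yolo-voc-2c.cfg", "load": -1, "threshold": 0.001}).return_predict(img)`, with `img` read by `cv2.imread`. If even the 0.1 bucket stays empty on the training images themselves, the model most likely has not learned yet.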

abhiishekpal, Jan 06 '18 07:01

@hemavakade and @denisli, could you please guide me on how you trained on your own dataset and got bounding boxes in testing? Let me know if you can share your view on the screen below.

[screenshot from 2018-01-25 15-20-42]

Thanks in advance.

JaySinghh, Jan 25 '18 09:01

@JaySinghh Hey, are you using darknet? As far as I know, darkflow doesn't print the terminal information shown above. If it is darkflow, can you share how you managed it? I need to learn about IOU and recall rate.

onurbarut, Jan 28 '18 18:01

I used the YOLOv3 pre-trained model to train on my own dataset. It could detect the target before 900 iterations, but it cannot detect any target after 10000 iterations, including with the final weights. Do you know what I should do? Should I change the learning rate during training?

SamNew1, Apr 27 '18 10:04

Could anyone please give me links to the correct yolov2-voc.cfg and its corresponding weights file? I started training with the ones downloaded from the official YOLO site, but I got the error below. Please help me, thanks in advance. This is the error:

C:\Users\MIPRG-P2\Desktop\dark\darkflow-master\darkflow\dark\darknet.py:54: UserWarning: ./cfg/yolov2-voc.cfg not found, use cfg/yolo2-voc-1c.cfg instead
  cfg_path, FLAGS.model))
Parsing cfg/yolo2-voc-1c.cfg
Loading bin/yolov2-voc.weights ...
Traceback (most recent call last):
  File "flow", line 6, in <module>
    cliHandler(sys.argv)
  File "C:\Users\MIPRG-P2\Desktop\dark\darkflow-master\darkflow\cli.py", line 26, in cliHandler
    tfnet = TFNet(FLAGS)
  File "C:\Users\MIPRG-P2\Desktop\dark\darkflow-master\darkflow\net\build.py", line 58, in __init__
    darknet = Darknet(FLAGS)
  File "C:\Users\MIPRG-P2\Desktop\dark\darkflow-master\darkflow\dark\darknet.py", line 27, in __init__
    self.load_weights()
  File "C:\Users\MIPRG-P2\Desktop\dark\darkflow-master\darkflow\dark\darknet.py", line 82, in load_weights
    wgts_loader = loader.create_loader(*args)
  File "C:\Users\MIPRG-P2\Desktop\dark\darkflow-master\darkflow\utils\loader.py", line 105, in create_loader
    return load_type(path, cfg)
  File "C:\Users\MIPRG-P2\Desktop\dark\darkflow-master\darkflow\utils\loader.py", line 19, in __init__
    self.load(*args)
  File "C:\Users\MIPRG-P2\Desktop\dark\darkflow-master\darkflow\utils\loader.py", line 77, in load
    walker.offset, walker.size)
AssertionError: expect 202314760 bytes, found 202704264

sharoseali, May 19 '18 20:05

@sharoseali This is because some .weights files on the official darknet website have been updated, and the offset at which the file must be read has changed. To fix it, I strongly recommend using yolo.cfg together with yolo.weights, which are trained on the COCO dataset, or tiny-yolo-voc.cfg together with tiny-yolo-voc.weights, trained on VOC 2007. These two sets work correctly in my own tests. (PS: yolo above means your YOLOv2 model.) Otherwise, you will have to change the .py source code that reads and parses the .weights file, changing the offset from 16 to 20 (or from 20 to 16). But that may cause unknown side effects.
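The offset explanation can be checked directly. The sketch below parses the header of a darknet .weights file the way darknet's own loader does: three int32 version fields, then a `seen` counter that is 8 bytes for version 0.2 and later and 4 bytes before, giving a 20- or 16-byte header. It assumes a little-endian file written on a 64-bit machine. Separately, the traceback above looks like more than a header problem: the gap of 202704264 - 202314760 = 389504 bytes equals 4 header bytes plus (125 - 30) × (1024 weights + 1 bias) = 97,375 float32 values, which is exactly the difference between a final 1×1 convolution with 125 filters (20 classes) and one with 30 filters (1 class) over 1,024 input channels. That fits the warning that ./cfg/yolov2-voc.cfg was not found, so the weights were walked with yolo2-voc-1c.cfg. `read_darknet_header` is a hypothetical helper:

```python
import struct


def read_darknet_header(data: bytes):
    """Parse the header at the start of a darknet .weights file.

    Three little-endian int32 values (major, minor, revision), then `seen`:
    8 bytes when (major*10 + minor) >= 2 and both are < 1000, else 4 bytes.
    Returns (major, minor, revision, seen, header_size_in_bytes).
    """
    major, minor, revision = struct.unpack_from("<3i", data, 0)
    if major * 10 + minor >= 2 and major < 1000 and minor < 1000:
        (seen,) = struct.unpack_from("<Q", data, 12)
        size = 20
    else:
        (seen,) = struct.unpack_from("<i", data, 12)
        size = 16
    return major, minor, revision, seen, size


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as fh:
        print(read_darknet_header(fh.read(20)))
```

Run on the downloaded file (e.g. `python read_header.py bin/yolov2-voc.weights`, hypothetical script name), a header size of 20 would confirm the newer format.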

youyuge34, May 20 '18 07:05

Yes, the file exists. That is the issue: even though the file exists, it shows me this message. However, when I give the full path, darkflow responds like this:

WARNING:tensorflow:From C:\Users\MIPRG-P2\Anaconda3\lib\site-packages\tensorflow\contrib\learn\python\learn\datasets\base.py:198: retry (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Use the retry module or similar alternatives.
Parsing C:\Users\MIPRG-P2\Desktop\dark\darkflow-master\cfg\tiny-yolo-voc-1c.cfg
Traceback (most recent call last):
  File "C:\Users\MIPRG-P2\Desktop\dark\darkflow-master\realtimeDetection.py", line 12, in <module>
    tfnet = TFNet(options)
  File "C:\Users\MIPRG-P2\Desktop\dark\darkflow-master\darkflow\net\build.py", line 64, in __init__
    self.framework = create_framework(*args)
  File "C:\Users\MIPRG-P2\Desktop\dark\darkflow-master\darkflow\net\framework.py", line 59, in create_framework
    return this(meta, FLAGS)
  File "C:\Users\MIPRG-P2\Desktop\dark\darkflow-master\darkflow\net\framework.py", line 15, in __init__
    self.constructor(meta, FLAGS)
  File "C:\Users\MIPRG-P2\Desktop\dark\darkflow-master\darkflow\net\yolo\__init__.py", line 20, in constructor
    misc.labels(meta, FLAGS) #We're not loading from a .pb so we do need to load the labels
  File "C:\Users\MIPRG-P2\Desktop\dark\darkflow-master\darkflow\net\yolo\misc.py", line 36, in labels
    with open(file, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'labels.txt'
Loading None ...
Finished in 0.0s
[Finished in 5.002s]

sharoseali, May 20 '18 07:05

youyuge34, thanks for replying. You mention the yolo.cfg and yolo.weights files; where can I get them? From the old YOLO site? And what kind of results can I expect from this model?

Actually, I want to train with YOLOv2 or YOLOv3. How can I do that? Maybe I should switch to Linux? What's your opinion?

sharoseali, May 20 '18 07:05

@sharoseali

FileNotFoundError: [Errno 2] No such file or directory: 'labels.txt' 

If you train your own .cfg, you should manually add the 'labels.txt' file in the root dir. Otherwise, if you review the source code in darkflow-master\darkflow\net\yolo\misc.py, you will find that if you use an original .cfg file, it automatically loads cfg/coco.names as the labels file. So you can simply add your own cfg name to the list in misc.py. Read the source code and feel free to change it.

youyuge34, May 20 '18 08:05

@sharoseali It looks like you are not familiar with this project. In this project, yolo in Darkflow/cfg means YOLOv2, and Darkflow/cfg/v1 refers to YOLOv1. The .cfg files all exist already. The .weights files can be downloaded from the darknet official website: https://pjreddie.com/darknet/yolov2/

In detail, yolo.weights refers to this one: YOLOv2 608x608 | COCO trainval. The tiny-yolo weights I mentioned above are this one: Tiny YOLO | VOC 2007+2012.

youyuge34, May 20 '18 08:05

youyuge34, you mention editing the list. Could you please tell me which list you mean?

1) This one:
labels20 = ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]

2) Or this one:
voc_models = ['yolo-full', 'yolo-tiny', 'yolo-small',  # <- v1
              'yolov1', 'tiny-yolov1',  # <- v1.1
              'tiny-yolo-voc', 'yolo-voc']

Thanks again.

sharoseali, May 20 '18 08:05

Okay, I will try the files you mention. I thought I had tried this one (YOLOv2 608x608) in the past, but again it did not match the cfg. However, I will try again. Please reply to my question about which list in misc.py you meant.
Thanks, youyuge34, for your help; I hope for more help from you.

sharoseali, May 20 '18 08:05

@sharoseali If you are training on your own dataset, you must add your own 'labels.txt'. If training on VOC, add your cfg name to the 'voc_models' list.

youyuge34, May 20 '18 08:05

Yes, I am training on my own dataset with only 1 class. I have added my label name to the 'labels.txt' file in the darkflow-master folder and mentioned my cfg file in misc.py, but I am getting the same thing. Can I add my label name to this one? labels20 = ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]

sharoseali, May 20 '18 08:05

@sharoseali It seems I confused you. Just add labels.txt and leave the source code unmodified.

youyuge34, May 20 '18 08:05

Where should I put labels.txt??

sharoseali, May 20 '18 08:05

@sharoseali Just follow the instructions on Darkflow's main page: it goes in the root dir.
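darkflow reads class names from the file passed with --labels, which defaults to labels.txt relative to the directory you run flow from (the traceback above shows it opening the relative path 'labels.txt', and minhnhat93's command earlier passes --labels idot-labels.txt). A minimal sketch of a consistency check between that file and the cfg; `check_labels` is hypothetical, and blank lines are reported separately because they are easy to miss:

```python
import re


def check_labels(cfg_text: str, labels_text: str):
    """Compare labels.txt with the classes= value of the last [region] section.

    Returns (number_of_labels, classes_in_cfg, blank_lines).
    """
    lines = [line.strip() for line in labels_text.splitlines()]
    labels = [line for line in lines if line]
    region = cfg_text[cfg_text.rindex("[region]"):]
    classes = int(re.search(r"(?m)^classes\s*=\s*(\d+)", region).group(1))
    return len(labels), classes, len(lines) - len(labels)


if __name__ == "__main__":
    import sys
    cfg_path = sys.argv[1]
    labels_path = sys.argv[2] if len(sys.argv) > 2 else "labels.txt"
    with open(cfg_path) as c, open(labels_path) as l:
        n_labels, n_classes, blanks = check_labels(c.read(), l.read())
    print(f"labels: {n_labels}, cfg classes: {n_classes}, blank lines: {blanks}")
```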

youyuge34, May 20 '18 14:05

youyuge34, I have checked the weights files and their corresponding cfg files; they give the same error. Even yolov2-tiny-voc does not work with its cfg. Joseph Redmon should be informed about these issues. Anyhow, I am going to start training again on tiny-voc, which I trained previously. Let's see how it behaves this time.

youyuge34, have you played with darknet on Linux and the COCO dataset? If yes, what was your experience?

I have 2000 XML files in VOC format. I am thinking of converting them to COCO format, but I don't know how to train using COCO on Windows.

sharoseali, May 20 '18 16:05

I faced the same problem, but I reduced the threshold to 0.0001 and I see many bounding boxes. So try reducing the threshold and look at your confidence values.

mohamedabdallah1996, May 24 '18 21:05

@thtrieu I reached a loss of ~1.6 training on 32 classes, but the confidence for all objects is still 0.0, which means the model didn't learn anything. How can I reduce the loss much further to get higher confidence? I changed the batch size and learning rate, but the loss is still in the same range!

I need your help, please! Thanks in advance.

mohamedabdallah1996, May 24 '18 22:05

How can I change the number of iterations? I am training with 1500 images divided into six classes. Is there any way to change the number of iterations?

Dhagash4, Jun 04 '18 06:06

@denisli Can you show me how to increase the number of iterations? I am only at step 554, with a loss of 5.34, and I am training on 1500 images for 6 classes. Is that enough, or should I increase my dataset?

Dhagash4, Jun 04 '18 09:06

Increase the number of epochs. If you don't have a large dataset, you can increase the number of epochs. However, if you need a better-trained model, you must have at least 300 to 500 images per class. For that number of images, you can set the number of epochs to 1000.
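As far as this thread shows, darkflow's training length is set with --epoch (together with --batch), as in the command at the end of the thread, rather than with a step count. A minimal sketch for converting a target number of steps, such as the 40k iterations mentioned earlier, into epochs, assuming each epoch runs num_images // batch steps (i.e. an incomplete final batch is dropped); both helpers are hypothetical:

```python
def steps_per_epoch(num_images: int, batch: int) -> int:
    """Training steps per epoch, assuming an incomplete final batch is dropped."""
    return num_images // batch


def epochs_for_steps(target_steps: int, num_images: int, batch: int) -> int:
    """Smallest number of epochs that reaches at least target_steps steps."""
    per_epoch = steps_per_epoch(num_images, batch)
    return -(-target_steps // per_epoch)  # ceiling division


if __name__ == "__main__":
    # 1500 images (as in the question above) with a batch size of 16.
    print(steps_per_epoch(1500, 16), epochs_for_steps(40000, 1500, 16))
```

Under these assumptions, 1500 images at --batch 16 give 93 steps per epoch, so about 431 epochs are needed to reach 40,000 steps.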

sharoseali, Jun 04 '18 18:06

@sharoseali Now I will try class by class. I have 1000 images for that class; let's see if I can get bounding boxes with 1000 epochs. Thank you for guiding me. I will let you know the result.

Dhagash4, Jun 05 '18 05:06

Okay, that's also a way to do it. Which weights are you using, tiny-yolo or something else? Let me know.

sharoseali, Jun 05 '18 11:06

I am training two classes, with 945 images for one class and 405 for the other. I am currently using the tiny-yolo-voc weights; should I change the weights?

Dhagash4, Jun 05 '18 11:06

No, I was only asking so that you could let me know about the weights. I used tiny-yolo-voc for training, but I got an error when testing my model. I tried other weights, like the yolo-voc weights, but they did not match their corresponding cfg file.

So now I am looking for cfg and weights files that match each other so I can train my model. Anyhow, you continue training your model with more epochs and share your results. Best of luck!

sharoseali, Jun 05 '18 11:06

@sharoseali I got bounding boxes after 5000 steps, but the problem is that when I downloaded an image from Google and tested it, nothing was detected. How can I solve that; is it an overfitting problem? Also, the boxes were not labelled (e.g. "stop sign"): I just get bounding boxes with nothing written on them to say which class they are. It also does not detect anything in the video. What should I do, anybody? I am training with the LISA extension dataset from the VIVA website.

Dhagash4, Jun 07 '18 10:06

Dhagash4, I left this work for a while after I got the error and was busy with other work. In the coming days I will start again. Have you accomplished it?

sharoseali, Jun 09 '18 15:06

Start Command:

HDF5_DISABLE_VERSION_CHECK=2 nohup ./flow --model cfg/tiny-yolo-v2-aviator.cfg --load bin/tiny-yolo-v2.weights --train --annotation /home/ubuntu/model/labels --dataset /home/ubuntu/model/aviators --epoch 10 --batch 8 --savepb True --load 18250 --gpu 0.9 &

Dataset:

~/model/labels$  ls -1 | wc -l
187
~/model/aviators$ ls | wc -l
187

Loss

Finish 996 epoch(es)
step 22909 - loss 0.5680124759674072 - moving ave loss 0.582944897404057
step 22910 - loss 1.782407283782959 - moving ave loss 0.7028911360419472
step 22911 - loss 0.20126786828041077 - moving ave loss 0.6527288092657936
step 22912 - loss 0.4742392301559448 - moving ave loss 0.6348798513548087
step 22913 - loss 0.3661291003227234 - moving ave loss 0.6080047762516002
step 22914 - loss 0.6089756488800049 - moving ave loss 0.6081018635144406
step 22915 - loss 0.4250970184803009 - moving ave loss 0.5898013790110266
step 22916 - loss 0.6636741161346436 - moving ave loss 0.5971886527233883
step 22917 - loss 0.3915417194366455 - moving ave loss 0.576623959394714
step 22918 - loss 0.17965593934059143 - moving ave loss 0.5369271573893017
step 22919 - loss 0.31156492233276367 - moving ave loss 0.514390933883648
step 22920 - loss 0.6093173623085022 - moving ave loss 0.5238835767261334
step 22921 - loss 0.49582234025001526 - moving ave loss 0.5210774530785216
step 22922 - loss 0.6295650601387024 - moving ave loss 0.5319262137845396
step 22923 - loss 0.39114269614219666 - moving ave loss 0.5178478620203054
step 22924 - loss 0.5364546775817871 - moving ave loss 0.5197085435764536
step 22925 - loss 0.46883073449134827 - moving ave loss 0.514620762667943
step 22926 - loss 0.6072037220001221 - moving ave loss 0.5238790586011609
step 22927 - loss 0.3584549129009247 - moving ave loss 0.5073366440311373
step 22928 - loss 0.7908065319061279 - moving ave loss 0.5356836328186364
step 22929 - loss 0.48035216331481934 - moving ave loss 0.5301504858682546
step 22930 - loss 0.3851150870323181 - moving ave loss 0.515646945984661
step 22931 - loss 1.296918511390686 - moving ave loss 0.5937741025252635
Finish 997 epoch(es)
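The "moving ave loss" column appears to be an exponential moving average of the per-step loss with a decay of 0.9: the logged values check out as 0.9 × previous average + 0.1 × current loss (for example 0.9 × 0.58294 + 0.1 × 1.78241 ≈ 0.70289 at step 22910). The sketch below replays a few steps of this log under that assumption; the function name is hypothetical, not darkflow's. It also shows why one noisy batch, such as the 1.78 at step 22910, shifts the reported number by 0.1 × (loss - previous average).

```python
def update_moving_average(prev_avg: float, loss: float, decay: float = 0.9) -> float:
    """One EMA update: keep `decay` of the old average, take the rest from `loss`."""
    return decay * prev_avg + (1.0 - decay) * loss


# (loss, logged "moving ave loss") for steps 22910-22914, copied from the log above
logged = [
    (1.782407283782959, 0.7028911360419472),
    (0.20126786828041077, 0.6527288092657936),
    (0.4742392301559448, 0.6348798513548087),
    (0.3661291003227234, 0.6080047762516002),
    (0.6089756488800049, 0.6081018635144406),
]

avg = 0.582944897404057  # logged moving average at step 22909
for loss, expected in logged:
    avg = update_moving_average(avg, loss)
    # Small tolerance: 1.0 - 0.9 is not exactly 0.1 in floating point
    assert abs(avg - expected) < 1e-9, (avg, expected)
print("EMA with decay 0.9 reproduces the logged moving averages")
```

Because this is a smoothed training loss, it describes how well the network fits the training images, not how it behaves on images it has never seen.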

Config:

(more above this line)
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=1

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

###########

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=30
activation=linear

[region]
anchors =  0.57273, 0.677385, 1.87446, 2.06253, 3.33843, 5.47434, 7.88282, 3.52778, 9.77052, 9.16828
bias_match=1
classes=1
coords=4
num=5
softmax=1
jitter=.2
rescore=0

object_scale=5
noobject_scale=1
class_scale=1
coord_scale=1

absolute=1
thresh = .6
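For anyone adapting this config to their own labels: in YOLOv2-style configs, the last [convolutional] layer before [region] needs filters = num × (coords + 1 + classes), i.e. for every anchor, four box coordinates, one objectness score and one score per class. The small helpers below are hypothetical (not part of darkflow) and simply check the numbers in this file: 5 × (4 + 1 + 1) = 30, and the anchors line holds num (w, h) pairs.

```python
def region_filters(num: int, classes: int, coords: int = 4) -> int:
    """Output channels needed by the conv layer that feeds a YOLOv2 [region] layer."""
    return num * (coords + 1 + classes)


def parse_anchors(line: str) -> list:
    """Turn 'anchors = w1, h1, w2, h2, ...' into a list of (w, h) pairs."""
    values = [float(v) for v in line.split("=", 1)[1].split(",")]
    if len(values) % 2:
        raise ValueError("anchors must come in (w, h) pairs")
    return list(zip(values[0::2], values[1::2]))


cfg_anchors = ("anchors =  0.57273, 0.677385, 1.87446, 2.06253, 3.33843, "
               "5.47434, 7.88282, 3.52778, 9.77052, 9.16828")
pairs = parse_anchors(cfg_anchors)

print(region_filters(num=5, classes=1))  # 30, as in the config above
print(region_filters(num=5, classes=2))  # 35, e.g. for two labels
print(len(pairs))                        # 5, matching num=5
```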

What loss is typical of "convergence"? I ran 1000 epochs (22k+ steps!), which resulted in a very, very low loss (~0.1). However, bounding boxes were only drawn on images the model had already seen (i.e. images that are part of the training set). I suspect either that my training set isn't large enough or that the model is WAY overfit and only matches images it has already seen.

  1. What is the difference between an epoch and a step? I notice many people refer to steps (and their relationship to checkpoints; see @denisli above).
  2. What's an acceptable number of training images? I believe I have 200.
  3. At what loss does "convergence" typically take place? Are you talking about epochs or steps?
  4. Does this library divide my training images into Test, Train and Verify directories? How can I fight overfitting?
  5. How are you determining overfitting? Is it simply a loss < 0.1?
  6. How long should training take? This trains for a day on AWS with GPUs, and it's getting expensive!
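On question 1: an epoch is one full pass over the training images, while a step is one batch, i.e. one weight update. Assuming the trainer runs int(images / batch) batches per epoch (an assumption, but it matches this log), the numbers above line up: 187 images with --batch 8 give 23 steps per epoch, and 997 × 23 = 22931, the step printed just before "Finish 997 epoch(es)". The helpers are hypothetical, for illustration only.

```python
def steps_per_epoch(num_images: int, batch: int) -> int:
    """Batches (optimizer steps) in one epoch, assuming a trailing partial
    batch is dropped and the batch is capped at the dataset size."""
    batch = min(batch, num_images)
    return num_images // batch


def step_at_end_of_epoch(epoch: int, num_images: int, batch: int) -> int:
    """Global step number printed just before 'Finish <epoch> epoch(es)'."""
    return epoch * steps_per_epoch(num_images, batch)


print(steps_per_epoch(187, 8))            # 23 steps per epoch
print(step_at_end_of_epoch(997, 187, 8))  # 22931, as in the log above
```

Because the number of steps per epoch depends on dataset size and batch size, iteration counts quoted in this thread (such as the 40k iterations below) are not directly comparable to epoch counts without this conversion.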

fogonthedowns avatar Sep 03 '18 02:09 fogonthedowns

It worked for me. In YOLOv2 it is relatively easy to change the config file to incorporate your data (no additional changes are needed). You need to train for more iterations. Initially I wasn't detecting any bounding boxes, but after training for 40k iterations I could finally see detections, though the results were poor (you need to tune the anchors). I used pre-trained ImageNet weights.
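The anchor-tuning tip can be sketched with the k-means procedure described in the YOLOv2 (YOLO9000) paper, which clusters the widths and heights of the training boxes using 1 - IoU as the distance. This is a hypothetical helper, not part of darkflow. It assumes the box sizes are already expressed in output-grid cells (for a 416×416 input with a 13×13 grid, width_in_cells = box_width_px / image_width_px × 13), which is the scale the anchors in the config above appear to use; k would normally match num in the [region] section.

```python
import random


def iou_wh(a, b):
    """IoU of two boxes compared by (width, height) only, centres aligned."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (w, h) pairs with distance 1 - IoU; return k anchors, smallest first."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)  # start from k randomly chosen ground-truth boxes
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # Smallest 1 - IoU distance is the same as largest IoU
            nearest = max(range(k), key=lambda i: iou_wh(box, centroids[i]))
            clusters[nearest].append(box)
        updated = []
        for centroid, members in zip(centroids, clusters):
            if not members:  # keep an empty cluster where it was
                updated.append(centroid)
                continue
            updated.append((sum(w for w, _ in members) / len(members),
                            sum(h for _, h in members) / len(members)))
        if updated == centroids:  # assignments no longer change
            break
        centroids = updated
    return sorted(centroids, key=lambda wh: wh[0] * wh[1])


def cfg_anchor_line(anchors):
    """Format (w, h) pairs the way the [region] anchors line is written."""
    return "anchors = " + ", ".join(f"{w:.5f}, {h:.5f}" for w, h in anchors)


# Hypothetical box sizes in grid cells: one cluster near 1x1, one near 5x5
boxes = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (5.0, 5.0), (5.2, 4.8), (4.8, 5.2)]
print(cfg_anchor_line(kmeans_anchors(boxes, k=2)))
```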

Can you share your code?

aaronhan92 avatar Sep 30 '18 15:09 aaronhan92

Hey everyone, I have the same issue. After running through every step on this page and training on PASCAL VOC data, no object is detected. I changed the threshold to 0 and some objects were detected, but they are not really useful. What should I do?

AzadeAlizade avatar Nov 04 '18 14:11 AzadeAlizade

Hi sir, I am training darknet using YOLOv3. I have trained on 200 images and I can see the labels, but there are no bounding boxes around them. What could be the reason?

ManasaNadimpalli avatar Nov 15 '18 15:11 ManasaNadimpalli

I am testing an image using the method from "Using darkflow from another python application" in the Spyder IDE. My program runs fine, but at the end I get an empty array with no predictions. What should I do now?

RamShankarKumar avatar Apr 14 '19 06:04 RamShankarKumar

Hi,

I am also facing the same issue: my model does not detect any bounding boxes. When I set the threshold to 0.00001, it shows far too many boxes.

@ManasaNadimpalli Have you been able to find a solution?

Please give some suggestions. I modified the .cfg file for my number of classes (classes=1).

ridhimagarg avatar Apr 23 '19 05:04 ridhimagarg

@kamelbouyacoub How do you lower the threshold so that it shows many boxes? Please help me with that.
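On lowering the threshold: in darkflow's Python API it is the "threshold" entry of the options dict passed to TFNet (the README example uses {"model": ..., "load": ..., "threshold": 0.1}), and the flow command line takes a --threshold flag. Only boxes scoring above it are returned, so an empty list from return_predict means no box cleared the cut-off. A quick way to see whether the model is merely under-confident is to predict once with a tiny threshold and inspect the confidences. The helpers below are a hypothetical sketch that assumes the documented return_predict format (a list of dicts with "label", "confidence", "topleft" and "bottomright") and a strict ">" comparison.

```python
def count_above(predictions, thresholds):
    """Number of predicted boxes whose confidence is strictly above each threshold."""
    return {t: sum(1 for p in predictions if p["confidence"] > t) for t in thresholds}


def top_predictions(predictions, k=5):
    """(label, confidence) of the k most confident boxes, best first."""
    ranked = sorted(predictions, key=lambda p: p["confidence"], reverse=True)
    return [(p["label"], p["confidence"]) for p in ranked[:k]]


# Hypothetical return_predict() output gathered with a threshold of 0.00001
preds = [
    {"label": "yoda", "confidence": 0.31,
     "topleft": {"x": 12, "y": 40}, "bottomright": {"x": 120, "y": 200}},
    {"label": "vador", "confidence": 0.04,
     "topleft": {"x": 150, "y": 30}, "bottomright": {"x": 260, "y": 210}},
    {"label": "yoda", "confidence": 0.002,
     "topleft": {"x": 0, "y": 0}, "bottomright": {"x": 30, "y": 30}},
    {"label": "vador", "confidence": 0.00003,
     "topleft": {"x": 5, "y": 5}, "bottomright": {"x": 400, "y": 300}},
]
print(count_above(preds, [0.5, 0.1, 0.00001]))
print(top_predictions(preds, k=2))
```

In this made-up example nothing clears 0.5, one box clears 0.1 and all four clear 0.00001, which is the pattern described in this thread. That pattern suggests the scores themselves are low, so lowering the threshold only hides the problem rather than fixing it.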

Alex0795 avatar Apr 29 '19 20:04 Alex0795

I had the same problem of not getting any bounding boxes. I trained on 87 images for one class. I decreased the learning rate to 1e-5 and was able to get correct bounding boxes, although with low confidence (~20%). Hope this helps!

aseembh2001 avatar Jul 29 '19 17:07 aseembh2001

I am also facing the same issue as @kamelbouyacoub.
My loss after 1000 epochs is about 61.586:

Finish 986 epoch(es)
step 1973 - loss 62.078346252441406 - moving ave loss 62.335486664291444
step 1974 - loss 62.121891021728516 - moving ave loss 62.31412710003515
Finish 987 epoch(es)
step 1975 - loss 62.219764709472656 - moving ave loss 62.30469086097891
step 1976 - loss 61.881935119628906 - moving ave loss 62.26241528684391
Finish 988 epoch(es)
step 1977 - loss 62.222434997558594 - moving ave loss 62.25841725791538
step 1978 - loss 61.85980224609375 - moving ave loss 62.21855575673322
Finish 989 epoch(es)
step 1979 - loss 62.035133361816406 - moving ave loss 62.20021351724154
step 1980 - loss 61.879722595214844 - moving ave loss 62.168164425038874
Finish 990 epoch(es)
step 1981 - loss 61.71182632446289 - moving ave loss 62.12253061498128
step 1982 - loss 61.67131042480469 - moving ave loss 62.077408595963625
Finish 991 epoch(es)
step 1983 - loss 61.771820068359375 - moving ave loss 62.0468497432032
step 1984 - loss 61.894561767578125 - moving ave loss 62.0316209456407
Finish 992 epoch(es)
step 1985 - loss 61.739654541015625 - moving ave loss 62.00242430517819
step 1986 - loss 61.7847900390625 - moving ave loss 61.980660878566624
Finish 993 epoch(es)
step 1987 - loss 61.47736740112305 - moving ave loss 61.93033153082227
step 1988 - loss 61.691654205322266 - moving ave loss 61.90646379827227
Finish 994 epoch(es)
step 1989 - loss 61.599735260009766 - moving ave loss 61.87579094444602
step 1990 - loss 61.71918487548828 - moving ave loss 61.860130337550245
Finish 995 epoch(es)
step 1991 - loss 61.71525573730469 - moving ave loss 61.84564287752569
step 1992 - loss 61.526390075683594 - moving ave loss 61.81371759734149
Finish 996 epoch(es)
step 1993 - loss 61.45462417602539 - moving ave loss 61.77780825520988
step 1994 - loss 61.457122802734375 - moving ave loss 61.74573970996233
Finish 997 epoch(es)
step 1995 - loss 61.439453125 - moving ave loss 61.715111051466096
step 1996 - loss 61.43961715698242 - moving ave loss 61.68756166201773
Finish 998 epoch(es)
step 1997 - loss 61.436065673828125 - moving ave loss 61.662412063198765
step 1998 - loss 61.47761535644531 - moving ave loss 61.643932392523425
Finish 999 epoch(es)
step 1999 - loss 61.33710479736328 - moving ave loss 61.61324963300741
step 2000 - loss 61.340763092041016 - moving ave loss 61.586000978910775
Checkpoint at step 2000
Finish 1000 epoch(es)
Training finished, exit.

Does this mean it is not converging?
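One way to answer this from the log itself is to measure how fast the moving average is still falling. The sketch below uses hypothetical helpers; its regular expression assumes the exact "step N - loss X - moving ave loss Y" line format printed above. It parses the log and computes the average drop per step between the first and last lines it finds. For the excerpt above that is roughly (62.335 - 61.586) / 27 ≈ 0.028 per step: the loss is still going down, slowly, and it sits far above the sub-1 moving averages reported elsewhere in this thread, which suggests training has not finished rather than proving anything about convergence.

```python
import re

# Assumes lines shaped like: "step 1973 - loss 62.07... - moving ave loss 62.33..."
LOG_LINE = re.compile(
    r"step (\d+) - loss ([-+0-9.eE]+) - moving ave loss ([-+0-9.eE]+)"
)


def parse_log(text):
    """Return (step, loss, moving_avg) tuples for every matching line."""
    return [(int(step), float(loss), float(avg))
            for step, loss, avg in LOG_LINE.findall(text)]


def avg_drop_per_step(records):
    """Mean decrease of the moving-average loss per step, first to last record.

    Positive: still decreasing; near zero: flat; negative: rising.
    """
    if len(records) < 2:
        raise ValueError("need at least two log lines")
    (step0, _, avg0), (step1, _, avg1) = records[0], records[-1]
    return (avg0 - avg1) / (step1 - step0)


# Trimmed excerpt of the log above
excerpt = """
step 1973 - loss 62.078346252441406 - moving ave loss 62.335486664291444
step 1974 - loss 62.121891021728516 - moving ave loss 62.31412710003515
Finish 987 epoch(es)
step 2000 - loss 61.340763092041016 - moving ave loss 61.586000978910775
"""
records = parse_log(excerpt)
print(f"{len(records)} lines, drop per step: {avg_drop_per_step(records):.3f}")
```

Note also that this run does only 2 steps per epoch, so its 1000 epochs amount to 2000 weight updates, compared with the 22,931 steps in the earlier log in this thread.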

absognety avatar Sep 17 '19 10:09 absognety

I have the same issue: loss < 0.45 after 15k steps, with 1000+ images per class. Deliberately overfitting on 20 images worked fine. I'm using the tiny-yolo-voc cfg and weights. Are there any solutions? :(

slntopp avatar Feb 27 '20 12:02 slntopp

I faced the problem too. No bounding box at all. Any solution?

xinyee1997 avatar Mar 19 '20 18:03 xinyee1997

@thtrieu When I tried training YOLOv2 on only the car-labelled data from PASCAL VOC 2012, I got a loss of 0.000007. However, I cannot see any bounding boxes when I test on an image. Does that mean it is overfitting? How can I solve that?
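A training loss of 0.000007 only says the network fits the images it was trained on. Whether that is overfitting can be checked by keeping some annotated images out of training and testing only on those. The darkflow commands in this thread take a single --dataset and --annotation folder, so such a split would have to be made by hand. A minimal sketch (hypothetical helper) that splits by file stem, so that an image such as car_001.jpg and its annotation car_001.xml always end up on the same side:

```python
import random


def split_stems(stems, val_fraction=0.2, seed=0):
    """Shuffle base file names reproducibly and split them into (train, val).

    Splitting by stem (e.g. "car_001" for car_001.jpg + car_001.xml) keeps
    each image together with its annotation.
    """
    if not 0.0 <= val_fraction < 1.0:
        raise ValueError("val_fraction must be in [0, 1)")
    shuffled = sorted(set(stems))  # deterministic order, duplicates removed
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical stems for 187 annotated images, as in the dataset listing above
stems = [f"img_{i:03d}" for i in range(187)]
train, val = split_stems(stems)
print(len(train), len(val))  # 150 37
```

The validation images and their XML files would then be moved to folders that are never passed to --train and used only for testing. If boxes appear on training images but not on these held-out ones, that points to overfitting or too little data rather than a broken config.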

ozanpkr avatar Apr 01 '20 11:04 ozanpkr

the same here

ludwikbukowski avatar Jun 21 '20 00:06 ludwikbukowski