keras-yolo2

Warmup stage (Notebook vs. frontend.py)

Open letilessa opened this issue 6 years ago • 17 comments

I read several issues related to the warmup stage, but I am still confused. Also, the code in the notebook differs from frontend.py.

In the notebook there is only the WARM_UP_BATCHES variable, which people set to 3 (#237) and which seems to correspond to warmup_epochs in frontend.py. The warmup_batches value defined in frontend.py, shown below, doesn't exist in the notebook.

self.warmup_batches = warmup_epochs * (train_times*len(train_generator) + valid_times*len(valid_generator))

The general advice for the warmup is to first set it to 3, then set it to 0 for the actual training (#46, #335). But I am not sure whether I am doing it right in the notebook, or how it works with that code.

During training, the recall goes to zero after a while, and the predictions give very small values with poorly fitting boxes.

image

letilessa avatar Aug 10 '18 11:08 letilessa
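To make the warmup_batches formula quoted in the opening post concrete, here is a minimal sketch of the arithmetic. The dataset sizes are hypothetical, and the `in_warmup` helper is an assumption about how such a batch budget is typically consumed, not a copy of the logic in frontend.py.

```python
def warmup_batch_count(warmup_epochs, train_times, n_train_batches,
                       valid_times, n_valid_batches):
    """Number of batches covered by warmup, following the quoted formula."""
    return warmup_epochs * (train_times * n_train_batches
                            + valid_times * n_valid_batches)


def in_warmup(batches_seen, warmup_batches):
    # Assumption: warmup is active while fewer batches than the budget were seen.
    return batches_seen < warmup_batches


# Hypothetical sizes: 100 training batches and 20 validation batches per pass,
# with train_times=8 and valid_times=1.
total = warmup_batch_count(3, 8, 100, 1, 20)
print(total)                                    # 3 * (8*100 + 1*20)
print(warmup_batch_count(0, 8, 100, 1, 20))     # warmup_epochs=0 disables warmup
```

Under these assumptions, a notebook variable that is set to 3 and compared directly against a batch counter would cover only three batches, whereas warmup_epochs=3 in this formula covers three full passes over the generators.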

@letilessa I also don't understand very well how it works, but I found a problem in this repository: training always saves the model with the best loss, but the best loss doesn't mean the best mAP. I changed the code to save the best-mAP model as well, and my model's predictions improved after doing that. Here is my fork. Also, I don't use early stopping.

rodrigo2019 avatar Aug 10 '18 20:08 rodrigo2019
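The point that the lowest loss and the highest mAP can fall on different epochs can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the code from the fork mentioned above; the class name `BestMetricTracker` and the per-epoch numbers are made up for the example.

```python
class BestMetricTracker:
    """Remember the epoch with the best value of a single metric."""

    def __init__(self, mode):
        if mode not in ("min", "max"):
            raise ValueError("mode must be 'min' or 'max'")
        self.mode = mode
        self.best_value = None
        self.best_epoch = None

    def update(self, epoch, value):
        """Return True when `value` improves on the best seen so far."""
        improved = (
            self.best_value is None
            or (self.mode == "min" and value < self.best_value)
            or (self.mode == "max" and value > self.best_value)
        )
        if improved:
            self.best_value, self.best_epoch = value, epoch
        return improved


# Hypothetical validation curves, one entry per epoch.
val_loss = [4.0, 2.5, 2.0, 2.2, 2.4]
val_map = [0.00, 0.10, 0.30, 0.45, 0.40]

loss_tracker = BestMetricTracker("min")
map_tracker = BestMetricTracker("max")
for epoch, (loss, m_ap) in enumerate(zip(val_loss, val_map)):
    if loss_tracker.update(epoch, loss):
        pass  # a real callback would save a "best loss" checkpoint here
    if map_tracker.update(epoch, m_ap):
        pass  # ... and a separate "best mAP" checkpoint here

print(loss_tracker.best_epoch, map_tracker.best_epoch)
```

With these made-up curves the two trackers pick different epochs, which is why keeping both checkpoints can matter.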

@rodrigo2019 What values did you use for the 4 scales? And what values did you get for mAP and loss using Mobilenet backend?

letilessa avatar Aug 11 '18 12:08 letilessa

4 scales? Do you mean the anchor values? My work with this repository focuses on car detection at high FPS on hardware like a Raspberry Pi. I didn't use these backends because they were too slow, even MobileNet and Tiny Darknet, so I designed my own network based on Tiny Darknet. I can't answer this question because I'm using a custom backend and a custom dataset, but I can say I'm getting really good predictions: my mAP is around 85-92, at 10-15 FPS on hardware like a Raspberry Pi.

rodrigo2019 avatar Aug 11 '18 22:08 rodrigo2019

4 scales?

I mean object_scale, no_object_scale, coord_scale and class_scale. The default values in this repository are 5, 1, 1, 1 respectively, but the YOLO paper seems to use no_object_scale=0.5 and coord_scale=5. I also saw @experiencor advising people to play with the 4 scales in other issues (#46). Did you change these values?

letilessa avatar Aug 13 '18 08:08 letilessa
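As a rough picture of what the four scales do, the sketch below treats them as weights on four loss components. This is a deliberate simplification: the per-component loss values are invented, the function name `weighted_yolo_loss` is hypothetical, and the actual loss in the repository applies the scales per grid cell rather than to four scalars. The object weight of 1 in the paper-style setting is also an assumption, since the post only mentions the no-object and coordinate weights.

```python
def weighted_yolo_loss(components, scales):
    """Weighted sum of four loss terms (simplified, scalar version)."""
    return (
        scales["coord_scale"] * components["coord"]
        + scales["object_scale"] * components["object"]
        + scales["no_object_scale"] * components["no_object"]
        + scales["class_scale"] * components["class"]
    )


# Invented per-component losses for a single batch.
components = {"coord": 2.0, "object": 1.0, "no_object": 4.0, "class": 0.5}

# Defaults as reported in this thread (object, no_object, coord, class = 5, 1, 1, 1).
repo_defaults = {"object_scale": 5.0, "no_object_scale": 1.0,
                 "coord_scale": 1.0, "class_scale": 1.0}

# Paper-style weights as described above; object_scale=1 is an assumption.
paper_style = {"object_scale": 1.0, "no_object_scale": 0.5,
               "coord_scale": 5.0, "class_scale": 1.0}

print(weighted_yolo_loss(components, repo_defaults))
print(weighted_yolo_loss(components, paper_style))
```

The totals are not comparable across settings; what changes is which term dominates the gradient, for example localization under the paper-style weights versus object confidence under the repository defaults.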

image

Changing 1 to 3 helped the network detect fewer false positives, but it also made the network take longer to converge. I'm currently using these parameters.

rodrigo2019 avatar Aug 13 '18 11:08 rodrigo2019

Hi Rodrigo, I used your mAP callback, but it makes training much slower and it is giving back zero for every epoch, not showing any improvement.

How are you doing the warmup stage? I used WARM_UP_BATCHES=3 for 50 epochs, then WARM_UP_BATCHES=0 for 100 epochs, but I am still getting recall zero and mAP zero.

letilessa avatar Aug 14 '18 08:08 letilessa

@letilessa Unfortunately it can't be faster, because the callback processes the whole validation dataset. I'm using warmup = 3 and I set epochs to around 2k. If you check #291 you can see that some trainings only start to report an mAP above 0 after 30-40 epochs; I have had one training that only gave some results after 210 epochs. I also get good results after 12 hours of training on a GTX 1070. Could you tell me what you are trying to train and which configuration you are using? Maybe I can help you.

rodrigo2019 avatar Aug 14 '18 16:08 rodrigo2019

I am training on Pascal VOC 2007+2012, similar to the YOLO paper, but with the MobileNet backend. Now I am trying to use the repository code instead of the notebook, but when I load the weights from the warmup stage I get this error:

Traceback (most recent call last):
  File "train.py", line 116, in <module>
    _main_(args)
  File "train.py", line 92, in _main_
    yolo.load_weights(config['train']['pretrained_weights'])
  File "/media/eHD/leticia/keras-yolo2/frontend.py", line 247, in load_weights
    self.model.load_weights(weight_path)
  File "/home/letica/.conda/envs/cipa2/lib/python3.5/site-packages/keras/engine/network.py", line 1181, in load_weights
    f, self.layers, reshape=reshape)
  File "/home/letica/.conda/envs/cipa2/lib/python3.5/site-packages/keras/engine/saving.py", line 916, in load_weights_from_hdf5_group
    reshape=reshape)
  File "/home/letica/.conda/envs/cipa2/lib/python3.5/site-packages/keras/engine/saving.py", line 557, in preprocess_weights_for_loading
    weights = convert_nested_model(weights)
  File "/home/letica/.conda/envs/cipa2/lib/python3.5/site-packages/keras/engine/saving.py", line 545, in convert_nested_model
    original_backend=original_backend))
  File "/home/letica/.conda/envs/cipa2/lib/python3.5/site-packages/keras/engine/saving.py", line 557, in preprocess_weights_for_loading
    weights = convert_nested_model(weights)
  File "/home/letica/.conda/envs/cipa2/lib/python3.5/site-packages/keras/engine/saving.py", line 533, in convert_nested_model
    original_backend=original_backend))
  File "/home/letica/.conda/envs/cipa2/lib/python3.5/site-packages/keras/engine/saving.py", line 675, in preprocess_weights_for_loading
    weights[0] = np.transpose(weights[0], (3, 2, 0, 1))
  File "/home/letica/.conda/envs/cipa2/lib/python3.5/site-packages/numpy/core/fromnumeric.py", line 575, in transpose
    return _wrapfunc(a, 'transpose', axes)
  File "/home/letica/.conda/envs/cipa2/lib/python3.5/site-packages/numpy/core/fromnumeric.py", line 52, in _wrapfunc
    return getattr(obj, method)(*args, **kwds)
ValueError: axes don't match array

letilessa avatar Aug 14 '18 17:08 letilessa

I would guess that you are using an old version of Keras, but I'm not sure. (I'm using version 2.1.5.)

By the way, look at this comment: after 5 hours of training I started to get some results.

I will start a training run with MobileNet today and give you an answer tomorrow.

rodrigo2019 avatar Aug 14 '18 18:08 rodrigo2019

@letilessa, here are my results after 10 epochs: image

config.zip I didn't use pretrained weights for MobileNet.

rodrigo2019 avatar Aug 14 '18 21:08 rodrigo2019

Are you doing warmup together with training? I first ran the code with warmup_epochs=3 and nb_epochs=0 to do the warmup, then I ran it with warmup_epochs=0 and nb_epochs=100 to train. The loss during warmup was around 11, but during training it became NaN.

I see that you are using different anchors. Where did you get these values? Do workers and max_queue_size make any difference in training?

letilessa avatar Aug 15 '18 09:08 letilessa

@letilessa I'm running the warmup in the same training run. I generate the anchors using the gen_anchors.py script. workers is the number of threads used to preprocess batches from the batch generator, and max_queue_size is how many preprocessed batches can wait in the queue before entering training. With good values for these parameters you can speed up your training; I found these values empirically.

rodrigo2019 avatar Aug 15 '18 11:08 rodrigo2019
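The roles of workers and max_queue_size described above can be pictured with a small producer/consumer sketch built only on the Python standard library. It is an analogy, not Keras code: `run_pipeline` is a hypothetical helper, and the tuple put on the queue stands in for a preprocessed batch.

```python
import queue
import threading


def run_pipeline(n_batches, workers, max_queue_size):
    """Prepare n_batches with `workers` threads; consume them in one loop."""
    todo = queue.Queue()
    for i in range(n_batches):
        todo.put(i)

    # Bounded queue: producers block once max_queue_size batches are waiting.
    ready = queue.Queue(maxsize=max_queue_size)

    def producer():
        while True:
            try:
                idx = todo.get_nowait()
            except queue.Empty:
                return
            ready.put(("batch", idx))  # stand-in for loading and augmentation

    threads = [threading.Thread(target=producer) for _ in range(workers)]
    for t in threads:
        t.start()

    consumed = []
    for _ in range(n_batches):
        consumed.append(ready.get())  # the training step would run here
    for t in threads:
        t.join()
    return consumed


batches = run_pipeline(n_batches=20, workers=3, max_queue_size=8)
print(len(batches))
```

More workers help only while preprocessing is the bottleneck, and a larger queue mainly smooths out uneven preprocessing times at the cost of memory, which fits the advice to tune both empirically.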

Hi @rodrigo2019, did you finish training with mobilenet? Can you tell me your email?

letilessa avatar Aug 17 '18 09:08 letilessa

@letilessa I stopped at epoch 10 because I had already got some results, but I can do a full training run if necessary. My email is [email protected]

rodrigo2019 avatar Aug 17 '18 11:08 rodrigo2019

@experiencor I have the same problem as @letilessa. I used the YOLO-step-by-step notebook to train on my own dataset, which has five classes. The problem is that after a few epochs, the current recall and total recall drop to 0. It seems that the notebook does not support the warmup. Any advice would be appreciated!

zenoZhao avatar Oct 10 '18 03:10 zenoZhao

@rodrigo2019 I am also trying to build a network for car detection. I would like to know on what basis you designed the custom backend. Could you please share it with me by email at [email protected]

abhijithvnair94 avatar Dec 24 '18 08:12 abhijithvnair94

@letilessa Have you fixed your problem? I always get NaN at the beginning of training, even after changing the anchors and using the pretrained weights. I used YOLOv2 as the backend. Is it related to the warmup training? Thanks in advance.

Aaron4Fun avatar Feb 18 '19 21:02 Aaron4Fun