
In training, why is the loss decreasing while the val loss is increasing?

Open kitterive opened this issue 7 years ago • 11 comments

I used the VOCtest_06-Nov-2007 dataset. First I used get_data_from_XML.py to convert the XML ground truth to VOC2007.pkl, then used it to train the network. During training I found that the loss is decreasing while the val loss is increasing. Is it overfitting?
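Roughly what I ran for the conversion (my own paths; assuming get_data_from_XML.py is importable from the working directory):

```python
import pickle

from get_data_from_XML import XML_preprocessor

# Parse every annotation XML into a dict keyed by image filename and
# pickle it for the training script.
data = XML_preprocessor('VOCdevkit/VOC2007/Annotations/').data
pickle.dump(data, open('VOC2007.pkl', 'wb'))
```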

kitterive avatar Jun 24 '17 22:06 kitterive

I'm observing the same phenomenon. Is there a fix for this?

@kitterive - What initial weights are you using? Also how are you normalizing the coordinates?
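(By normalizing I mean scaling the box corners into the [0, 1] range by the image size; a rough sketch, with illustrative names:)

```python
import numpy as np

def normalize_box(xmin, ymin, xmax, ymax, img_width, img_height):
    """Scale pixel box corners to relative [0, 1] coordinates."""
    return np.array([xmin / img_width, ymin / img_height,
                     xmax / img_width, ymax / img_height])
```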

meetps avatar Jun 29 '17 12:06 meetps

I also observe the same behavior. However, I was able to get a val loss of 1.4 after 20 epochs; afterwards the val loss started increasing.
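In case it helps, I keep the weights from the best epoch so the later increase in val loss does not hurt; something along these lines (the file name and patience value are just what I happen to use):

```python
from keras.callbacks import EarlyStopping, ModelCheckpoint

# Keep the weights from the epoch with the lowest val loss and stop
# once val loss has not improved for a few epochs; pass this list as
# callbacks=... to model.fit_generator(...).
callbacks = [
    ModelCheckpoint('ssd_best.hdf5', monitor='val_loss',
                    save_best_only=True, save_weights_only=True),
    EarlyStopping(monitor='val_loss', patience=5),
]
```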

oarriaga avatar Jun 29 '17 13:06 oarriaga

@oarriaga - In that case, which model weights did you use to fine-tune: VGG16 (with the top removed) or the Caffe-converted SSD weights?

meetps avatar Jun 29 '17 13:06 meetps

I used the pre-trained weights provided in the README file, which I believe come from an older version of the original Caffe implementation.
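Roughly how they are loaded in the training notebook (num_classes and the weights file name are the README defaults, as far as I remember):

```python
from ssd import SSD300

# Build the SSD300 model and load the converted weights from the README.
# by_name=True keeps the pre-trained weights for matching layers and
# leaves any new layers randomly initialized.
input_shape = (300, 300, 3)
model = SSD300(input_shape, num_classes=21)
model.load_weights('weights_SSD300.hdf5', by_name=True)
```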

oarriaga avatar Jun 29 '17 13:06 oarriaga

@oarriaga @rykov8 - Has anyone successfully trained the SSD from scratch (i.e. using only VGG16 weights) with this code? If not, then perhaps it would be wise to rethink the loss function.

meetps avatar Jul 04 '17 10:07 meetps

Hi @meetshah1995,
try adding a BatchNormalization layer after the Conv layers (the pre-trained weights won't be a perfect match anymore, but they can be a good starting point for training).
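A minimal sketch (Keras 2 syntax with illustrative layer names; the repo itself still uses the older Convolution2D API):

```python
from keras.layers import Activation, BatchNormalization, Conv2D, Input

# Split the activation out of the conv block so the pre-activations
# can be normalized before the ReLU.
inputs = Input(shape=(300, 300, 3))
x = Conv2D(64, (3, 3), padding='same', name='conv1_1')(inputs)
x = BatchNormalization(name='conv1_1_bn')(x)
x = Activation('relu')(x)
```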

MicBA avatar Jul 04 '17 12:07 MicBA

I am seeing the same issue when training on the MS COCO dataset.

I was following the training example from SSD_training.ipynb.

Kramins avatar Jul 04 '17 14:07 Kramins

@meetshah1995 I have trained SSD with only the VGG16 weights, and it was overfitting after ~20 epochs; my lowest validation loss was 1.4. I believe better results could be obtained with a correct implementation of the random_size_crop function in the data augmentation. Also, the architecture ported in this repository is not the newest model from the latest arXiv version, which might lead to significant differences between the implementation presented here and the other ones around, such as the TF, PyTorch, and original Caffe ones.
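The idea behind the missing augmentation is roughly the following (a simplified sketch; the real SSD sampling also constrains the aspect ratio and the minimum IoU with the ground-truth boxes, and remaps the box coordinates):

```python
import numpy as np

def random_size_crop(image, min_scale=0.3, max_scale=1.0):
    """Crop a randomly sized, randomly placed window from the image."""
    height, width = image.shape[:2]
    scale = np.random.uniform(min_scale, max_scale)
    crop_h, crop_w = int(height * scale), int(width * scale)
    y = np.random.randint(0, height - crop_h + 1)
    x = np.random.randint(0, width - crop_w + 1)
    return image[y:y + crop_h, x:x + crop_w]
```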

oarriaga avatar Jul 04 '17 14:07 oarriaga

Hi @oarriaga, can you share your training log? I want to know the loss after 120k iterations. Thank you in advance!

ujsyehao avatar Mar 12 '18 03:03 ujsyehao

I am seeing the same issue while training on my own dataset. Is it overfitting or not?

Hydrogenion avatar Jan 16 '19 08:01 Hydrogenion

My minimum loss is also around 1.39 ~ 1.4. Would adding random_size_crop help?

jamebozo avatar Aug 01 '19 07:08 jamebozo