
Sudden jump in loss from Epoch 0 to Epoch 1 when training OpenImages

Open · hyl-g opened this issue 5 years ago • 3 comments

It happens when training on OpenImages (VOC is fine). The loss jumps to 1000, 10000, or more from epoch 0 to epoch 1. I traced the problem to the _getitem() function in open_images.py, where the variable boxes is modified in place. Because Python passes object references, modifying boxes also modifies the arrays stored in the dataset object, which causes the huge loss in epoch 1.

The solution is to make a local copy of boxes to prevent this unintended modification of the dataset. For completeness, labels is copied as well. See the diff below:

diff_open_images.txt
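For reference, a minimal sketch of the kind of change involved, assuming _getitem() reads boxes and labels out of a per-image dict in self.data and then scales them to pixel coordinates before applying transforms (the attribute names and the exact scaling lines here follow open_images.py as I understand it, not the attached diff itself):

```python
import copy

def _getitem(self, index):
    image_info = self.data[index]
    image = self._read_image(image_info['image_id'])

    # Copy the arrays before touching them. Without the copies, the
    # in-place scaling below mutates the boxes stored in self.data,
    # so every later epoch sees already-scaled coordinates.
    boxes = copy.copy(image_info['boxes'])
    labels = copy.copy(image_info['labels'])

    # Convert relative coordinates to absolute pixel coordinates
    # (this is the in-place modification that corrupted the dataset).
    boxes[:, 0] *= image.shape[1]
    boxes[:, 1] *= image.shape[0]
    boxes[:, 2] *= image.shape[1]
    boxes[:, 3] *= image.shape[0]

    if self.transform:
        image, boxes, labels = self.transform(image, boxes, labels)
    if self.target_transform:
        boxes, labels = self.target_transform(boxes, labels)
    return image_info['image_id'], image, boxes, labels
```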

hyl-g · Sep 12 '19 15:09

Thanks @hyl-g. That's a big bug indeed. Can you make a pull request, if it is not too much trouble, so I can merge it?

qfgaohao · Sep 13 '19 03:09

Will do.

hyl-g · Sep 13 '19 15:09

I tried it on my own dataset and it works! Before, I was working around the problem by removing the data augmentations: expand, random sample crop, and random mirror (roughly as sketched below).

Thank you for your contribution.
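For anyone who hits the same symptom before the fix lands, that workaround amounts to dropping the geometric augmentations from the training pipeline. A rough sketch, assuming a Compose-based TrainAugmentation like the one this repository builds on (module path, class names, and the mean/size values are assumptions, not taken from the diff):

```python
import numpy as np
# Module path and class names follow the standard SSD augmentation code
# that pytorch-ssd is based on; treat them as assumptions.
from vision.transforms.transforms import (
    Compose, ConvertFromInts, PhotometricDistort, Expand,
    RandomSampleCrop, RandomMirror, ToPercentCoords, Resize,
    SubtractMeans, ToTensor,
)

image_mean = np.array([127, 127, 127])  # should match the model config
image_size = 300

# Workaround only: drop the augmentations that modify boxes.
# This hides the symptom at the cost of weaker augmentation; the proper
# fix is to copy boxes/labels in _getitem() as shown above.
train_augment = Compose([
    ConvertFromInts(),
    PhotometricDistort(),
    # Expand(image_mean),    # removed
    # RandomSampleCrop(),    # removed
    # RandomMirror(),        # removed
    ToPercentCoords(),
    Resize(image_size),
    SubtractMeans(image_mean),
    ToTensor(),
])
```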

faresbs · Sep 16 '19 14:09