pytorch-ssd
Sudden jump in loss from Epoch 0 to Epoch 1 when training OpenImages
It happens when training on OpenImages (VOC is fine). The loss jumps to 1000, 10000, or higher between epoch 0 and epoch 1. I traced the problem to the _getitem() function in open_images.py, where the variable boxes is modified in place. Because Python passes objects by reference, boxes still aliases the array stored in the dataset object, so modifying it inside _getitem() corrupts the dataset's annotations and causes the huge loss in epoch 1.
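To illustrate the aliasing issue, here is a toy example (not the repo's actual code): the array handed to the function is the same object stored in the dataset, so in-place scaling applied once per epoch compounds across epochs.

```python
import numpy as np

# Stand-in for a cached dataset entry with relative box coordinates.
dataset_entry = {'boxes': np.array([[0.1, 0.2, 0.5, 0.6]])}

def scale_boxes(boxes, width, height):
    # In-place multiplication mutates the caller's array as well.
    boxes[:, 0] *= width
    boxes[:, 2] *= width
    boxes[:, 1] *= height
    boxes[:, 3] *= height
    return boxes

scale_boxes(dataset_entry['boxes'], 300, 300)  # epoch 0
scale_boxes(dataset_entry['boxes'], 300, 300)  # epoch 1: scales the already-scaled values
print(dataset_entry['boxes'])                  # coordinates blow up, hence the huge loss
```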
The solution is to make a local copy of boxes to prevent the dataset from being changed unexpectedly. For completeness, labels is copied too. See diff below:
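A minimal sketch of the fix described above (the original diff is not reproduced here, and the structure of _getitem, image_info, and the transform attributes is assumed from open_images.py; the merged change may differ in detail):

```python
import copy

def _getitem(self, index):
    image_info = self.data[index]
    image = self._read_image(image_info['image_id'])
    # Copy the cached annotations so the augmentation transforms (expand,
    # random sample crop, random mirror) cannot mutate the dataset in place.
    boxes = copy.copy(image_info['boxes'])
    labels = copy.copy(image_info['labels'])
    if self.transform:
        image, boxes, labels = self.transform(image, boxes, labels)
    if self.target_transform:
        boxes, labels = self.target_transform(boxes, labels)
    return image, boxes, labels
```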
Thanks @hyl-g. That's a big bug indeed. Can you make a pull request, if it is not too much trouble, so I can merge it?
Will do.
I tried it on my own dataset and it works! Before, I was working around the problem by removing the data augmentation steps: expand, random sample crop, and random mirror.
Thank you for your contribution.