pytorch-ssd

SSD-VGG: Loss nan

Open hoangcuongbk80 opened this issue 4 years ago • 3 comments

Hi,

I trained SSD on my own dataset using VGG and got a NaN loss, as shown below:

[Screenshot: training output showing loss nan]

I trained on my own data on a Jetson Xavier. Training works fine with MobileNet. Any suggestions for making it work with VGG?

Aug 01 '20 06:08 hoangcuongbk80

I had the same error. I was able to complete the training by using a lower learning rate, i.e. --lr 0.0001
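To illustrate why a lower learning rate can help, here is a minimal sketch on a toy quadratic loss. It is not the SSD loss and does not confirm the cause of the NaN in this issue; the function name `final_loss`, the curvature value, and the step count are all assumptions chosen for illustration. With a step size that is too large, the weight overshoots further on every update, overflows to infinity, and the loss ends up as nan. With a small step size the loss shrinks.

```python
import math


def final_loss(lr, steps=1000, w0=1.0, curvature=50.0):
    """Run plain gradient descent on f(w) = 0.5 * curvature * w**2.

    Hypothetical toy problem: it only demonstrates how a too-large
    learning rate makes the updates diverge until the loss is nan.
    """
    w = w0
    loss = 0.5 * curvature * w * w
    for _ in range(steps):
        grad = curvature * w          # derivative of the quadratic
        w = w - lr * grad             # gradient descent update
        loss = 0.5 * curvature * w * w
    return loss


high = final_loss(lr=0.1)      # each step multiplies w by (1 - 0.1 * 50) = -4
low = final_loss(lr=0.0001)    # each step multiplies w by 0.995

print("lr=0.1    ->", high)
print("lr=0.0001 ->", low)
print("diverged with high lr:", math.isnan(high))
```

In a real training run the same effect is usually less clean, so it can also be worth checking the loss with `math.isfinite` after each iteration and stopping early when it fails.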

Aug 29 '20 12:08 olibartfast

@francescooliva Thank you for sharing your experience. I'll try it.

Aug 29 '20 14:08 hoangcuongbk80

Also, from your screenshot it looks like you are using the MobileNet training parameters, or am I wrong? Check the VGG model training instructions in the README:

python train_ssd.py --datasets ~/data/VOC0712/VOC2007/ ~/data/VOC0712/VOC2012/ --validation_dataset ~/data/VOC0712/test/VOC2007/ --net vgg16-ssd --base_net models/vgg16_reducedfc.pth --batch_size 24 --num_epochs 200 --scheduler "multi-step" --milestones "120,160"
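For reference, a multi-step schedule with milestones "120,160" keeps the learning rate constant and then multiplies it by a decay factor at epoch 120 and again at epoch 160. The sketch below shows that rule in plain Python. The base learning rate of 1e-3 and the decay factor `gamma=0.1` are assumptions for illustration, not values taken from this thread, and `parse_milestones` and `multi_step_lr` are hypothetical helper names, not functions from the repository.

```python
from bisect import bisect_right


def parse_milestones(text):
    """Turn a string such as "120,160" into a sorted list of epochs."""
    return sorted(int(part.strip()) for part in text.split(","))


def multi_step_lr(epoch, milestones, base_lr=1e-3, gamma=0.1):
    """Learning rate at a given epoch under a multi-step schedule.

    base_lr and gamma are assumed values; the rate is multiplied by
    gamma once for every milestone that the epoch has reached.
    """
    reached = bisect_right(milestones, epoch)
    return base_lr * gamma ** reached


milestones = parse_milestones("120,160")
for epoch in (0, 119, 120, 159, 160, 199):
    print(epoch, multi_step_lr(epoch, milestones))
```

Under these assumptions the rate drops by a factor of 10 at each milestone, so the later epochs train with a much smaller step than the first 120.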

Aug 31 '20 12:08 olibartfast