BlazeFace_Person.pytorch

Help in training

Open: gd1925 opened this issue 3 years ago • 1 comment

Hi everyone,

I have been trying to train the model using train_Blazeface.py, but the loss is always NaN for me. Could someone please guide me on this? Thank you.

gd1925, Mar 29 '22 03:03

Hi all, just an update: I was able to reach iteration 320 without NaN by making the following changes:

  1. Setting the learning rate to 1e-7
  2. Setting the batch size to 16
  3. Adding the following lines in ssd_model.py:
  • N = num_pos.data.sum().double()
  • loss_l = loss_l.double() / N
  • loss_c = loss_c.double() / N
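
For readers following along, the normalization in step 3 can be sketched as a standalone function. This is only a minimal sketch, assuming `loss_l` and `loss_c` are summed localization and confidence losses and `num_pos` holds the number of positive anchors per image, as in typical SSD-style loss code. The function name `normalize_losses`, the sample values, and the `clamp` guard against a batch with zero positives are hypothetical additions for illustration and are not taken from the repository.

```python
import torch


def normalize_losses(loss_l, loss_c, num_pos):
    """Divide summed losses by the total number of positive anchors, in float64."""
    # Total positives across the batch. The clamp is an assumption added here:
    # it keeps N >= 1 so a batch with no matched anchors cannot divide by zero.
    N = num_pos.sum().double().clamp(min=1.0)
    loss_l = loss_l.double() / N
    loss_c = loss_c.double() / N
    return loss_l, loss_c


# Hypothetical values for illustration only.
loss_l = torch.tensor(12.0)
loss_c = torch.tensor(30.0)
num_pos = torch.tensor([[3], [0], [1]])  # positives per image, N = 4

norm_l, norm_c = normalize_losses(loss_l, loss_c, num_pos)
print(norm_l.item(), norm_c.item())  # 3.0 7.5
```

Without a guard of this kind, a batch in which no anchors are matched gives N = 0, and the division then yields inf or NaN. Whether that is what happens in this case is not confirmed by the thread.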

This gave me valid loss values up to iteration 320, but as soon as training goes further, the loss becomes NaN again. I have looked up every solution I could find, but I cannot work out what is going wrong. If someone could help by providing some leads, I would be very grateful. Looking forward to any direction. Thank you.

gd1925, Mar 29 '22 16:03