mxnet-SSH: confused about some parameters for training on WIDER FACE

Thanks for your code. I am confused about some parameters for training on WIDER FACE. In the code:

end_epoch=10000 and lr_steps=[55,68,80]. I am not sure whether this is correct for training on WIDER FACE: doesn't it mean the last 9920 epochs use a learning rate of 0.000004?
opt = optimizer.SGD(learning_rate=lr, momentum=0.9, wd=0.0005, rescale_grad=1.0/len(ctx), clip_gradient=None): shouldn't rescale_grad be 1.0/len(ctx)/batch_size?
During training, the parameters of conv1, conv2 and the upsampling layers are fixed. Is that correct?
How long did you train on WIDER FACE, and with how many GPUs? Which dataset did you use? Any advice would be appreciated. Thanks.
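The multi-step schedule the question describes can be sketched as below. This is a minimal illustration, not the mxnet-SSH code: BASE_LR = 0.004 is an assumption (it is the base rate that three 10x drops would turn into the 0.000004 figure in the question), the 0.1 decay factor is also assumed, and effective_end_epoch simply encodes the reply in this thread that training stops at the last lr step.

```python
# Sketch of a multi-step learning-rate schedule (assumed values, see lead-in).
BASE_LR = 0.004          # assumption: not stated in the thread
LR_STEPS = [55, 68, 80]  # from the question
FACTOR = 0.1             # assumption: 10x decay at each step


def lr_at_epoch(epoch, base_lr=BASE_LR, steps=LR_STEPS, factor=FACTOR):
    """Learning rate used during `epoch`: one decay for every step already reached."""
    num_decays = sum(1 for step in steps if epoch >= step)
    return base_lr * factor ** num_decays


def effective_end_epoch(end_epoch, steps=LR_STEPS):
    """Assumes training stops at the last lr step, as the reply in this thread states."""
    return min(end_epoch, steps[-1])


if __name__ == "__main__":
    last = effective_end_epoch(10000)
    for epoch in (0, 55, 68, last - 1):
        print(epoch, lr_at_epoch(epoch))
    print("training ends at epoch", last)
```

Under these assumptions the 0.000004 rate is never actually used, because training stops at epoch 80, the same epoch at which the third decay would take effect.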

Oct 16 '18 01:10 hdjsjyl

I see the parameter 'color_jitter' for data augmentation, but it does not seem to be used. Is that correct?
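For context, a typical color-jitter augmentation randomly rescales brightness, contrast and saturation in a random order. The NumPy sketch below is a generic version under that assumption; the function name, the strength parameter and its 0.125 default are hypothetical and do not come from mxnet-SSH.

```python
import numpy as np

# Luma weights used to build a grayscale version of an RGB image.
GRAY_COEF = np.array([0.299, 0.587, 0.114], dtype=np.float32)


def color_jitter(img, strength=0.125, rng=None):
    """Hypothetical generic color jitter for an HxWx3 RGB image in [0, 255]."""
    rng = np.random.default_rng() if rng is None else rng
    out = img.astype(np.float32)
    ops = ["brightness", "contrast", "saturation"]
    rng.shuffle(ops)  # apply the three perturbations in random order
    for op in ops:
        alpha = 1.0 + rng.uniform(-strength, strength)
        if op == "brightness":
            out = out * alpha
        elif op == "contrast":
            # Blend toward the mean gray level of the whole image.
            gray_mean = (out @ GRAY_COEF).mean()
            out = out * alpha + gray_mean * (1.0 - alpha)
        else:
            # Blend toward the per-pixel grayscale image.
            gray = (out @ GRAY_COEF)[..., None]
            out = out * alpha + gray * (1.0 - alpha)
    return np.clip(out, 0.0, 255.0)
```

With strength=0.0 every alpha is 1 and the image is returned unchanged, which makes it easy to switch the augmentation off for debugging.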

Oct 16 '18 01:10 hdjsjyl

It will end at epoch 80. The softmax output already does per-instance normalization. The fixed parameters are correct. color_jitter is required.
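The point about per-instance normalization can be checked numerically. In MXNet, SoftmaxOutput accepts a normalization argument ('batch' or 'valid') that divides the gradient by the number of instances, and the optimizer's rescale_grad multiplies the aggregated gradient before the update. The sketch below assumes each device's gradient is already a per-instance mean and that device gradients are summed when reduced, as MXNet's kvstore does; the helper names are hypothetical. It shows that rescale_grad=1.0/len(ctx) then recovers the full-batch mean, while an extra /batch_size would shrink the gradient further.

```python
import numpy as np


def softmax_grad(logits, labels):
    """Per-instance-normalized gradient of softmax cross-entropy w.r.t. logits: (p - onehot) / N."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    onehot = np.eye(logits.shape[1])[labels]
    return (probs - onehot) / logits.shape[0]


def bias_grad(logits, labels):
    """Gradient w.r.t. a bias added to every instance's logits: sum over instances."""
    return softmax_grad(logits, labels).sum(axis=0)


def aggregate(device_batches, rescale_grad):
    """Sum per-device gradients (as a kvstore reduce would) and apply rescale_grad."""
    total = sum(bias_grad(x, y) for x, y in device_batches)
    return rescale_grad * total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch_size, num_classes, num_devices = 8, 2, 2
    logits = rng.normal(size=(batch_size, num_classes))
    labels = rng.integers(0, num_classes, size=batch_size)
    devices = list(zip(np.split(logits, num_devices), np.split(labels, num_devices)))
    print(aggregate(devices, 1.0 / num_devices))               # matches the full-batch mean
    print(bias_grad(logits, labels))                            # full-batch reference
    print(aggregate(devices, 1.0 / num_devices / batch_size))  # batch_size times too small
```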

Oct 16 '18 05:10 nttstar

@nttstar Thanks for your reply; this information is very helpful. A few more questions:

The program runs fast: even on a modest CPU and GPU, GPU utilization stays close to 100%. That is very good; could you explain how this is achieved?
Did you try focal loss when training on WIDER FACE? How were the results? This is excellent work, thank you very much.
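For reference, focal loss (Lin et al., "Focal Loss for Dense Object Detection") scales cross-entropy by (1 - p_t)^gamma so that easy, well-classified anchors contribute less. Below is a minimal binary NumPy sketch for face/background anchors; gamma=2 and alpha=0.25 are the defaults reported in that paper, and nothing here reflects whether or how mxnet-SSH uses focal loss.

```python
import numpy as np


def focal_loss(probs, labels, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss averaged over anchors.

    probs:  predicted probability of the face class, shape (N,)
    labels: 1 for face anchors, 0 for background anchors, shape (N,)
    """
    probs = np.clip(probs, eps, 1.0 - eps)               # avoid log(0)
    p_t = np.where(labels == 1, probs, 1.0 - probs)      # probability of the true class
    alpha_t = np.where(labels == 1, alpha, 1.0 - alpha)  # class-balancing weight
    loss = -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)
    return loss.mean()
```

With gamma=0 and alpha=0.5 this reduces to half of ordinary binary cross-entropy, which is a convenient sanity check; raising gamma down-weights anchors the model already classifies confidently.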

Oct 17 '18 00:10 hdjsjyl