pytorch-retinanet
Why do you freeze batch norm parameters when training?
Would it be better to let the batch norm parameters adapt to your current data?
It's common practice. First, the pretrained network's BN layers have already been trained, so their statistics are already reasonable. Second, object detection uses small batch sizes, which makes it hard to keep the BN statistics stable.
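For reference, a minimal sketch of how BN freezing is typically done in PyTorch (the `freeze_bn` helper is illustrative, not necessarily this repo's implementation; it has to be re-applied after every call to `model.train()`, since `train()` resets all submodules to training mode):

```python
import torch.nn as nn

def freeze_bn(module):
    """Freeze all BatchNorm2d layers: keep running stats fixed and stop
    updating the affine parameters (illustrative helper)."""
    for m in module.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.eval()                         # use stored running mean/var, don't update them
            m.weight.requires_grad = False   # freeze gamma
            m.bias.requires_grad = False     # freeze beta

# typical usage:
# model.train()
# freeze_bn(model)
```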
Use GroupNorm instead of BatchNorm; it is more stable with small batch sizes.
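A rough sketch of using GroupNorm in place of BatchNorm in a conv block (`conv_gn_block` is an illustrative helper; dropping it into a pretrained backbone would also require retraining, since the pretrained BN weights don't transfer directly):

```python
import torch.nn as nn

def conv_gn_block(in_channels, out_channels, num_groups=32):
    # GroupNorm normalizes over channel groups, so it is independent of batch size.
    # num_groups=32 is the common default from the GroupNorm paper; pick a value
    # that divides out_channels.
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1, bias=False),
        nn.GroupNorm(num_groups, out_channels),
        nn.ReLU(inplace=True),
    )
```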
Use synchronized batch normalization
Using sync batch norm does not help with single-GPU training and low batch sizes, though.
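For completeness, a sketch of enabling SyncBatchNorm with DistributedDataParallel (assumes a torch.distributed process group is already initialized; `to_sync_bn` and `local_rank` are illustrative names). It only helps when the effective batch is split across several GPUs:

```python
import torch.nn as nn

def to_sync_bn(model, local_rank):
    # Replace every BatchNorm layer with SyncBatchNorm so statistics are
    # computed across all GPUs in the process group, then wrap with DDP.
    model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
    model = model.cuda(local_rank)
    model = nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])
    return model
```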