pytorch-retinanet
About freeze BatchNorm
Hi, I noticed one of the earlier issues; your answer about freezing the BN layers was about batch size. My question is: according to your code here, the function freeze_bn filters all BN layers and sets them to eval mode. However, this operation only freezes the running mean and running var; the learnable parameters of the BN layers, weight and bias, still have requires_grad=True. Does that mean that during training the weight and bias of the BN layers still receive gradients and get updated? So you only take the running mean and running var from the pre-trained model and keep them unchanged? I'm not sure it makes sense that the freeze_bn operation still trains the affine parameters of the BN layers. Hoping you can answer my question. Thanks in advance. Best
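For reference, here is a minimal sketch of the behaviour being described (it mirrors what was said above, not the exact code in this repo): calling .eval() on the BatchNorm modules stops the running statistics from updating, but gamma and beta (weight and bias) still have requires_grad=True, so they keep receiving gradients.

```python
import torch.nn as nn

# a toy model with one BN layer, just for demonstration
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
)

def freeze_bn(model):
    # put every BatchNorm layer into eval mode, as described above
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.eval()  # running_mean / running_var stop updating

model.train()      # the rest of the network stays in training mode
freeze_bn(model)

bn = model[1]
print(bn.training)              # False -> running statistics are frozen
print(bn.weight.requires_grad)  # True  -> gamma is still trainable
print(bn.bias.requires_grad)    # True  -> beta is still trainable
```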
Is it about the batch size?
It is indeed. Because of the memory requirements, the batch size is typically quite low on a standard GPU. This means that batch normalization won't work very well (as it requires a larger batch size). Since the network is pre-trained on imagenet, freezing the weights is a good fix (this is the approach taken in faster-rcnn and others).
Originally posted by @yhenon in https://github.com/yhenon/pytorch-retinanet/issues/24#issuecomment-419752217
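If the goal is to freeze the BN layers completely (running statistics and affine parameters), one possible variant is to also disable gradients on weight and bias. This is an illustrative sketch, not the repository's freeze_bn:

```python
import torch.nn as nn

def freeze_bn_fully(model):
    """Hypothetical helper: freeze running stats *and* affine parameters of all BN layers."""
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.eval()                        # freeze running_mean / running_var
            m.weight.requires_grad = False  # freeze gamma
            m.bias.requires_grad = False    # freeze beta
```

If you go this route, you would typically also pass only trainable parameters to the optimizer, e.g. `filter(lambda p: p.requires_grad, model.parameters())`.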
I have the same question and look forward to the answer.
So if I have a powerful GPU and am able to train with a relatively large batch of pictures (say, 32 or 64), can I unfreeze the BN layers then?
@WangKK1996 I think so. In theory, you can unfreeze the BN layers when you use a large batch size, but in practice it does not always help. You can try it if you have such enviable experimental conditions.
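If you do train with a large per-GPU batch, one way to "unfreeze" BN is simply to skip the freeze_bn call, or to put the BN modules back into training mode with trainable parameters. A rough sketch, with a hypothetical helper name:

```python
import torch.nn as nn

def unfreeze_bn(model):
    # hypothetical helper: restore normal BatchNorm training behaviour
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.train()                      # running stats update again
            m.weight.requires_grad = True  # gamma trainable
            m.bias.requires_grad = True    # beta trainable

# With multiple GPUs, one can also convert to synchronized BatchNorm so the
# statistics are computed over the combined (effective) batch:
# model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
```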