
About freeze BatchNorm

Open lscelory opened this issue 5 years ago • 3 comments

Hi, I noticed in one of the earlier issues that your answer about freezing BN layers concerns batch size. My question is: according to your code here, the function freeze_bn filters all BN layers and sets them to eval mode. However, this only freezes the running mean and running variance; the BN parameters weight and bias still have requires_grad=True. Does that mean that during training the weight and bias of the BN layers still receive gradients and get updated? Do you only take the running mean and running variance from the pre-trained model and keep them unchanged? I'm not sure it makes sense that the freeze-BN operation still trains the parameters of the BN layers. Hoping you can answer my question. Thanks in advance. Best
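
For reference, a minimal sketch of the behaviour described above, using a torchvision ResNet as a stand-in for the backbone (freeze_bn_stats_only is a hypothetical helper, not the repository's method): only eval mode is set, so the running statistics are frozen while weight and bias stay trainable.

```python
import torch.nn as nn
from torchvision.models import resnet18

def freeze_bn_stats_only(model: nn.Module) -> None:
    """Put every BatchNorm2d layer into eval mode.

    eval() stops running_mean / running_var from being updated from
    mini-batch statistics, but the affine parameters (weight, bias)
    still have requires_grad=True, so they keep receiving gradients.
    """
    for module in model.modules():
        if isinstance(module, nn.BatchNorm2d):
            module.eval()

model = resnet18()          # stand-in for the detector backbone
model.train()               # the rest of the network stays in train mode
freeze_bn_stats_only(model)

bn = model.bn1
print(bn.training)              # False -> running stats frozen
print(bn.weight.requires_grad)  # True  -> gamma still trainable
print(bn.bias.requires_grad)    # True  -> beta still trainable
```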

Is it about the batch size?

It is indeed. Because of the memory requirements, the batch size is typically quite low on a standard GPU. This means that batch normalization won't work very well (as it requires a larger batch size). Since the network is pre-trained on ImageNet, freezing the weights is a good fix (this is the approach taken in Faster R-CNN and others).

Originally posted by @yhenon in https://github.com/yhenon/pytorch-retinanet/issues/24#issuecomment-419752217
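
If the goal is to freeze the BN layers completely, as the "freezing the weights" in the quoted answer suggests, one would also turn off gradients for the affine parameters. A sketch under that assumption (hypothetical helper name, not the repository's code):

```python
import torch
import torch.nn as nn

def freeze_bn_completely(model: nn.Module) -> None:
    """Freeze BatchNorm2d layers entirely: running stats and affine params.

    eval() stops the running statistics from being updated;
    requires_grad_(False) stops weight (gamma) and bias (beta) from
    receiving gradients, so the optimizer leaves them untouched.
    """
    for module in model.modules():
        if isinstance(module, nn.BatchNorm2d):
            module.eval()
            if module.affine:
                module.weight.requires_grad_(False)
                module.bias.requires_grad_(False)

# Optionally keep the frozen parameters out of the optimizer entirely:
# optimizer = torch.optim.SGD(
#     (p for p in model.parameters() if p.requires_grad), lr=1e-2)
```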

lscelory avatar Nov 21 '19 12:11 lscelory

I have the same question and look forward to the answer.

CanshangD avatar Dec 06 '19 03:12 CanshangD


So if I have a powerful GPU and am able to train with a relatively large batch of pictures (say, 32 or 64), can I unfreeze the BN layers then?

WangKK1996 avatar May 22 '20 08:05 WangKK1996

@WangKK1996 I think so. In theory, you can unfreeze the BN layers when you use a large batch size, but in practice it will not always work. You can try it if you have such enviable experimental conditions.
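
If the batch size really is large enough, unfreezing would amount to keeping the BN layers in train mode and letting their parameters require gradients again; a sketch assuming they were previously frozen as above (hypothetical helper name):

```python
import torch.nn as nn

def unfreeze_bn(model: nn.Module) -> None:
    """Re-enable BatchNorm2d training (running-stat updates and gradients)."""
    for module in model.modules():
        if isinstance(module, nn.BatchNorm2d):
            module.train()  # running_mean / running_var are updated again
            if module.affine:
                module.weight.requires_grad_(True)
                module.bias.requires_grad_(True)
```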

lscelory avatar May 30 '20 03:05 lscelory