
What is the purpose of the zero_head parameter?

Open issamemariold opened this issue 5 years ago • 3 comments

https://github.com/google-research/big_transfer/blob/6c83d6459e009fa89d84c1e904611e9b162e6eff/bit_pytorch/models.py#L165

Hi there! I'm wondering what the purpose of this zero_head parameter is. It seems to me that if it is set to True, the weights of the head are initialized to zero, which causes the network to always output zeros for any input, and renders any further fine tuning of the model useless.

Should this be replaced with random initialization? Or maybe removed altogether, letting PyTorch take care of initializing the head?

issamemariold • May 26 '20 14:05

Hi. I was curious about this as well. I tried both approaches: zero-initializing the head's convolutional layer, and letting torch initialize it as usual. I reproduced the few-shot example from the colab notebook. Both approaches seem to learn reasonable weights and biases, but the PyTorch default init did not reach a test result in the 78%-85% range, only ~73%.

So my guess is that zero initialization is some sort of heuristic. It may be easier (especially for few-shot training) to learn weights for class-specific features starting from zero in the first place, rather than having to unlearn other weights and push them closer to zero.

Still, I'm really interested in how the authors explain this. Thanks for your work!

ademyanchuk • Jun 03 '20 12:06

@ademyanchuk is correct: when doing any kind of training (such as fine-tuning), initializing the head to zero is common practice and stabilizes training. The OP's statement of "and renders any further fine tuning of the model useless" is just wrong.

The only reason not to initialize it to zero is if you want to use our original pre-trained head, for example if you are interested in the ImageNet-21k class-space. Please see the colabs for examples of this.

lucasb-eyer • Jun 06 '20 11:06
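
For readers who want to check the point above for themselves, here is a minimal sketch in plain Python (no PyTorch) of why a zero-initialized head still trains. It is not the repository's code. It assumes the head behaves like a single linear layer followed by softmax cross-entropy, and the feature vector, class count, and learning rate are made-up illustrative values. With all-zero weights the logits are zero and the prediction is uniform, but the gradient with respect to the weights is `(probs - onehot) * x`, which is non-zero whenever the incoming features are non-zero.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def head_forward(W, b, x):
    # Linear head: logits[k] = sum_j W[k][j] * x[j] + b[k].
    return [sum(w * xj for w, xj in zip(row, x)) + bk for row, bk in zip(W, b)]

def loss_and_grads(W, b, x, target):
    # Softmax cross-entropy loss and its gradients w.r.t. W and b.
    probs = softmax(head_forward(W, b, x))
    loss = -math.log(probs[target])
    dlogits = [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]
    dW = [[d * xj for xj in x] for d in dlogits]
    return loss, dW, dlogits

# Hypothetical setup: 3 classes, 4 features from a (frozen or pre-trained) body.
num_classes, feat_dim = 3, 4
x = [0.5, -1.0, 2.0, 0.25]  # illustrative features, not real model output
target = 1

# Zero-initialized head, analogous to zero_head=True.
W = [[0.0] * feat_dim for _ in range(num_classes)]
b = [0.0] * num_classes

loss0, dW, db = loss_and_grads(W, b, x, target)

# One plain SGD step with an illustrative learning rate.
lr = 0.1
W = [[w - lr * g for w, g in zip(w_row, g_row)] for w_row, g_row in zip(W, dW)]
b = [bk - lr * g for bk, g in zip(b, db)]

loss1, _, _ = loss_and_grads(W, b, x, target)

print("initial loss:", loss0, "(log of num_classes:", math.log(num_classes), ")")
print("loss after one step:", loss1)
```

The initial loss equals `log(num_classes)` because the zero head predicts a uniform distribution, and it drops after a single update, so the head does not stay stuck at zero. Zero initialization would only block learning if the gradient itself vanished, for example in a hidden layer whose downstream weights are also all zero; that is not the situation for a final classification layer.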

@lucasb-eyer So what do you think: in what cases should I not initialize the head with zeros? Can you think of a problem statement where I should not initialize with zeros?

abhiagwl4262 • Sep 17 '20 17:09