Why is the projector head MLP set to requires_grad=False?
I noticed that a new projector head MLP is added after loading the pre-trained MoCo v3 model. However, the parameters of this newly added component are also set to requires_grad=False.
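For concreteness, here is a minimal sketch of the pattern I mean. The stand-in encoder, the layer sizes, and all the names are placeholders I made up for illustration, not the actual RCG code:

```python
import torch
import torch.nn as nn

# Placeholder for the pre-trained MoCo v3 backbone, loaded from a
# checkpoint and frozen (a single Linear here just stands in for the ViT).
encoder = nn.Sequential(nn.Linear(3 * 224 * 224, 256))
for p in encoder.parameters():
    p.requires_grad = False

# Newly added projector head MLP: randomly initialized, yet also frozen.
projector = nn.Sequential(
    nn.Linear(256, 4096),
    nn.BatchNorm1d(4096),
    nn.ReLU(inplace=True),
    nn.Linear(4096, 256),
)
for p in projector.parameters():
    p.requires_grad = False  # <-- this is the part I don't understand

with torch.no_grad():
    x = torch.randn(8, 3 * 224 * 224)
    # Features pass through a random, untrained projection of the
    # frozen pre-trained representation.
    features = projector(encoder(x))
```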
My question is: since this MLP head is randomly initialized, why does it not require any training before being used for feature projection?
Intuitively, adding an untrained random projection head could disrupt the original feature distributions learned by the pre-trained encoder. So what is the motivation for freezing the parameters of this newly added head?
Is it meant to better preserve the pre-trained feature distribution? Or to leverage fixed random projections to improve generalization on downstream tasks?
It would be great if someone could explain the rationale behind not training the added projector head. Thanks!