Why is the projector head MLP set to requires_grad=False?
I noticed that a new projector head MLP is added after loading the pre-trained MoCo v3 model. However, the parameters of this newly added component are also set to requires_grad=False.
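For concreteness, here is a minimal sketch of the pattern I mean. The stand-in encoder, the layer sizes, and all the names are placeholders I made up for illustration, not the actual RCG code:

```python
import torch
import torch.nn as nn

# Placeholder for the pre-trained MoCo v3 backbone, loaded from a
# checkpoint and frozen (a single Linear here just stands in for the ViT).
encoder = nn.Sequential(nn.Linear(3 * 224 * 224, 256))
for p in encoder.parameters():
    p.requires_grad = False

# Newly added projector head MLP: randomly initialized, yet also frozen.
projector = nn.Sequential(
    nn.Linear(256, 4096),
    nn.BatchNorm1d(4096),
    nn.ReLU(inplace=True),
    nn.Linear(4096, 256),
)
for p in projector.parameters():
    p.requires_grad = False  # <-- this is the part I don't understand

with torch.no_grad():
    x = torch.randn(8, 3 * 224 * 224)
    # Features pass through a random, untrained projection of the
    # frozen pre-trained representation.
    features = projector(encoder(x))
```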
My question is: since this MLP head is randomly initialized, why does it not require any training before being used for feature projection?
Intuitively, adding an untrained random projection head could disrupt the original feature distributions learned by the pre-trained encoder. So what is the motivation for freezing the parameters of this newly added head?
Is it meant to better preserve the pre-trained feature distribution? Or to leverage fixed random projections to improve generalization on downstream tasks?
It would be great if someone could explain the rationale behind not training the added projector head. Thanks!