a-PyTorch-Tutorial-to-Image-Captioning
init_embedding
bias = np.sqrt(3.0 / embeddings.size(1))
torch.nn.init.uniform_(embeddings, -bias, bias)
while the default PyTorch init is init.normal_(self.weight).
Why do it this way, and what is the reference for it?
Looking forward to the discussion.
bias = np.sqrt(3.0 / embeddings.size(1))
torch.nn.init.uniform_(embeddings, -bias, bias)
This is the lecun_uniform way of initializing; here, the fan_in is the embedding dimension emb_dim, which is obtained with embeddings.size(1) as in the code.
The code samples (i.e. picks) values uniformly from the interval (-bias, +bias), where bias = sqrt(3.0 / emb_dim) as defined in the code.
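For concreteness, here is a minimal, self-contained sketch of that init applied to an embedding layer. The helper just wraps the two lines quoted above; the vocabulary size (10000) and emb_dim (512) are made-up example numbers, not values from the repo:

```python
import numpy as np
import torch
import torch.nn as nn

def init_embedding(embeddings: torch.Tensor) -> None:
    """LeCun-uniform init: sample from U(-bias, +bias) with
    bias = sqrt(3 / fan_in), where fan_in = embeddings.size(1) = emb_dim."""
    bias = np.sqrt(3.0 / embeddings.size(1))
    torch.nn.init.uniform_(embeddings, -bias, bias)

# Example usage (hypothetical sizes): 10000-word vocab, 512-dim embeddings
embedding = nn.Embedding(num_embeddings=10000, embedding_dim=512)
init_embedding(embedding.weight.data)
```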
and the pytorch init is init.normal_(self.weight)
why do this, and what is the reference?
Well, there is a whole area of research on why some initializations work better than simply sampling values from a plain uniform or Gaussian distribution. Some initializations, such as lecun_uniform, have been found empirically to work better.
Here is one reference: initializers/lecun_uniform
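As a quick sanity check of what this bound buys you (a sketch, not from the repo): a uniform distribution on (-b, +b) has variance b^2 / 3, so with b = sqrt(3 / emb_dim) the weights end up with variance 1 / fan_in, which is exactly the scale lecun_uniform targets.

```python
import torch

emb_dim = 512  # example dimension, not a value from the repo
bound = (3.0 / emb_dim) ** 0.5

w = torch.empty(10000, emb_dim)
torch.nn.init.uniform_(w, -bound, bound)

# U(-b, b) has variance b^2 / 3 = 1 / emb_dim, so the empirical std
# should be close to 1 / sqrt(emb_dim) (about 0.044 for emb_dim = 512)
print(w.std().item(), 1.0 / emb_dim ** 0.5)
```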