annotated-transformer
use biased estimate of std in layernorm as in the original paper
The original paper computes a biased estimate of the sample standard deviation, dividing by N rather than N - 1. However, torch.Tensor.std() uses an unbiased estimate by default (Ref.). It is therefore necessary to use torch.Tensor.std(-1, unbiased=False). Note also that nn.LayerNorm() uses the biased estimate. The difference is small for large feature dimensions, because the unbiased estimate is larger by a factor of sqrt(N / (N - 1)), about 0.1% for N = 512. Even so, it is more appropriate to follow the definition given in the cited paper.
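To quantify how small the difference is: for N values, the unbiased estimate exceeds the biased one by exactly sqrt(N / (N - 1)), whatever the data. The stdlib-only sketch below checks this with statistics.pstdev, which divides by N, and statistics.stdev, which divides by N - 1. The helper std_ratio and the sample data are hypothetical and serve only as an illustration.

```python
import math
import statistics


def std_ratio(values):
    """Return (biased std, unbiased std, unbiased / biased) for a list of floats.

    Hypothetical helper for illustration only.
    """
    biased = statistics.pstdev(values)   # divides by N, as in the LayerNorm paper
    unbiased = statistics.stdev(values)  # divides by N - 1, like torch.Tensor.std() by default
    return biased, unbiased, unbiased / biased


for n in (4, 64, 512):
    values = [float(i % 7) for i in range(n)]  # arbitrary non-constant sample data
    _, _, ratio = std_ratio(values)
    # The ratio depends only on N, not on the values: sqrt(N / (N - 1)).
    print(n, round(ratio, 5), round(math.sqrt(n / (n - 1)), 5))
```

For N = 4 the two estimates differ by about 15%. For N = 512 they differ by about 0.1%, so the gap matters less at large dimensions.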
For PyTorch >= 2.0, use torch.Tensor.std(-1, correction=0).
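Below is a minimal sketch of the proposed change. It assumes the notebook's LayerNorm follows the structure shown here, with a gain a_2, a bias b_2 and eps added to the std. Treat these names as assumptions rather than a verbatim copy of the repository. The only functional change is passing correction=0, so the std divides by N.

```python
import torch
import torch.nn as nn


class LayerNorm(nn.Module):
    """Layer normalization using the biased (divide-by-N) std, as in the paper."""

    def __init__(self, features, eps=1e-6):
        super().__init__()
        self.a_2 = nn.Parameter(torch.ones(features))   # learnable gain
        self.b_2 = nn.Parameter(torch.zeros(features))  # learnable bias
        self.eps = eps

    def forward(self, x):
        mean = x.mean(-1, keepdim=True)
        # correction=0 divides by N (PyTorch >= 2.0).
        # On older versions use x.std(-1, keepdim=True, unbiased=False).
        std = x.std(-1, keepdim=True, correction=0)
        return self.a_2 * (x - mean) / (std + self.eps) + self.b_2
```

This sketch adds eps to the std, whereas nn.LayerNorm adds eps to the variance inside the square root. The two therefore agree exactly only when eps = 0 and differ slightly otherwise.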