annotated-transformer use biased estimate of std in layernorm as in the original paper

Open Arunprakash-A opened this issue 1 year ago • 0 comments

The original paper computes a biased estimate of the sample standard deviation, dividing by N rather than N - 1. By default, however, torch.Tensor.std() uses the unbiased estimate (Bessel's correction, dividing by N - 1). To match the paper, it is therefore necessary to call torch.Tensor.std(-1, unbiased=False). The built-in nn.LayerNorm class uses the biased estimate as well. The difference is small for a large feature dimension, but following the definition given in the cited paper is more appropriate.
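To make the size of the difference concrete, the two estimators can be written side by side. This is only a sketch: the symbols x_i (features), H (feature dimension), and mu (feature mean) are notation chosen here for illustration and are not taken from the issue.

```latex
\mu = \frac{1}{H}\sum_{i=1}^{H} x_i, \qquad
\sigma_{\text{biased}} = \sqrt{\frac{1}{H}\sum_{i=1}^{H}(x_i-\mu)^2}, \qquad
\sigma_{\text{unbiased}} = \sqrt{\frac{1}{H-1}\sum_{i=1}^{H}(x_i-\mu)^2}

\frac{\sigma_{\text{unbiased}}}{\sigma_{\text{biased}}} = \sqrt{\frac{H}{H-1}}
```

For example, with H = 512 the ratio is sqrt(512/511) ≈ 1.00098, so the two estimates differ by roughly 0.1%. With H = 8 it is about 1.069, which is no longer negligible.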

For PyTorch >= 2.0, use torch.Tensor.std(-1, correction=0).
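A minimal sketch of the point above, assuming PyTorch >= 2.0 so that the correction keyword is available. The helper layer_norm_biased is a hypothetical name introduced only for this illustration and is not code from the repository. It normalizes with the biased variance and is compared against nn.LayerNorm with the affine parameters switched off.

```python
import math

import torch
import torch.nn as nn

torch.manual_seed(0)
d = 8  # small feature dimension so the gap between the estimates is visible
x = torch.randn(4, d)

# Default behaviour: Bessel's correction, i.e. division by d - 1.
std_unbiased = x.std(-1, keepdim=True)
# Biased estimate: division by d.
std_biased = x.std(-1, correction=0, keepdim=True)

# The two estimates differ by the constant factor sqrt(d / (d - 1)).
ratio = math.sqrt(d / (d - 1))


def layer_norm_biased(x, eps=1e-5):
    """Hypothetical helper: normalize the last dim using the biased variance."""
    mean = x.mean(-1, keepdim=True)
    var = x.var(-1, correction=0, keepdim=True)
    return (x - mean) / torch.sqrt(var + eps)


# nn.LayerNorm without learnable scale/shift, using its default eps of 1e-5.
reference = nn.LayerNorm(d, elementwise_affine=False)(x)

print(torch.allclose(std_unbiased, std_biased * ratio, atol=1e-6))
print(torch.allclose(layer_norm_biased(x), reference, atol=1e-5))
```

Both comparisons should print True: the first confirms the constant factor between the two estimates, and the second that normalizing with the biased variance agrees with nn.LayerNorm.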

Jan 25 '24 17:01 Arunprakash-A
