pytorch-transformer
Fix Layernorm Implementation
According to the LayerNorm formula, the normalized output is norm = (x - mean) / sqrt(var + eps), not (x - mean) / (std + eps). The two are not equivalent, because sqrt(var + eps) == sqrt(std**2 + eps) != std + eps. The difference may be small in practice, but the current code is not a strictly correct implementation of LayerNorm.
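The following is a minimal pure-Python sketch of both formulas on a single feature vector, with no learnable scale or shift. The function names `layer_norm` and `layer_norm_std_eps` are hypothetical and do not come from the repository. It assumes the biased (population) variance, which is what `torch.nn.LayerNorm` uses.

```python
import math


def layer_norm(x, eps=1e-5):
    """LayerNorm without affine params: (x - mean) / sqrt(var + eps)."""
    n = len(x)
    mean = sum(x) / n
    # Biased (population) variance: divide by n, not n - 1.
    var = sum((v - mean) ** 2 for v in x) / n
    denom = math.sqrt(var + eps)  # eps is added inside the square root
    return [(v - mean) / denom for v in x]


def layer_norm_std_eps(x, eps=1e-5):
    """Formula this PR replaces: (x - mean) / (std + eps)."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    return [(v - mean) / (std + eps) for v in x]  # eps added outside the root


# With a tiny variance, eps dominates and the two formulas diverge.
x = [0.0, 0.002]
print(layer_norm(x))
print(layer_norm_std_eps(x))
```

For `x = [0.0, 0.002]`, the variance is 1e-6. The correct formula divides by sqrt(1.1e-5) and gives roughly ±0.30. The std + eps version divides by about 0.00101 and gives roughly ±0.99. When the variance is large relative to eps, the two results are nearly identical.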