
[Bug Report] Is RMSNormPre in TransformerLens different from the Llama source code?

Open wangyifei0047 opened this issue 1 year ago • 2 comments

In modeling_llama.py, the LlamaRMSNorm module outputs weight * the scaled hidden_states.
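For context, HuggingFace's LlamaRMSNorm does roughly the following (a paraphrased sketch from memory, not the exact upstream code):

```python
import torch
from torch import nn

class LlamaRMSNorm(nn.Module):
    """Sketch of HF's LlamaRMSNorm: normalize by RMS, then multiply by a learned weight."""

    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = eps

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        input_dtype = hidden_states.dtype
        hidden_states = hidden_states.to(torch.float32)
        variance = hidden_states.pow(2).mean(-1, keepdim=True)
        hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
        # The returned value is weight * normalized hidden states.
        return self.weight * hidden_states.to(input_dtype)
```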

RMSNormPre definition in transformer_lens: this module seems to output just the scaled hidden_states, without any weight:

[screenshot of the RMSNormPre definition]
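For comparison, a simplified sketch of what RMSNormPre appears to do (the class name here is made up for illustration, and hook points are omitted; check the actual definition in transformer_lens):

```python
import torch
from torch import nn

class RMSNormPreSketch(nn.Module):
    """Simplified stand-in for transformer_lens' RMSNormPre: scaling only, no learned weight."""

    def __init__(self, eps: float = 1e-5):
        super().__init__()
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scale = (x.pow(2).mean(-1, keepdim=True) + self.eps).sqrt()
        # Only the normalized activations are returned; there is no weight here.
        return x / scale
```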

From the way TransformerBlock uses RMSNormPre, it looks like the weight of LlamaRMSNorm is never applied during TransformerBlock's forward pass:

[screenshot of the TransformerBlock forward pass]
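(For contrast, TransformerLens also has a weighted RMSNorm variant that does multiply by a learned `w` after normalizing; again a simplified sketch with an illustrative class name, hook points omitted.)

```python
import torch
from torch import nn

class RMSNormSketch(nn.Module):
    """Simplified stand-in for transformer_lens' weighted RMSNorm."""

    def __init__(self, d_model: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.w = nn.Parameter(torch.ones(d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scale = (x.pow(2).mean(-1, keepdim=True) + self.eps).sqrt()
        # Unlike RMSNormPre, the learned weight is applied here.
        return (x / scale) * self.w
```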

I want to hook the values after RMSNorm is applied to each residual stream, so I tried to find the parameters in RMSNorm and noticed something odd.
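In case it helps, the post-norm activations are exposed as hook points, so something along these lines should read them off each residual stream (the model name is just an example, and the hook name assumes the usual `blocks.<i>.ln1.hook_normalized` naming; double-check against your installed version):

```python
from transformer_lens import HookedTransformer

# Assumes the Llama weights are available locally or via the HF hub.
model = HookedTransformer.from_pretrained("meta-llama/Llama-2-7b-hf")

logits, cache = model.run_with_cache("Hello world")

# Normalized residual stream going into layer 0's attention (after ln1 / RMSNorm).
normed = cache["blocks.0.ln1.hook_normalized"]
print(normed.shape)  # [batch, pos, d_model]
```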

- [x] I have checked that there is no similar issue in the repo (required)

wangyifei0047 avatar Jul 06 '24 09:07 wangyifei0047

Have you tried comparing intermediate values using hooks? It may be the case that they are folded into the weights of a subsequent layer.
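For reference, `from_pretrained` folds the norm weights into the following linear layers by default (`fold_ln=True`), which is when the weight-less Pre variants get used. A rough way to compare the two load modes (model name is just an example; keyword arguments may differ slightly by version):

```python
from transformer_lens import HookedTransformer

# Default load: norm weights are folded into the downstream linear layers,
# so the per-block norm module has no weight of its own.
folded = HookedTransformer.from_pretrained("meta-llama/Llama-2-7b-hf")

# With folding disabled, the norm layers keep their own weight parameter.
unfolded = HookedTransformer.from_pretrained("meta-llama/Llama-2-7b-hf", fold_ln=False)

print(type(folded.blocks[0].ln1).__name__)    # expected: a Pre-style norm
print(type(unfolded.blocks[0].ln1).__name__)  # expected: a norm with a learned weight
```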

4gatepylon avatar Sep 03 '24 19:09 4gatepylon

@wangyifei0047 I believe you are mistaken. In the current version (2.15.4), the normalization layer used at the start of each layer for Llama is RMS, not RMSPre as you suggested. This is specified here: https://github.com/TransformerLensOrg/TransformerLens/blob/b5a16f849649a237cc02cc2c272ae4dc2085abe4/transformer_lens/loading_from_pretrained.py#L784

Thus, this is a non-issue.
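A quick way to check what a given release configures for Llama (a sketch; function and attribute names are taken from `loading_from_pretrained.py` and `HookedTransformerConfig`, and the model name is just an example):

```python
from transformer_lens.loading_from_pretrained import get_pretrained_model_config

cfg = get_pretrained_model_config("meta-llama/Llama-2-7b-hf")
print(cfg.normalization_type)  # expected to print "RMS" for Llama models

# Note: loading with fold_ln=True (the default in from_pretrained) is what swaps
# this to the weight-less "RMSPre" variant at load time.
```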

nhatkhtn avatar Jun 11 '25 23:06 nhatkhtn