Yeu-Tong Lau

Results: 1 comment by Yeu-Tong Lau

I tried `from_pretrained_no_processing` and got the same results. The discrepancy is not just the unembedding centering: the differences already show up in the model's activations and grow larger with each layer.

```python
def forward_with_cache(model, layer, inputs):...
```
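To make "the differences get larger in each layer" concrete, here is a minimal sketch of a per-layer comparison. It is not the truncated `forward_with_cache` helper above. The names `max_abs_diff_per_layer` and `is_growing` are hypothetical, the numbers are toy values rather than results from this issue, and each layer's activations are assumed to be flattened into a plain list of floats already (for example with `tensor.flatten().tolist()`).

```python
from typing import Dict, Sequence


def max_abs_diff_per_layer(
    acts_a: Dict[int, Sequence[float]],
    acts_b: Dict[int, Sequence[float]],
) -> Dict[int, float]:
    """Return {layer: max |a - b|} for every layer present in both caches."""
    report = {}
    for layer in sorted(acts_a.keys() & acts_b.keys()):
        a, b = acts_a[layer], acts_b[layer]
        if len(a) != len(b):
            raise ValueError(f"layer {layer}: size mismatch {len(a)} vs {len(b)}")
        # default=0.0 covers the edge case of an empty activation list
        report[layer] = max((abs(x - y) for x, y in zip(a, b)), default=0.0)
    return report


def is_growing(report: Dict[int, float]) -> bool:
    """True if the difference never shrinks from one layer to the next."""
    values = [report[layer] for layer in sorted(report)]
    return all(later >= earlier for earlier, later in zip(values, values[1:]))


# Toy data (hypothetical): two implementations that drift apart layer by layer.
reference = {0: [1.0, 2.0], 1: [0.5, -1.0], 2: [3.0, 0.0]}
candidate = {0: [1.0, 2.25], 1: [1.0, -1.0], 2: [2.0, 0.0]}

diffs = max_abs_diff_per_layer(reference, candidate)
print(diffs)
print(is_growing(diffs))
```

If the per-layer maximum is already nonzero at the first layer and keeps climbing, the mismatch cannot come from a final-layer transformation such as unembedding centering alone.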