Yeu-Tong Lau
Tried `from_pretrained_no_processing` and got the same results. It is more than the unembedding centering: the differences show up in the model activations themselves and grow larger with each layer.

```python
def forward_with_cache(model, layer, inputs):...
```
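To put a number on "get larger in each layer", something like the following could be used once the activations from both models are cached. This is only a rough sketch, not the code from the comment above: `layerwise_max_abs_diff` is a hypothetical helper, it assumes each layer's activations have already been flattened to a plain list of floats (for PyTorch tensors, e.g. via `tensor.flatten().tolist()`), and the numbers are made up for illustration.

```python
from typing import Dict, Sequence


def layerwise_max_abs_diff(
    acts_a: Dict[int, Sequence[float]],
    acts_b: Dict[int, Sequence[float]],
) -> Dict[int, float]:
    """Return the largest absolute elementwise difference for each shared layer.

    Hypothetical helper: keys are layer indices, values are flattened activations.
    """
    diffs: Dict[int, float] = {}
    # Only compare layers that were cached for both models.
    for layer in sorted(acts_a.keys() & acts_b.keys()):
        a, b = acts_a[layer], acts_b[layer]
        if len(a) != len(b):
            raise ValueError(f"layer {layer}: length mismatch {len(a)} vs {len(b)}")
        diffs[layer] = max((abs(x - y) for x, y in zip(a, b)), default=0.0)
    return diffs


# Made-up values for illustration only, not measurements from any model.
reference = {0: [1.0, 2.0], 1: [0.5, -1.0], 2: [3.0, 0.0]}
candidate = {0: [1.0, 2.0], 1: [0.5, -1.25], 2: [3.5, 0.0]}

for layer, diff in layerwise_max_abs_diff(reference, candidate).items():
    print(f"layer {layer}: max |diff| = {diff:.3g}")
```

If the per-layer maximum keeps growing with depth, that would be consistent with a mismatch that compounds through the residual stream rather than one introduced only at the unembedding.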