[Bug Report] Some model weights are NaN when initializing
Describe the bug
Initializing a HookedTransformer from a config sets some weights to NaN. I see no clear pattern in which weights are affected, but it is usually not whole tensors that are NaN, only parts of them.
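My guess (not verified against the transformer_lens source) is that parameters are allocated with torch.empty and then only partially overwritten by the init scheme; torch.empty returns uninitialized memory, which can contain arbitrary values, including NaNs. A minimal illustration of that failure mode:

```python
import torch

# torch.empty returns uninitialized memory, so its contents are arbitrary and
# can include NaNs. If an init scheme skips (part of) a parameter allocated
# this way, NaNs would surface exactly as described above.
t = torch.empty(12, 64, 768)
print(t.isnan().any())  # may print True or False, depending on what was in memory
```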
Code example
```python
from transformer_lens import HookedTransformer, HookedTransformerConfig

model_trained = HookedTransformer.from_pretrained("gpt2-small")
model_cfg = model_trained.cfg

# randomize the weights of the model
model = HookedTransformer(HookedTransformerConfig.from_dict(model_cfg.to_dict()))

for k, tensor in model.state_dict().items():
    if tensor.isnan().any():
        print(f"{k} has NaNs!")
        print(f"{tensor.isnan().sum().item() = }")
        # indices of the NaNs
        print(tensor.isnan().nonzero(as_tuple=True))
```
results in:

```
Loaded pretrained model gpt2-small into HookedTransformer
blocks.1.attn.W_O has NaNs!
tensor.isnan().sum().item() = 1808
(tensor([10, 10, 10, ..., 11, 11, 11], device='cuda:0'), tensor([42, 42, 42, ..., 63, 63, 63], device='cuda:0'), tensor([505, 507, 528, ..., 765, 766, 767], device='cuda:0'))
```
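A possible workaround, if HookedTransformer.init_weights() re-draws every parameter according to cfg.init_mode, would be to call it explicitly after construction. This is an untested sketch continuing from the snippet above; I have not checked that init_weights() covers blocks.*.attn.W_O:

```python
model = HookedTransformer(HookedTransformerConfig.from_dict(model_cfg.to_dict()))
# Assumption: init_weights() re-initializes all parameters; not verified
# to cover every tensor.
model.init_weights()

nan_keys = [k for k, t in model.state_dict().items() if t.isnan().any()]
print(nan_keys or "no NaNs found")
```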
System Info
python: 3.10.16 (main, Feb 12 2025, 14:50:02) [Clang 19.1.6 ]
torch: 2.6.0+cu124
transformers: 4.50.0
transformer_lens: 2.15.0
- Installed with uv
- Running Ubuntu 24.10
- Tried Python 3.12 and 3.13 as well; same issue
- Tried with device set to "cpu"; same issue
Checklist
- [x] I have checked that there is no similar issue in the repo (required)