[Bug Report] Some model weights are NaN when initializing
Describe the bug
Initializing a HookedTransformer from a config sets some weights to NaN. I see no clear pattern in which weights are affected, but it is usually not whole tensors that are NaN, only parts of them.
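My guess (not verified against the transformer_lens source) is that parameters are allocated with torch.empty and then only partially overwritten by the init scheme; torch.empty returns uninitialized memory, which can contain arbitrary values, including NaNs. A minimal illustration of that failure mode:

```python
import torch

# torch.empty returns uninitialized memory, so its contents are arbitrary and
# can include NaNs. If an init scheme skips (part of) a parameter allocated
# this way, NaNs would surface exactly as described above.
t = torch.empty(12, 64, 768)
print(t.isnan().any())  # may print True or False, depending on what was in memory
```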
Code example
```python
from transformer_lens import HookedTransformer, HookedTransformerConfig

model_trained = HookedTransformer.from_pretrained("gpt2-small")
model_cfg = model_trained.cfg

# randomize the weights of the model
model = HookedTransformer(HookedTransformerConfig.from_dict(model_cfg.to_dict()))

for k, tensor in model.state_dict().items():
    if tensor.isnan().any():
        print(f"{k} has NaNs!")
        print(f"{tensor.isnan().sum().item() = }")
        # indices of the NaNs
        print(tensor.isnan().nonzero(as_tuple=True))
```
results in:

```
Loaded pretrained model gpt2-small into HookedTransformer
blocks.1.attn.W_O has NaNs!
tensor.isnan().sum().item() = 1808
(tensor([10, 10, 10, ..., 11, 11, 11], device='cuda:0'), tensor([42, 42, 42, ..., 63, 63, 63], device='cuda:0'), tensor([505, 507, 528, ..., 765, 766, 767], device='cuda:0'))
```
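A possible workaround, if HookedTransformer.init_weights() re-draws every parameter according to cfg.init_mode, would be to call it explicitly after construction. This is an untested sketch continuing from the snippet above; I have not checked that init_weights() covers blocks.*.attn.W_O:

```python
model = HookedTransformer(HookedTransformerConfig.from_dict(model_cfg.to_dict()))
# Assumption: init_weights() re-initializes all parameters; not verified
# to cover every tensor.
model.init_weights()

nan_keys = [k for k, t in model.state_dict().items() if t.isnan().any()]
print(nan_keys or "no NaNs found")
```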
System Info
python: 3.10.16 (main, Feb 12 2025, 14:50:02) [Clang 19.1.6 ]
torch: 2.6.0+cu124
transformers: 4.50.0
transformer_lens: 2.15.0
- Installed with uv
- Running Ubuntu 24.10
- Tried Python 3.12 and 3.13 as well; same issue
- Tried with device set to "cpu"; same issue
Checklist
- [x] I have checked that there is no similar issue in the repo (required)