Alex Spies
I would have expected the behaviour of `fold_ln=True` to be that the result is a `HookedTransformer` with properly folded layernorms, regardless of whether the `state_dict` being loaded already had these...
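To make the expectation concrete, here is a minimal sketch of what "folding regardless of prior state" could look like on a toy LayerNorm followed by a linear layer. It is plain Python and not TransformerLens code: `fold_layer_norm`, `forward`, and the parameter dict keys are hypothetical names chosen for illustration, and the sketch assumes an already-folded LayerNorm is represented with gain 1 and bias 0.

```python
import math


def layer_norm(x, gain, bias, eps=1e-5):
    # Standard LayerNorm over one vector: normalise, then scale and shift.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    normed = [(v - mean) / math.sqrt(var + eps) for v in x]
    return [g * n + b for g, n, b in zip(gain, normed, bias)]


def linear(x, weight, bias):
    # weight[i][j] maps input dimension i to output dimension j.
    return [
        sum(x[i] * weight[i][j] for i in range(len(x))) + bias[j]
        for j in range(len(bias))
    ]


def forward(params, x):
    h = layer_norm(x, params["ln_gain"], params["ln_bias"])
    return linear(h, params["weight"], params["bias"])


def fold_layer_norm(params):
    # Hypothetical helper: absorb the LayerNorm gain and bias into the
    # following linear layer, leaving an identity gain and zero bias.
    gain, ln_bias = params["ln_gain"], params["ln_bias"]
    weight, bias = params["weight"], params["bias"]
    d_in, d_out = len(weight), len(bias)
    new_weight = [[gain[i] * weight[i][j] for j in range(d_out)] for i in range(d_in)]
    new_bias = [
        bias[j] + sum(ln_bias[i] * weight[i][j] for i in range(d_in))
        for j in range(d_out)
    ]
    return {
        "ln_gain": [1.0] * d_in,
        "ln_bias": [0.0] * d_in,
        "weight": new_weight,
        "bias": new_bias,
    }


params = {
    "ln_gain": [0.5, 2.0, -1.5],
    "ln_bias": [0.1, -0.2, 0.3],
    "weight": [[1.0, 2.0], [0.5, -1.0], [-2.0, 0.25]],
    "bias": [0.05, -0.1],
}
x = [0.3, -1.2, 2.5]

folded_once = fold_layer_norm(params)
folded_twice = fold_layer_norm(folded_once)
```

In this toy setup, folding leaves the forward pass unchanged, and folding an already-folded set of parameters is a no-op, which is the kind of behaviour I would have expected from `fold_ln=True`.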