Easy-Transformer
[Bug Report] Attention masking is not used by model forward methods
IIUC, the attention_mask is overwritten in the code if you don't set the start_at_layer argument:
https://github.com/neelnanda-io/TransformerLens/blob/main/transformer_lens/HookedTransformer.py#L535-L546
This is also mentioned in the docstring: https://github.com/neelnanda-io/TransformerLens/blob/main/transformer_lens/HookedTransformer.py#L511-L513
The forward method infers the padding attention mask from the tokens themselves. In your case there are no pad tokens, so the inferred attention mask has no effect.
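A minimal sketch of the reported behaviour, assuming a TransformerLens version where `forward` accepts an `attention_mask` keyword (as in the linked code): because the mask is re-inferred from pad tokens when `start_at_layer` is not set, a user-supplied mask over a prompt with no padding should be silently ignored.

```python
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
tokens = model.to_tokens("The quick brown fox")  # no pad tokens in this prompt

# Custom mask that tries to hide the last two positions.
custom_mask = torch.ones_like(tokens)
custom_mask[:, -2:] = 0

# Without start_at_layer, forward re-infers the mask from the tokens,
# so (per this report) the custom mask should have no effect and both
# calls should return identical logits.
logits_with_mask = model(tokens, attention_mask=custom_mask)
logits_without_mask = model(tokens)
print(torch.allclose(logits_with_mask, logits_without_mask))  # expected: True
```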