Easy-Transformer
[Bug Report] Attention masking is not used by model forward methods
IIUC, the attention_mask is overwritten in the code if you don't set the start_at_layer argument:
https://github.com/neelnanda-io/TransformerLens/blob/main/transformer_lens/HookedTransformer.py#L535-L546
This is also mentioned in the docstring: https://github.com/neelnanda-io/TransformerLens/blob/main/transformer_lens/HookedTransformer.py#L511-L513
The forward method infers the padding attention mask from the tokens themselves. In your case there are no pad tokens, so the inferred attention mask has no effect.
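A minimal sketch of the reported behaviour, assuming a TransformerLens version where `forward` accepts an `attention_mask` keyword (as in the linked code): because the mask is re-inferred from pad tokens when `start_at_layer` is not set, a user-supplied mask over a prompt with no padding should be silently ignored.

```python
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
tokens = model.to_tokens("The quick brown fox")  # no pad tokens in this prompt

# Custom mask that tries to hide the last two positions.
custom_mask = torch.ones_like(tokens)
custom_mask[:, -2:] = 0

# Without start_at_layer, forward re-infers the mask from the tokens,
# so (per this report) the custom mask should have no effect and both
# calls should return identical logits.
logits_with_mask = model(tokens, attention_mask=custom_mask)
logits_without_mask = model(tokens)
print(torch.allclose(logits_with_mask, logits_without_mask))  # expected: True
```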