
Make `Clean_Transformer_Demo.ipynb` (linked in README) compatible with the latest version of `TransformerLens`

erlebach opened this issue 1 year ago · 11 comments

Hi Neel,

I successfully ran Clean_Transformer_Demo.ipynb on Google Colab. However, when I ran it on my laptop (an Ubuntu 21.x Linux box), I got an error in the GPT2 check after the Attention layer. Here is the error, which seems to imply that the forward method of GPT2's attention module requires two additional positional arguments. I don't understand how the results could differ between the two systems unless GPT2 itself is different, which seems unlikely. Any ideas? Thanks.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_753165/3913931317.py in <module>()
     64 
     65 rand_float_test(Attention, [2, 4, 768])
---> 66 load_gpt2_test(Attention, reference_gpt2.blocks[0].attn, cache["blocks.0.ln1.hook_normalized"])
     67 
     68 # In gpt2 requires: forward() missing 2 required positional arguments: 'key_input' and 'value_input'

/tmp/ipykernel_753165/545637326.py in load_gpt2_test(cls, gpt2_layer, input_name, cache_dict)
     34     output = layer(reference_input)
     35     print("Output shape:", output.shape)
---> 36     reference_output = gpt2_layer(reference_input)
     37     print("Reference output shape:", reference_output.shape)
     38 

~/src/2022/TransformerLens/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1192         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1193                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194             return forward_call(*input, **kwargs)
   1195         # Do not call functions when jit is used
   1196         full_backward_hooks, non_full_backward_hooks = [], []

TypeError: forward() missing 2 required positional arguments: 'key_input' and 'value_input'

erlebach · May 21 '23 10:05

The Google Colab uses this older branch of the TransformerLens code: https://github.com/neelnanda-io/TransformerLens/tree/clean-transformer-demo (from back when the library was called EasyTransformer).

You should either use that branch locally, or not use Clean_Transformer_Demo.ipynb. I think we should rename this issue "Add Clean_Transformer_Demo.ipynb to the current repo". How does that sound?

ArthurConmy · May 21 '23 10:05

Thanks. Perhaps the documentation should mention the branch that corresponds to Clean_Transformer_Demo.ipynb? How could I have figured this out?

erlebach · May 21 '23 12:05

@erlebach the relevant line from the colab is

%pip install git+https://github.com/neelnanda-io/Easy-Transformer.git@clean-transformer-demo

This doesn't install the latest TransformerLens, but a specific branch.
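
So to run the notebook locally, installing that same branch should work; the current library, by contrast, is published on PyPI (package name transformer_lens, which I'm assuming here):

# Pin the branch the notebook was written against:
pip install git+https://github.com/neelnanda-io/Easy-Transformer.git@clean-transformer-demo

# Whereas this installs the latest TransformerLens, which the notebook does not yet support:
pip install transformer_lens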

I've added this issue to the current milestone and changed the title to reflect the change I'd like to see.

ArthurConmy · May 21 '23 14:05

@ArthurConmy : Thank you. My bad. I should have paid more attention.

erlebach · May 21 '23 15:05

I installed the branch in a separate repository and ran the EasyTransformer_Demo in the TransformerLens environment. The code ran perfectly on the Ubuntu system, with no error in the attention model. I do not understand how that is possible, since the GPT2 model was presumably retrieved from within the TransformerLens virtual environment, and the Attention class was identical in the two transformer codes I ran. Conceptually, how can I get an error in one code and not the other if both use the same GPT2 model? Thanks for any advice.

erlebach · May 21 '23 16:05

Arthur added a PR that changes the model's internal implementation so that attention layers have separate inputs for query, key and value. The testing code calls a particular layer of the model and isn't adapted to the new interface; it wouldn't be that hard to fix.

Possibly we should set it so that if only the query input is given, key and value are also set to that?
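
A minimal sketch of that default (argument names taken from the error message; the attention internals here are a stand-in using torch.nn.MultiheadAttention, not TransformerLens's actual implementation):

# Sketch only: if key/value inputs are omitted, fall back to the query input,
# so old single-argument callers keep working.
from typing import Optional
import torch
import torch.nn as nn

class Attention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        # Stand-in for the real attention internals.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(
        self,
        query_input: torch.Tensor,
        key_input: Optional[torch.Tensor] = None,
        value_input: Optional[torch.Tensor] = None,
    ) -> torch.Tensor:
        # If only the query input is given, reuse it for keys and values
        # (ordinary self-attention).
        key_input = query_input if key_input is None else key_input
        value_input = query_input if value_input is None else value_input
        out, _ = self.attn(query_input, key_input, value_input)
        return out

# The old single-argument call and the new three-argument call then agree:
layer = Attention(d_model=768, n_heads=12).eval()
x = torch.randn(2, 4, 768)
assert torch.allclose(layer(x), layer(x, x, x))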

neelnanda-io · May 21 '23 18:05

Thanks. Yes, I saw the differences in the components/ folder. Perhaps include an updated version of the Easy_Transformer_Demo in the repository?

erlebach · May 21 '23 19:05

@erlebach Sorry about this. We have recently reviewed all the demo notebooks in an open PR, and I'm trying to find people who can validate that they are currently working as expected. Please continue to raise any issues you have and we'll make sure you're unblocked.

jbloomAus · May 22 '23 08:05

@erlebach can you please let me know if you are blocked by this? Otherwise I might close the issue in a couple of days.

jbloomAus · May 22 '23 08:05

Note that Callum McDougall has an updated version of the tutorial here: https://transformerlens-intro.streamlit.app/Transformer_from_scratch

neelnanda-io · May 22 '23 09:05

Hi, with your help, I am unblocked. I basically have to update the test for the Attention component, since the forward method now takes three identical arguments (query, key and value inputs for self-attention).

The link on Callum McDougall's page is what I was running with the main branch of TransformerLens, and that will not work, since his version of attention has a forward method with a single argument while TransformerLens's has three. Therefore the test as written, using GPT2, fails. Of course it works if I use the correct branch. I just feel you should include a link to a version of the clean transformer demo that works properly with the current version of the components.py module in TransformerLens.
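
Concretely, the change in my test helper amounts to something like this (a sketch based on the error message; the argument order is assumed):

# The reference GPT2 attention layer now expects query, key and value
# inputs, so pass the same activation three times for self-attention.
reference_output = gpt2_layer(reference_input, reference_input, reference_input)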

You can close the issue. Thanks for the help!

erlebach · May 22 '23 19:05