Easy-Transformer
Make `Clean_Transformer_Demo.ipynb` (linked in README) compatible with the latest version of `TransformerLens`
Hi Neel,
I successfully ran Clean_Transformer_Demo.ipynb on Google Colab. However, when I ran it on my laptop (an Ubuntu 21.x Linux box), there was an error in the GPT2 check after the Attention layer. Here is the error, which seems to imply that the forward method of the GPT2 attention module requires two additional arguments. I don't understand how the results could differ between the two systems, unless GPT2 itself is different, which seems unlikely. Any ideas? Thanks.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/tmp/ipykernel_753165/3913931317.py in <module>()
64
65 rand_float_test(Attention, [2, 4, 768])
---> 66 load_gpt2_test(Attention, reference_gpt2.blocks[0].attn, cache["blocks.0.ln1.hook_normalized"])
67
68 # In gpt2 requires: forward() missing 2 required positional arguments: 'key_input' and 'value_input'
/tmp/ipykernel_753165/545637326.py in load_gpt2_test(cls, gpt2_layer, input_name, cache_dict)
34 output = layer(reference_input)
35 print("Output shape:", output.shape)
---> 36 reference_output = gpt2_layer(reference_input)
37 print("Reference output shape:", reference_output.shape)
38
~/src/2022/TransformerLens/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
1192 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1193 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194 return forward_call(*input, **kwargs)
1195 # Do not call functions when jit is used
1196 full_backward_hooks, non_full_backward_hooks = [], []
TypeError: forward() missing 2 required positional arguments: 'key_input' and 'value_input'
The Google Colab is using this older branch of the TransformerLens code: https://github.com/neelnanda-io/TransformerLens/tree/clean-transformer-demo (from way back when the library was called EasyTransformer).
You should either use that branch locally, or not use Clean_Transformer_Demo.ipynb. I think we should rename this issue "Add Clean_Transformer_Demo.ipynb to the current repo". How does that sound?
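For reference, using that branch locally just means pip-installing it, i.e. the same line the Colab runs (quoted by Arthur below), minus the notebook's % prefix:

pip install git+https://github.com/neelnanda-io/Easy-Transformer.git@clean-transformer-demo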
Thanks. Perhaps the documentation should mention the branch that corresponds to Clean_Transformer_Demo.ipynb? How could I have figured this out?
@erlebach the relevant line from the Colab is
%pip install git+https://github.com/neelnanda-io/Easy-Transformer.git@clean-transformer-demo
This doesn't install the latest TransformerLens, but a specific branch.
I've made this an issue for the current milestone and changed the title to reflect the change I'd like to see.
@ArthurConmy: Thank you. My bad. I should have paid more attention.
I installed the branch in a separate repository and ran the EasyTransformer_Demo in the environment of TransformerLens. The code ran perfectly on the Ubuntu system; there was no error in the attention model. So I do not understand how that is possible, since the GPT2 model retrieved presumably came from within the TransformerLens virtual environment. The Attention class was identical in the two transformer codes I ran. Conceptually, how can I get an error in one code and not the other, assuming both use the same GPT2 model? Thanks for any advice.
Arthur added a PR that changes the internal implementation of the model, so attention layers now have separate inputs for query, key, and value. The testing code uses a particular layer of the model and isn't adapted to the new interface. It wouldn't be that hard to fix in the code.
Possibly we should set it up so that if ONLY the query input is given, key and value are also set to that?
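To illustrate the suggestion, here is a minimal sketch of that default. The class is a stand-in built on torch.nn.MultiheadAttention, not the real TransformerLens Attention; only the optional-argument pattern is the point.

from typing import Optional
import torch
import torch.nn as nn

class SelfAttentionSketch(nn.Module):
    """Illustrative stand-in, not the real TransformerLens Attention."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(
        self,
        query_input: torch.Tensor,
        key_input: Optional[torch.Tensor] = None,
        value_input: Optional[torch.Tensor] = None,
    ) -> torch.Tensor:
        # If only the query input is given, fall back to using it for the
        # keys and values too, so single-argument callers keep working.
        if key_input is None:
            key_input = query_input
        if value_input is None:
            value_input = query_input
        out, _ = self.attn(query_input, key_input, value_input)
        return out

layer = SelfAttentionSketch(d_model=768, n_heads=12)
x = torch.randn(2, 4, 768)
print(layer(x).shape)  # torch.Size([2, 4, 768]) -- the old one-argument call still works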
Thanks. Yes, I saw the differences in the components/ folder. Perhaps include an updated version of the Easy_Transformer_Demo in the repository?
@erlebach Sorry about this. We have recently reviewed all the demo notebooks in an open PR, and I'm trying to find people who can validate that they are currently working as expected. Please continue to raise any issues you have and we'll make sure you're unblocked.
@erlebach can you please let me know if you are blocked by this? Otherwise I might close the issue in a couple of days.
Note that Callum McDougall has an updated version of the tutorial here: https://transformerlens-intro.streamlit.app/Transformer_from_scratch
Hi, with your help, I am unblocked. I basically have to update the testing for the Attention component, since the forward method now takes three identical arguments (for self-attention).
The link on Callum McDougall's page is what I was running with the main branch of TransformerLens, and that will not work, since his version of attention has a forward method with a single argument while the TransformerLens version takes three. Therefore the test as written, using GPT2, fails. Of course, it works if I use the correct branch. I just feel that you should include a link to a version of the clean Transformer that works properly with the current version of the components.py module in TransformerLens.
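Concretely, the fix is a one-line change to the reference call, sketched here under the assumption of the current transformer_lens API (a random tensor stands in for the cached ln1 output, since this is just a shape check):

import torch
from transformer_lens import HookedTransformer

reference_gpt2 = HookedTransformer.from_pretrained("gpt2")
resid = torch.randn(2, 4, reference_gpt2.cfg.d_model)

# Old call from the notebook -- fails on main with
# "forward() missing 2 required positional arguments":
#     reference_output = reference_gpt2.blocks[0].attn(resid)
# New call: pass the same tensor as the query, key, and value inputs.
reference_output = reference_gpt2.blocks[0].attn(resid, resid, resid)
print("Reference output shape:", reference_output.shape)  # [2, 4, 768]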
You can close the issue. Thanks for the help!