[Proposal] Guide to adding new models
Proposal
Add a guide/support document describing how to add support for a new model to the library.
Motivation
Lower the barrier to entry for mechanistic interpretability. Though the current list of supported models is large, many newer models are missing, which makes it hard to experiment with them.
Pitch
I could only find a small piece on the documentation site that briefly describes adding new models, in the Roadmap section of the 2.0 release notes: https://transformerlensorg.github.io/TransformerLens/content/news/release-2.0.html#streamlining-adding-new-models
I am quite new to using Hooks; I may be able to add support for some of the models my lab and I are currently planning to use, but most of it will require some handholding.
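For context on what I mean by hooks, here is a minimal, hypothetical sketch of the hook-point pattern (stand-in names and behavior, not the real TransformerLens `HookPoint` API):

```python
# Hypothetical sketch of the hook-point pattern (stand-in names, not
# the real TransformerLens HookPoint API): a layer wraps its
# activations in a named hook point, and registered callbacks can
# observe or replace the value as it flows through.

class HookPoint:
    """Pass-through wrapper that lets callbacks see/modify a value."""
    def __init__(self, name):
        self.name = name
        self.fns = []

    def add_hook(self, fn):
        self.fns.append(fn)

    def __call__(self, value):
        for fn in self.fns:
            out = fn(value, self.name)
            if out is not None:  # a hook may replace the activation
                value = out
        return value


class TinyLayer:
    """Stand-in for a model layer: doubles its input."""
    def __init__(self):
        self.hook_out = HookPoint("layer0.hook_out")

    def forward(self, x):
        return self.hook_out(x * 2)


cache = {}

def cache_hook(value, name):
    cache[name] = value  # observe only; returning None keeps the value

layer = TinyLayer()
layer.hook_out.add_hook(cache_hook)
result = layer.forward(3)
print(result, cache)  # 6 {'layer0.hook_out': 6}
```

My understanding is that the real library follows roughly this shape (named hook points on every activation, with caching and intervention built on top), which is the part a guide for new models would need to explain.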
Alternatives
I am open to any new suggestions. Would this be something that you would be interested in? @neelnanda-io
Checklist
- [X] I have checked that there is no similar issue in the repo (required)
I think this would be cool to exist! I unfortunately don't have capacity to make this myself. Looking at past PRs to eg add LLaMA or Gemma should give some idea
@deven367 It's in the pipeline at the moment. I am in the middle of working through a few key pieces of model compatibility, and I have been hesitant to put together a guide today when a couple of key steps are going to be added or changed relatively quickly. If you want to meet to discuss how to do it in the meantime, I am happy to do so. LLaMA 3.1 would be on the simpler side to add right now, and I can go through the more complicated process after that.
@bryce13950 That would be nice; I am open to discussing this.
Someone opened a PR last night for LLaMA 3.1, but I am sure we can find one for you to add. Is there a specific model you are interested in? Are you on the Slack channel?
Hey @bryce13950, our lab is specifically interested in models with learnable positional encodings. So, if possible, I would first like to start with BART and then look at Mamba (as it's a model that doesn't use attention).
Also, I am not on the Slack channel
The slack link in the readme is broken.
Here's a new Slack link, sorry! They break after 400 people use them... https://join.slack.com/t/opensourcemechanistic/shared_invite/zt-2n26nfoh1-TzMHrzyW6HiOsmCESxXtyw
I made a PR to update it, but will leave it up to @bryce13950 whether he's OK merging it directly into main https://github.com/TransformerLensOrg/TransformerLens/pull/742
Re models to use, BART should be doable, though it may be a pain, as it's an encoder-decoder model and most models used here are decoder-only. But the HookedEncoderDecoder.py file should be a good place to start, since we support T5, and you can hopefully adapt that?
Mamba will be a whole different beast, as it's recurrent, so eg if you give it a 1000 token sequence and add a hook on a recurrent layer, I think it'll be run 1000 times in a single forward pass? Fortunately, Danielle Ensign already implemented a Mamba port of TransformerLens: https://github.com/Phylliida/MambaLens
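To illustrate what I mean (a toy sketch, not Mamba's actual architecture): because the same recurrent cell is re-applied at every step, a hook attached to it fires once per input position.

```python
# Toy illustration (not Mamba's actual architecture) of why a hook on
# a recurrent layer fires once per input position: the same cell, and
# hence the same hook point, is re-applied at every step.

call_log = []

def hook(state, step):
    call_log.append(step)  # record every invocation of the hook
    return state           # pass the hidden state through unchanged

def recurrent_forward(tokens):
    state = 0
    for step, tok in enumerate(tokens):
        state = state + tok        # stand-in recurrence
        state = hook(state, step)  # the hook runs at every step
    return state

out = recurrent_forward(range(1000))
print(len(call_log))  # 1000: the hook ran once per token
```

So caching activations from a recurrent model gives you one entry per (layer, position) pair rather than one tensor per layer, which is part of why a Mamba port needs different plumbing.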
Thanks a lot @neelnanda-io, I've joined the Slack channel!
Re: This is so cool, I wasn't aware that something like MambaLens existed; I will look into it. Regarding BART, I think I can try to create a PR for that. Alongside that, I will see if I can generalize the instructions into something that could become a guide. How does this sound @bryce13950?
@deven367 That sounds perfect to me! Ping me on Slack, and we can discuss further if you like. HookedEncoderDecoder is a relatively new addition to TransformerLens, and the only supported models are T5 variants. That means there may be a bit more that needs to be done to add a second architecture type, but I don't think it will be too difficult.