[Proposal] Guide to adding new models
Proposal
Add a guide/support document describing how to add support for a new model to the library.
Motivation
Lower the barrier to entry for mechanistic interpretability. Though the current list of supported models is large, many newer models are missing, which makes it hard to experiment with them.
Pitch
I could only find a small piece on the documentation site that briefly describes adding new models, in the Roadmap section of the 2.0 release notes: https://transformerlensorg.github.io/TransformerLens/content/news/release-2.0.html#streamlining-adding-new-models
I am quite new to using Hooks; I may be able to add support for some of the models my lab and I are currently planning to use, but most of it will require some handholding.
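For context on what I mean by hooks, here is a minimal, hypothetical sketch of the hook-point pattern (stand-in names and behavior, not the real TransformerLens `HookPoint` API):

```python
# Hypothetical sketch of the hook-point pattern (stand-in names, not
# the real TransformerLens HookPoint API): a layer wraps its
# activations in a named hook point, and registered callbacks can
# observe or replace the value as it flows through.

class HookPoint:
    """Pass-through wrapper that lets callbacks see/modify a value."""
    def __init__(self, name):
        self.name = name
        self.fns = []

    def add_hook(self, fn):
        self.fns.append(fn)

    def __call__(self, value):
        for fn in self.fns:
            out = fn(value, self.name)
            if out is not None:  # a hook may replace the activation
                value = out
        return value


class TinyLayer:
    """Stand-in for a model layer: doubles its input."""
    def __init__(self):
        self.hook_out = HookPoint("layer0.hook_out")

    def forward(self, x):
        return self.hook_out(x * 2)


cache = {}

def cache_hook(value, name):
    cache[name] = value  # observe only; returning None keeps the value

layer = TinyLayer()
layer.hook_out.add_hook(cache_hook)
result = layer.forward(3)
print(result, cache)  # 6 {'layer0.hook_out': 6}
```

My understanding is that the real library follows roughly this shape (named hook points on every activation, with caching and intervention built on top), which is the part a guide for new models would need to explain.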
Alternatives
I am open to any new suggestions. Would this be something that you would be interested in? @neelnanda-io
Checklist
- [X] I have checked that there is no similar issue in the repo (required)
I think this would be cool to exist! I unfortunately don't have capacity to make this myself. Looking at past PRs to eg add LLaMA or Gemma should give some idea
@deven367 It's in the pipeline at the moment. I am in the middle of working through a few key pieces of model compatibility, and I have been hesitant to put together a guide today when a couple of key steps are going to be added or changed relatively quickly. If you want to meet to discuss how to do it in the meantime, I am happy to do so. LLaMA 3.1 would be on the simpler side to add right now, and I can go through the more complicated process after that.
@bryce13950 That would be nice; I am open to discussing this.
Someone opened a PR last night for LLaMA 3.1, but I am sure we can find one for you to add. Is there a specific model you are interested in? Are you on the Slack channel?
Hey @bryce13950, our lab is specifically interested in models with learnable positional encodings. So, if possible, I would first like to start with BART and then look at Mamba (as it's a model that doesn't use attention).
Also, I am not on the Slack channel
The slack link in the readme is broken.
Here's a new Slack link, sorry! They break after 400 people use them... https://join.slack.com/t/opensourcemechanistic/shared_invite/zt-2n26nfoh1-TzMHrzyW6HiOsmCESxXtyw
I made a PR to update it, but will leave it up to @bryce13950 whether he's OK merging it directly into main https://github.com/TransformerLensOrg/TransformerLens/pull/742
Re models to use, BART should be doable, though it may be a pain, as it's an encoder-decoder model and most models used here are decoder-only. But the HookedEncoderDecoder.py file should be a good place to start, since we support T5, and you can hopefully adapt that?
Mamba will be a whole different beast, as it's recurrent, so eg if you give it a 1000 token sequence and add a hook on a recurrent layer, I think it'll be run 1000 times in a single forward pass? Fortunately, Danielle Ensign already implemented a Mamba port of TransformerLens: https://github.com/Phylliida/MambaLens
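To illustrate what I mean (a toy sketch, not Mamba's actual architecture): because the same recurrent cell is re-applied at every step, a hook attached to it fires once per input position.

```python
# Toy illustration (not Mamba's actual architecture) of why a hook on
# a recurrent layer fires once per input position: the same cell, and
# hence the same hook point, is re-applied at every step.

call_log = []

def hook(state, step):
    call_log.append(step)  # record every invocation of the hook
    return state           # pass the hidden state through unchanged

def recurrent_forward(tokens):
    state = 0
    for step, tok in enumerate(tokens):
        state = state + tok        # stand-in recurrence
        state = hook(state, step)  # the hook runs at every step
    return state

out = recurrent_forward(range(1000))
print(len(call_log))  # 1000: the hook ran once per token
```

So caching activations from a recurrent model gives you one entry per (layer, position) pair rather than one tensor per layer, which is part of why a Mamba port needs different plumbing.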
Thanks a lot @neelnanda-io, I've joined the Slack channel!
Re: This is so cool, I wasn't aware that something like MambaLens existed; I will look into it. Regarding BART, I think I can try to create a PR for that. Alongside that, I will see if I can generalize the instructions into something that could become a guide. How does this sound @bryce13950?
@deven367 That sounds perfect to me! Ping me on Slack, and we can discuss further if you like. HookedEncoderDecoder is a relatively new addition to TransformerLens, and the only supported models are T5 variants. That means there may be a bit more that needs to be done to add a second architecture type, but I don't think it will be too difficult.