
[Proposal] Add support for Mamba

Open · joker3212 opened this issue on Dec 9, 2023 · 5 comments

Proposal

The Mamba paper reports "best-in-class on every single evaluation result, and generally matches baselines at twice the model size." It won't be long before we see more language models in the wild with the Mamba architecture.

[Screenshot: evaluation results figure from the Mamba paper]

Paper: https://arxiv.org/ftp/arxiv/papers/2312/2312.00752.pdf
Code: https://github.com/state-spaces/mamba
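
For context, the core of the architecture is a selective state space layer: the state transition is discretized with an input-dependent step size, and the B/C projections are also input-dependent, so the layer can "select" what to remember. Below is a rough, purely illustrative PyTorch sketch of the sequential form of that recurrence (the real implementation uses a fused, hardware-aware scan; all names and shapes here are my own, not from the repo):

```python
import torch

def selective_ssm_scan(x, A, B, C, delta):
    """Sequential (non-fused) sketch of a selective SSM recurrence.

    x:     (batch, length, d_inner)   input sequence
    A:     (d_inner, d_state)         state transition (log-parameterized in the paper)
    B, C:  (batch, length, d_state)   input-dependent "selection" projections
    delta: (batch, length, d_inner)   input-dependent step sizes
    """
    batch, length, d_inner = x.shape
    d_state = A.shape[-1]
    h = torch.zeros(batch, d_inner, d_state, device=x.device)
    ys = []
    for t in range(length):
        # Discretize with the input-dependent step size delta_t.
        dA = torch.exp(delta[:, t, :, None] * A)          # (batch, d_inner, d_state)
        dB = delta[:, t, :, None] * B[:, t, None, :]       # (batch, d_inner, d_state)
        h = dA * h + dB * x[:, t, :, None]                 # state update h_t
        ys.append((h * C[:, t, None, :]).sum(-1))          # readout y_t = C_t h_t
    return torch.stack(ys, dim=1)                          # (batch, length, d_inner)

# Tiny smoke test with random tensors.
b, L, d, n = 2, 5, 4, 3
y = selective_ssm_scan(
    torch.randn(b, L, d), -torch.rand(d, n),
    torch.randn(b, L, n), torch.randn(b, L, n), torch.rand(b, L, d),
)
print(y.shape)  # torch.Size([2, 5, 4])
```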

joker3212 · Dec 09, 2023

If there is support for the proposal, I would like to work on the implementation.

joker3212 · Dec 09, 2023

I'm excited for people to work on adding new architectures to TransformerLens! :)

However, your figure is not the most important one in that paper. None of the baselines in it use the "Transformer++" recipe (SwiGLU + parallel attention + grouped-query attention + overtraining) that Llama and Mistral use; when compared against Transformer++, Mamba is not a clear winner. But it may still be better!

[Figure: scaling-law comparison including Transformer++ from the Mamba paper]
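
For anyone who hasn't seen the "Transformer++" recipe spelled out: SwiGLU replaces the standard MLP with a gated pair of up-projections, grouped-query attention shares each key/value head across a group of query heads, and parallel attention runs the attention and MLP branches off the same residual read. As one concrete piece, here is a rough sketch of a SwiGLU MLP (standard Llama-style gating; names are mine, not from the paper):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUMLP(nn.Module):
    """Gated MLP used in Llama/Mistral-style models: out = W_down(silu(W_gate x) * W_up x)."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.gate_proj = nn.Linear(d_model, d_ff, bias=False)
        self.up_proj = nn.Linear(d_model, d_ff, bias=False)
        self.down_proj = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))
```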

ArthurConmy · Dec 09, 2023

Ahh, good catch. Thanks for pointing that out. As adoption picks up, I'd be interested to see the evaluation metrics compared against Transformer++-based architectures.

In the meantime I'll get started on adding Mamba and should have a PR out soon.
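
For anyone curious what the wiring might look like, here is a toy sketch that exposes SSM activations through TransformerLens's existing HookPoint / HookedRootModule machinery (all module and hook names below are placeholders, not the final interface):

```python
import torch
import torch.nn as nn
from transformer_lens.hook_points import HookedRootModule, HookPoint

class HookedSSMBlock(nn.Module):
    """Placeholder Mamba-style block exposing its input and output via hook points."""
    def __init__(self, d_model: int):
        super().__init__()
        self.in_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        self.hook_ssm_input = HookPoint()    # cache/patch the block input
        self.hook_ssm_output = HookPoint()   # cache/patch the block output

    def forward(self, x):
        x = self.hook_ssm_input(self.in_proj(x))
        # ... the selective scan would go here ...
        return self.hook_ssm_output(self.out_proj(x))

class HookedMambaSketch(HookedRootModule):
    def __init__(self, d_model: int = 16, n_layers: int = 2):
        super().__init__()
        self.blocks = nn.ModuleList(HookedSSMBlock(d_model) for _ in range(n_layers))
        self.setup()  # registers all HookPoints by module name

    def forward(self, x):
        for block in self.blocks:
            x = x + block(x)
        return x

model = HookedMambaSketch()
# Plain dict of activations keyed by hook name.
out, cache = model.run_with_cache(torch.randn(1, 8, 16), return_cache_object=False)
print(cache["blocks.0.hook_ssm_output"].shape)  # torch.Size([1, 8, 16])
```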

joker3212 · Dec 09, 2023

I could also help, would love to do some cool mech interp things on state space models!

SeuperHakkerJa · Dec 14, 2023

> I could also help, would love to do some cool mech interp things on state space models!

That would be awesome! I started some work here. Feel free to take a look and let me know what you think.

joker3212 · Dec 14, 2023