[Proposal] Add support for Mamba
Proposal
Mamba is reported to be "best-in-class on every single evaluation result, and generally matches baselines at twice the model size." It won't be long before we see more language models in the wild built on the Mamba architecture.
Paper: https://arxiv.org/ftp/arxiv/papers/2312/2312.00752.pdf
Code: https://github.com/state-spaces/mamba
If there is support for the proposal, I would like to work on the implementation.
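For reference, the core computation an implementation would need to cover is the selective SSM scan described in the paper. Below is a minimal sequential PyTorch sketch of that recurrence, just to make the discussion concrete: the variable names are mine, and the official repo replaces this loop with a fused parallel-scan CUDA kernel rather than anything this slow.

```python
import torch

def selective_scan_reference(x, delta, A, B, C, D):
    """Sequential reference of the selective SSM recurrence from the Mamba paper.

    Shapes (names are illustrative, not the official repo's):
      x:     (batch, length, d_inner)   input sequence
      delta: (batch, length, d_inner)   input-dependent step sizes
      A:     (d_inner, d_state)         state matrix
      B:     (batch, length, d_state)   input-dependent input projection
      C:     (batch, length, d_state)   input-dependent output projection
      D:     (d_inner,)                 skip connection
    """
    batch, length, d_inner = x.shape

    # Zero-order-hold discretisation: A_bar = exp(delta * A), B_bar * x ~= delta * B * x
    deltaA = torch.exp(delta.unsqueeze(-1) * A)                       # (b, l, d_inner, d_state)
    deltaBx = delta.unsqueeze(-1) * B.unsqueeze(2) * x.unsqueeze(-1)  # (b, l, d_inner, d_state)

    h = torch.zeros(batch, d_inner, A.shape[1], device=x.device, dtype=x.dtype)
    ys = []
    for t in range(length):
        h = deltaA[:, t] * h + deltaBx[:, t]           # h_t = A_bar_t h_{t-1} + B_bar_t x_t
        ys.append((h * C[:, t].unsqueeze(1)).sum(-1))  # y_t = C_t h_t
    y = torch.stack(ys, dim=1)                         # (b, l, d_inner)
    return y + x * D                                   # residual skip through D
```

The explicit loop is O(length) and slow, but it keeps the per-token state update easy to inspect, which is probably what we want for interpretability anyway.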
I'm excited for people to work on adding new architectures to TransformerLens! :)
However, your figure is not the most important figure in that paper. None of the models there use the "Transformer++" recipe (SwiGLU + parallel attention + grouped-query attention + overtraining) that Llama and Mistral use; when compared against Transformer++, Mamba is not a clear winner. But it may be better!
Ahh, good catch. Thanks for pointing that out. As adoption picks up, I'd be interested to see the evaluation metrics compared against Transformer++-based architectures.
In the meantime I'll get started on adding Mamba and should have a PR out soon.
I could also help, would love to do some cool mech interp things on state space models!
That would be awesome! I started some work here. Feel free to take a look and let me know what you think.
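For anyone who wants to poke at this in the meantime, here is a rough sketch of how hook points could be attached to a Mamba-style block in the TransformerLens style. To be clear, this is not what the work-in-progress actually does: the block internals are toy stand-ins and the hook names are made up; it only assumes `HookPoint` / `HookedRootModule` from `transformer_lens.hook_points`, the same machinery `HookedTransformer` uses.

```python
import torch
import torch.nn as nn
from transformer_lens.hook_points import HookedRootModule, HookPoint

class MambaBlockWithHooks(HookedRootModule):
    """Toy block showing where hook points could sit; not the real Mamba mixer."""

    def __init__(self, d_model: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)         # stand-in; Mamba uses RMSNorm
        self.mixer = nn.Linear(d_model, d_model)  # stand-in for the selective SSM mixer
        self.hook_normalized = HookPoint()        # activations after the norm
        self.hook_mixer_out = HookPoint()         # output of the mixer
        self.hook_resid_post = HookPoint()        # residual stream after the block
        self.setup()                              # registers the hook-point names

    def forward(self, resid: torch.Tensor) -> torch.Tensor:
        normed = self.hook_normalized(self.norm(resid))
        mixed = self.hook_mixer_out(self.mixer(normed))
        return self.hook_resid_post(resid + mixed)

# Caching an activation works the same way as with HookedTransformer:
block = MambaBlockWithHooks(d_model=16)
cache = {}

def store_hook(act, hook):
    cache[hook.name] = act.detach()

block.run_with_hooks(
    torch.randn(2, 5, 16),
    fwd_hooks=[("hook_mixer_out", store_hook)],
)
print(cache["hook_mixer_out"].shape)  # torch.Size([2, 5, 16])
```

The interesting interpretability-specific question is probably which internal quantities (e.g. the per-token SSM state) deserve their own hook points, since there is no attention pattern to cache.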