litgpt
Add support for MPT
It would be interesting to add support for MPT models. They are perhaps the only widely used model family with ALiBi positional encoding, and the new MPT-30B model supports an 8k context length.
Thanks!
+1
I looked into implementing this (branch). The missing pieces are:
- ALiBi
- Low precision LayerNorm
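For the ALiBi piece, here is a minimal, framework-free sketch of the bias that gets added to the attention scores before softmax, using the paper's closed-form slopes for power-of-two head counts (function names are illustrative, not from the litgpt branch):

```python
import math

def alibi_slopes(n_heads: int) -> list[float]:
    # Head i gets slope 2 ** (-8 * (i + 1) / n_heads); this is the closed
    # form from the ALiBi paper for power-of-two head counts.
    return [2 ** (-8 * (i + 1) / n_heads) for i in range(n_heads)]

def alibi_bias(n_heads: int, seq_len: int) -> list[list[list[float]]]:
    # bias[h][q][k] = -slope_h * (q - k): a linear penalty on distance,
    # added to the raw attention scores instead of positional embeddings.
    # Entries with k > q are irrelevant under the causal mask.
    slopes = alibi_slopes(n_heads)
    return [
        [[-s * (q - k) for k in range(seq_len)] for q in range(seq_len)]
        for s in slopes
    ]
```

In practice this bias is precomputed once per sequence length and broadcast over the batch, since it depends only on head index and query/key distance.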
And to reproduce training, they also use:
- Tied embedding weights with lm_head
- Kaiming normal initialization
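The last two points can be sketched in a few lines of PyTorch; attribute names like `wte` and `lm_head` are illustrative here, not litgpt's actual module layout:

```python
import torch.nn as nn

class TinyLM(nn.Module):
    # Minimal sketch of weight tying plus Kaiming normal initialization;
    # not the real litgpt model definition.
    def __init__(self, vocab_size: int = 100, n_embd: int = 32) -> None:
        super().__init__()
        self.wte = nn.Embedding(vocab_size, n_embd)
        self.lm_head = nn.Linear(n_embd, vocab_size, bias=False)
        # Tie the output projection to the input embedding matrix:
        # both point at the same Parameter, so gradients accumulate jointly.
        self.lm_head.weight = self.wte.weight
        # Kaiming normal init for all matrix-shaped parameters.
        for p in self.parameters():
            if p.dim() >= 2:
                nn.init.kaiming_normal_(p)
```

With tying in place the checkpoint stores one matrix for both roles, which also matters when converting MPT weights: the `lm_head` weights must be loaded from (or saved as) the embedding table.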