
Support a new model

Open takgto opened this issue 1 year ago • 7 comments

Do you have a plan to support the JetMoE model (https://github.com/myshell-ai/JetMoE) in litgpt? It is very effective at reducing the computational cost of inference.

takgto avatar Jun 10 '24 06:06 takgto

Hi there, thanks for the suggestion! New models are always welcome. JetMoE isn't currently on the priority list because of the many other requested models and features, but if you'd like to contribute it, that would be great!

rasbt avatar Jun 10 '24 15:06 rasbt

I added a doc describing how to add a new model to LitGPT in case this comes in handy: https://github.com/Lightning-AI/litgpt/blob/main/tutorials/developer-docs/adding-models.md

rasbt avatar Jun 13 '24 01:06 rasbt

> I added a doc describing how to add a new model to LitGPT in case this comes in handy: https://github.com/Lightning-AI/litgpt/blob/main/tutorials/developer-docs/adding-models.md

Thanks so much for your information; it is really valuable for me. Currently, I'm having difficulty updating the checkpoint conversion script (convert_hf_checkpoint.py) for the new model (jetmoe/jetmoe-8b). I think it needs another weight_map in the script, but I can't figure out the litgpt-side keys for several of the new model's weights (marked with ? below):

```python
weight_map = {
    "model.embed_tokens.weight": "transformer.wte.weight",
    "model.layers.{}.mlp.output_linear.weight": ?,  # ? marks an unknown key
    "model.layers.{}.mlp.router.layer.weight": ?,
    "model.layers.{}.input_layernorm.weight": "transformer.h.{}.norm_1.weight",
    "model.layers.{}.mlp.bias": ?,
    "model.layers.{}.mlp.input_linear.weight": ?,
    "model.layers.{}.post_attention_layernorm.weight": "transformer.h.{}.norm_2.weight",
    "model.layers.{}.self_attention.experts.bias": ?,
    "model.layers.{}.self_attention.experts.input_linear.weight": ?,
    "model.layers.{}.self_attention.experts.output_linear.weight": ?,
    "model.layers.{}.self_attention.experts.router.layer.weight": "transformer.h.{}.attn.experts.out_proj.weight",
    "model.layers.{}.self_attention.kv_proj.weight": ?,
    "model.norm.weight": "transformer.ln_f.weight",
    "model.layers.{}.self_attention.q_proj.weight": "transformer.h.{}.attn.q_proj.weight",
    "model.layers.{}.self_attention.k_proj.weight": "transformer.h.{}.attn.k_proj.weight",
    "model.layers.{}.self_attention.v_proj.weight": "transformer.h.{}.attn.v_proj.weight",
}
```

Do you know of any tools or documentation to find out those unknown keys?

takgto avatar Jun 13 '24 01:06 takgto

That's a good question, and it's usually the tricky part. It can be pretty hard to find the corresponding layer, sometimes because of naming conventions and sometimes because the layer may not be supported yet. I think in this case the LlamaMoE might be a good template to look at:

https://github.com/Lightning-AI/litgpt/blob/e2f8074b32ce08852f933636d1d81689990e1771/litgpt/scripts/convert_hf_checkpoint.py#L138
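In case it helps, here is a generic, hypothetical sketch of how a weight_map with "{}" layer placeholders gets applied; the real logic in convert_hf_checkpoint.py differs in detail, but the idea is the same: any HF key whose template has no entry is exactly a tensor that still needs a litgpt counterpart (or a litgpt module that doesn't exist yet).

```python
# Hypothetical helper, not litgpt's actual code: map one HF parameter name to a
# litgpt parameter name via a weight_map with "{}" layer-index placeholders.
import re


def map_key(hf_key: str, weight_map: dict[str, str]) -> str | None:
    # Replace the first numeric layer index with "{}" to look up the template ...
    template = re.sub(r"\.\d+\.", ".{}.", hf_key, count=1)
    target_template = weight_map.get(template)
    if target_template is None:
        return None  # unmapped key: needs a new weight_map entry or new module support
    # ... then substitute the concrete layer index back into the litgpt-side template.
    layer = re.search(r"\.(\d+)\.", hf_key)
    return target_template.format(layer.group(1)) if layer else target_template
```

Printing every HF key for which `map_key` returns `None` gives the list of tensors that still have to be mapped.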

rasbt avatar Jun 13 '24 19:06 rasbt

I haven't read the JetMoE paper; do they also have different attention experts? In that case, this would not be supported yet. The LlamaMoE implementation only covers the MLP layers, as in Mixtral.

rasbt avatar Jun 13 '24 19:06 rasbt

Thank you for your continued support. According to the JetMoE technical website (https://research.myshell.ai/jetmoe), JetMoE has two MoE layers: Mixture of Attention heads (MoA) and Mixture of MLP experts (MoE), similar to ModuleFormer (https://arxiv.org/abs/2306.04640). So the LlamaMoE template might not fit JetMoE. Separately, I have asked the JetMoE maintainers to provide parameter mapping information (https://github.com/myshell-ai/JetMoE/issues/11), but unfortunately I haven't received a reply yet.
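For intuition only (a toy sketch, not JetMoE's actual implementation), both the MoA and the MLP MoE blocks route each token through a small linear gate that picks the top-k experts, which is presumably why the checkpoint contains separate `router.layer.weight` tensors for the attention and MLP blocks:

```python
# Toy top-k router, the pattern shared by MoA (attention experts) and MoE (MLP experts).
import torch
import torch.nn as nn


class TopKRouter(nn.Module):
    def __init__(self, dim: int, n_experts: int, top_k: int = 2):
        super().__init__()
        self.layer = nn.Linear(dim, n_experts, bias=False)  # maps to "router.layer.weight"
        self.top_k = top_k

    def forward(self, x: torch.Tensor):
        logits = self.layer(x)                                 # (batch, seq, n_experts)
        weights, idx = torch.topk(logits, self.top_k, dim=-1)  # per-token expert choice
        return torch.softmax(weights, dim=-1), idx             # mixing weights and indices
```

The difference from LlamaMoE/Mixtral is that here the routed experts also include whole attention projections, not just MLPs.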

takgto avatar Jun 14 '24 00:06 takgto

Oh I see, the Mixture of Attention heads (MoA) part will be tricky then; that's currently not supported by LitGPT and would have to be implemented. That might make this a challenging contribution.

rasbt avatar Jun 14 '24 17:06 rasbt