Question about hf to model.lit conversion
Hello everyone! The script convert_hf_checkpoint.py converts a model's weights from the Hugging Face format to the lit format. I don't really understand how this conversion works or how it makes using the model more efficient.
Do you know of any resources/documentation I could read to understand it better?
How hard would it be to adapt this script to convert another model from huggingface to the lit format ?
Thank you !
Hi @codeur-rapide. This conversion step doesn't make the model more or less efficient; it's just a mapping from the original HF model's state_dict() keys to our model's state_dict() keys. For example, this is the mapping for the gpt-neox based models: https://github.com/Lightning-AI/lit-gpt/blob/main/scripts/convert_hf_checkpoint.py#L25-L45
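To illustrate the idea: a conversion like this only renames keys, leaving the tensors themselves untouched. Here's a minimal sketch; the key names below are illustrative examples in the gpt-neox style, not the actual lit-gpt mapping (see the linked lines for that):

```python
# Illustrative mapping from HF checkpoint keys to target-model keys.
# Real mappings often use templated keys like "layers.{}.attn.weight".
weight_map = {
    "gpt_neox.embed_in.weight": "transformer.wte.weight",
    "embed_out.weight": "lm_head.weight",
}

def convert_state_dict(hf_state_dict):
    # Copy each tensor under its new name; the weight values are unchanged,
    # so the converted model computes exactly the same function.
    return {weight_map[k]: v for k, v in hf_state_dict.items() if k in weight_map}
```

In the real script the values are torch tensors loaded from the HF checkpoint shards, but the logic is the same dictionary rename.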
Adding a mapping for another model can be very easy or very difficult depending on the model you want to port and how architecturally different it is from what we already support. Which model would you like to support?
Thank you for your answer ! For example, I'd like to support the Nous-Hermes-13B model : https://huggingface.co/NousResearch/Nous-Hermes-13b
Then you are in luck because since it's LLaMA based, everything should be supported already. You just need to add a config just like https://github.com/Lightning-AI/lit-gpt/blob/72958cbd4b72f79e7403d63c363e3b3da3c72b29/lit_gpt/config.py#L282-L298 but for that model specifically
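For context, a config entry is just a named set of hyperparameters. A hypothetical sketch of what such an entry could look like follows; the field names and values here are illustrative assumptions, not copied from lit_gpt/config.py, so check the linked lines for the actual schema. Since Nous-Hermes-13b is LLaMA-13B based, its entry would mirror the existing 13B LLaMA hyperparameters:

```python
# Hypothetical config entry for a LLaMA-13B derivative.
# All values are assumptions based on the LLaMA-13B architecture;
# verify them against the model's config.json on the Hub.
nous_hermes_13b = dict(
    org="NousResearch",
    name="Nous-Hermes-13b",
    block_size=2048,   # assumed context length
    vocab_size=32000,  # assumed LLaMA tokenizer vocab; fine-tunes may add tokens
    n_layer=40,        # LLaMA-13B depth
    n_head=40,
    n_embd=5120,
)
```

Because the architecture is unchanged, no new weight mapping is needed: only this config entry telling the loader which shapes to expect.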
Hi, I'm working with FastChat-3B. Any chance it's supported with the existing mappings?
@ht0rohit FastChat uses the t5 model architecture which we don't plan to support
@codeur-rapide Were you successful? If so, would you like to contribute adding support for it with a PR?
Yes, it worked! I'll create a pull request with the changes I made to support this model.