Question about hf to model.lit conversion
Hello everyone! The script convert_hf_checkpoint.py converts a model's weights from the Hugging Face format to the lit format. I don't really understand how this conversion works or how it makes using the model more efficient.
Do you know of any resources/documentation I could read to understand it better?
How hard would it be to adapt this script to convert another model from huggingface to the lit format ?
Thank you !
Hi @codeur-rapide. This conversion step doesn't make the model more or less efficient; it's just a mapping from the original HF model's state_dict() keys to our model's state_dict() keys. For example, this is the mapping for the gpt-neox based models: https://github.com/Lightning-AI/lit-gpt/blob/main/scripts/convert_hf_checkpoint.py#L25-L45
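To illustrate the idea: a conversion like this only renames keys, leaving the tensors themselves untouched. Here's a minimal sketch; the key names below are illustrative examples in the gpt-neox style, not the actual lit-gpt mapping (see the linked lines for that):

```python
# Illustrative mapping from HF checkpoint keys to target-model keys.
# Real mappings often use templated keys like "layers.{}.attn.weight".
weight_map = {
    "gpt_neox.embed_in.weight": "transformer.wte.weight",
    "embed_out.weight": "lm_head.weight",
}

def convert_state_dict(hf_state_dict):
    # Copy each tensor under its new name; the weight values are unchanged,
    # so the converted model computes exactly the same function.
    return {weight_map[k]: v for k, v in hf_state_dict.items() if k in weight_map}
```

In the real script the values are torch tensors loaded from the HF checkpoint shards, but the logic is the same dictionary rename.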
Adding a mapping for another model can be very easy or very difficult depending on the model you want to port and how architecturally different it is from what we already support. Which model would you like to support?
Thank you for your answer ! For example, I'd like to support the Nous-Hermes-13B model : https://huggingface.co/NousResearch/Nous-Hermes-13b
Then you are in luck because since it's LLaMA based, everything should be supported already. You just need to add a config just like https://github.com/Lightning-AI/lit-gpt/blob/72958cbd4b72f79e7403d63c363e3b3da3c72b29/lit_gpt/config.py#L282-L298 but for that model specifically
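For context, a config entry is just a named set of hyperparameters. A hypothetical sketch of what such an entry could look like follows; the field names and values here are illustrative assumptions, not copied from lit_gpt/config.py, so check the linked lines for the actual schema. Since Nous-Hermes-13b is LLaMA-13B based, its entry would mirror the existing 13B LLaMA hyperparameters:

```python
# Hypothetical config entry for a LLaMA-13B derivative.
# All values are assumptions based on the LLaMA-13B architecture;
# verify them against the model's config.json on the Hub.
nous_hermes_13b = dict(
    org="NousResearch",
    name="Nous-Hermes-13b",
    block_size=2048,   # assumed context length
    vocab_size=32000,  # assumed LLaMA tokenizer vocab; fine-tunes may add tokens
    n_layer=40,        # LLaMA-13B depth
    n_head=40,
    n_embd=5120,
)
```

Because the architecture is unchanged, no new weight mapping is needed: only this config entry telling the loader which shapes to expect.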
Hi, I'm working with FastChat-3B. Any chance it's supported with the existing mappings?
@ht0rohit FastChat uses the t5 model architecture which we don't plan to support
@codeur-rapide Were you successful? If so, would you like to contribute adding support for it with a PR?
Yes, it worked! I'll create a pull request with the changes I made to support this model.