Setup for a fine-tuned Mistral model?
Question
I have a fine-tuned Mistral 7B model. Can someone help me with the setup for it? I want to use `HookedTransformer`, but if I call `from_pretrained` I get an error because my model's name is not an official one. Since it is only fine-tuned, the architecture stays the same, so the hooks should be created without me adding them manually (that is what I am assuming).
Please, can someone help me with this?
Hi, I think you should use `.from_pretrained` with the official Mistral name and pass `hf_model=hf_model_of_your_finetuned_model`.
I think it won't work if the model is not on the Hugging Face Hub, since it takes the cfg from Hugging Face.
Yeah, but your cfg is the same as the official Mistral one, right? I think this should work. Do you mind trying it and sharing the error if there is one?
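If you want to double-check that before trying, you can diff the two HF configs. A minimal sketch (assuming `model_path` points at your local fine-tuned checkpoint):

```python
from transformers import AutoConfig

# Compare the fine-tuned checkpoint's config with the official Mistral 7B config.
finetuned_cfg = AutoConfig.from_pretrained(model_path).to_dict()
official_cfg = AutoConfig.from_pretrained("mistralai/Mistral-7B-v0.1").to_dict()

# Print only the keys where the two configs disagree.
diff = {
    k: (finetuned_cfg.get(k), official_cfg.get(k))
    for k in set(finetuned_cfg) | set(official_cfg)
    if finetuned_cfg.get(k) != official_cfg.get(k)
}
print(diff)
```

And then the actual load would be: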
```python
from transformers import AutoModelForCausalLM
from transformer_lens import HookedTransformer

# Load your fine-tuned Mistral in HF (or however you usually load it).
hf_model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="cpu",
)

# Wrap it in a HookedTransformer, reusing the official Mistral config.
nn_model = HookedTransformer.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    device="cpu",
    hf_model=hf_model,
)
```
This is the error I am getting, but I feel this can work. Is there any solution for this?
```
RuntimeError: Error(s) in loading state_dict for HookedTransformer:
    size mismatch for embed.W_E: copying a param with shape torch.Size([32002, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).
    size mismatch for unembed.W_U: copying a param with shape torch.Size([4096, 32002]) from checkpoint, the shape in current model is torch.Size([4096, 32000]).
```
The problem is that Mistral's config enforces `cfg.d_vocab = 32000`:
https://github.com/TransformerLensOrg/TransformerLens/blob/5a374ec4b33cec6281b37494175d14f06c75dcfd/transformer_lens/loading_from_pretrained.py#L938
A quick hack to fix your problem is to change this line to 32002.
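If you want to confirm the exact value to hard-code, you can read it straight off your checkpoint. A quick sketch (assuming `model_path` is your local fine-tuned model):

```python
from transformers import AutoModelForCausalLM

hf_model = AutoModelForCausalLM.from_pretrained(model_path, device_map="cpu")

# The number of rows in the input embedding matrix is the d_vocab the checkpoint expects.
print(hf_model.get_input_embeddings().weight.shape[0])
# -> 32002 here, presumably 32000 plus 2 tokens added during fine-tuning
```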
For a long-term solution, we might want to make my last pull request (#597) more general. For example, we could store a list of all the config arguments and check if they are in the kwargs. That would allow Siddhan to just pass d_vocab as an argument. @bryce13950, what do you think?
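Purely as a sketch of that idea (this is not current TransformerLens code; the names are made up):

```python
# Hypothetical helper: keep a list of config keys a caller may override, and
# apply any that show up in the kwargs after the hard-coded per-architecture
# cfg dict has been built.
OVERRIDABLE_CFG_KEYS = ["d_vocab", "d_model", "n_ctx", "n_layers"]

def apply_cfg_overrides(cfg_dict: dict, **kwargs) -> dict:
    for key in OVERRIDABLE_CFG_KEYS:
        if key in kwargs:
            cfg_dict[key] = kwargs[key]
    return cfg_dict

# e.g. HookedTransformer.from_pretrained(..., d_vocab=32002) would end up doing:
print(apply_cfg_overrides({"d_vocab": 32000}, d_vocab=32002))  # {'d_vocab': 32002}
```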
Maybe another solution would be to let people pass `hf_config` as an argument 🤔
But then we'd have to make the `elif architecture == "MistralForCausalLM":` case use `hf_config`, as right now it's hard-coded.
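Something like this, purely hypothetical (`hf_config` is not an argument `from_pretrained` accepts today; this is just the shape of the proposal):

```python
from transformers import AutoConfig, AutoModelForCausalLM
from transformer_lens import HookedTransformer

# Hypothetical usage if an hf_config argument existed:
hf_config = AutoConfig.from_pretrained(model_path)  # carries the fine-tuned vocab_size (32002)
hf_model = AutoModelForCausalLM.from_pretrained(model_path, device_map="cpu")

nn_model = HookedTransformer.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    device="cpu",
    hf_model=hf_model,
    hf_config=hf_config,  # proposed: use this instead of the hard-coded Mistral cfg values
)
```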
@Butanium I think you are touching on what I see as a larger project of reworking how config is passed around and how it is composed. It is on my radar as something that seriously needs to be addressed in the code base while maintaining compatibility. I would be very happy to discuss it, along with some ideas I have on how to revamp it so that it is easier for people to use and easier to maintain. It's a pretty big change, and given its size it's something I would like to address sooner rather than later.
There are two camps on this: one wants to make it really general, and one wants to keep it very specific in order to ensure that we know the models we are supporting are being supported correctly. I think I have a solution that appeases both camps, but I am not eager to move the code base in either direction at the moment until a larger discussion happens on the engineering and on how this should be structured to keep the fine-tuned nature of TransformerLens intact.
> Maybe another solution would be to let people pass `hf_config` as an argument 🤔
> But then we'd have to make the `elif architecture == "MistralForCausalLM":` case use `hf_config`, as right now it's hard-coded.
This actually works pretty well. I changed a few things in the source code and now I am able to use my fine-tuned model, which is stored locally. Not Mistral, as that still has the d_vocab issue, but another small model I fine-tuned just to check whether this works, and the solution helps.