Failed to load the finetuned model with `AutoModelForCausalLM.from_pretrained(name, state_dict=state_dict)`
I fine-tuned llama3-8b with LoRA and followed the tutorial in the repository to convert the final result into model.pth. However, when I try to load the fine-tuned weights into the model using AutoModelForCausalLM.from_pretrained, I am unable to do so correctly. Below is my test:
import torch
from transformers import AutoModelForCausalLM

# Load the converted fine-tuned weights
state_dict = torch.load('out/convert/hf-llama3-instruct-esconv/model.pth')
print("state_dict: ", state_dict)

# Try to build the model directly from the fine-tuned weights
device_map = "auto"  # (device_map was defined elsewhere in my script)
model = AutoModelForCausalLM.from_pretrained('checkpoints/meta-llama/Meta-Llama-3-8B',
                                             device_map=device_map, torch_dtype=torch.float16,
                                             state_dict=state_dict)
print("model.weights", model.state_dict())
But I found that the state_dict returned by torch.load doesn't match model.state_dict(), as shown below:
torch.load output: (screenshot)
model.state_dict() output: (screenshot)
I noticed that even though I passed state_dict, from_pretrained still returns the model with the original weights loaded from the checkpoint path. Did I make any mistakes in my code, and how can I solve this? Thanks!
I can load the weights using model.load_state_dict(), and then everything goes smoothly, but I really want to know why from_pretrained(state_dict=state_dict) doesn't work.
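For reference, the workaround described above looks roughly like this (a sketch using the paths from the snippet at the top of this issue):

import torch
from transformers import AutoModelForCausalLM

# Build the model from the base checkpoint, then overwrite its weights manually
model = AutoModelForCausalLM.from_pretrained('checkpoints/meta-llama/Meta-Llama-3-8B',
                                             torch_dtype=torch.float16)
state_dict = torch.load('out/convert/hf-llama3-instruct-esconv/model.pth')
model.load_state_dict(state_dict)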
Thanks for raising that. Maybe it's an HF thing. I will have to investigate.
I could not reproduce it with another model when I gave it a quick try.
I am not sure if it's related because the differences are so big, but I wonder ~~what the precision of the tensors in your current state dict is. Could you print the precision of the state dict, and~~ could you also try to load it without torch_dtype=torch.float16?
EDIT: Nevermind, I can see that the precision is bfloat16 in your screenshot.
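Concretely, the suggestion is just to drop the dtype override from the original call, i.e. (using the same device_map and state_dict as in your snippet):

model = AutoModelForCausalLM.from_pretrained('checkpoints/meta-llama/Meta-Llama-3-8B',
                                             device_map=device_map,
                                             state_dict=state_dict)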
I tried this also with Llama 3 and it seemed to work fine for me there as well. Here are my steps:
litgpt download --repo_id meta-llama/Meta-Llama-3-8B-Instruct --access_token ...
litgpt finetune \
--checkpoint_dir checkpoints/meta-llama/Meta-Llama-3-8B-Instruct \
--out_dir my_llama_model \
--train.max_steps 1 \
--eval.max_iter 1
litgpt convert from_litgpt \
--checkpoint_dir my_llama_model/final \
--output_dir out/converted_llama_model/
And then in a Python session: (screenshots of the comparison omitted)
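Since the screenshots don't reproduce here, the check was roughly the following (a sketch; the two tensor names are the ones discussed in the reply below, and the dtype cast is only there in case from_pretrained changed the precision):

import torch
from transformers import AutoModelForCausalLM

state_dict = torch.load('out/converted_llama_model/model.pth')
model = AutoModelForCausalLM.from_pretrained('checkpoints/meta-llama/Meta-Llama-3-8B-Instruct',
                                             state_dict=state_dict)

# Check that the passed state dict and the loaded model agree
for key in ['model.layers.0.mlp.down_proj.weight', 'lm_head.weight']:
    loaded = model.state_dict()[key].to(state_dict[key].dtype)
    print(key, torch.equal(state_dict[key], loaded))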
Sorry for my late reply, and thanks for looking into my earlier issue. But maybe there is a specific reason why you couldn't reproduce it. In your test, you compared mlp.down_proj.weight and lm_head.weight between the model fine-tuned from Llama3-instruct and the initial Llama3-instruct model, but I suspect those tensors may be identical anyway: when we merge the fine-tuned LoRA weights into a checkpoint, perhaps only the attention-related tensors change. I mean the fine-tuning procedure may only affect the weights of self_attn.q_proj/k_proj/v_proj/o_proj (I can't be sure, but as I remember it does). In my test, I compared mlp.down_proj.weight and lm_head.weight between the model fine-tuned from Llama3-instruct and Llama3 (not instruct).
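One way to test that hypothesis is to diff the two state dicts and print which tensors the LoRA merge actually changed; a minimal sketch (paths taken from earlier in this issue, with a dtype cast so the comparison is fair):

import torch
from transformers import AutoModelForCausalLM

# Weights of the base (not fine-tuned) model
base = AutoModelForCausalLM.from_pretrained('checkpoints/meta-llama/Meta-Llama-3-8B',
                                            torch_dtype=torch.bfloat16).state_dict()
# Merged fine-tuned weights
tuned = torch.load('out/convert/hf-llama3-instruct-esconv/model.pth')

# List every tensor that the LoRA merge modified
for key, tensor in tuned.items():
    if key in base and not torch.equal(base[key].to(tensor.dtype), tensor):
        print('changed:', key)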
Could you use Llama3-8b (not the instruct version) and check whether this issue reproduces? Thanks!
