[BUG] Model size change
I was wondering why the gpt2-large model I downloaded from Hugging Face was 3.1G, but after running run_finetune_with_lora_save_aggregated_weights.sh it was only 1.5G. This may just be a gap in my knowledge; sorry to disturb you, and I hope you can provide an answer. Here's how I ran it:
```bash
./scripts/run_finetune_with_lora_save_aggregated_weights.sh \
  --model_name_or_path gpt2-large \
  --dataset_path /data1/LMFlow/data/integrated-data1/dataset6 \
  --output_model_path output_models/gpt2-large-inte6
```
Thanks for your interest in LMFlow! I am wondering if @hendrydong could help look into this problem? It can be caused by `transformers` version upgrades. To the best of our knowledge, Hugging Face has been using a different model card format since `transformers >= 4.30.x`, so the merging script may not be functioning well with the latest versions of `transformers`. Could you please provide your `transformers` version so we could help you locate the problem? Thanks 😄
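For reference, a quick way to print the installed `transformers` version to include in the report (nothing LMFlow-specific, just standard package introspection):

```python
# Print the installed transformers version for the bug report.
import transformers

print(transformers.__version__)
```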
Thanks for your reply, the version of my `transformers` library is 4.32.1.
Given your model size, I think the precision may play a role. FP32 vs FP16/BF16?
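A back-of-the-envelope check supports this (assuming the commonly cited ~774M parameters for gpt2-large; actual checkpoint files also include a bit of metadata, so the numbers are approximate):

```python
# Rough weight-file size estimate for gpt2-large at different precisions.
num_params = 774_000_000  # approximate parameter count for gpt2-large

fp32_gb = num_params * 4 / 1e9  # 4 bytes per parameter -> ~3.1 GB
fp16_gb = num_params * 2 / 1e9  # 2 bytes per parameter -> ~1.5 GB

print(f"fp32: ~{fp32_gb:.1f} GB, fp16: ~{fp16_gb:.1f} GB")
```

This matches the 3.1G vs 1.5G you observed, which is consistent with the two checkpoints simply being stored at different precisions rather than weights being lost.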
Hi mkkk1112, I ran into a similar issue. I trained with LoRA and merged the adapter layers with my base model (fp16), and it turned out the merged model was stored in fp32.
It was solved by adding `--torch_dtype float16` when executing examples/merge_lora.py.
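For context, here is a minimal sketch of what such a merge step does, written against the standard `transformers` + `peft` APIs. This is not the actual examples/merge_lora.py; the adapter and output paths below are placeholders:

```python
# Minimal LoRA merge sketch using transformers + peft (not LMFlow's script).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_path = "gpt2-large"                # base model
adapter_path = "output_models/lora_adapter"   # LoRA adapter dir (placeholder)
output_path = "output_models/merged_fp16"     # where to save the merged model (placeholder)

# Load the base model in fp16 so the merged weights are saved at half precision;
# omitting torch_dtype typically loads (and saves) in fp32, doubling the file size.
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_path, torch_dtype=torch.float16
)
tokenizer = AutoTokenizer.from_pretrained(base_model_path)

# Attach the LoRA adapter and fold its weights back into the base model.
model = PeftModel.from_pretrained(base_model, adapter_path)
merged = model.merge_and_unload()

merged.save_pretrained(output_path)
tokenizer.save_pretrained(output_path)
```

Passing `--torch_dtype float16` to the merge script presumably has the same effect as the explicit `torch_dtype=torch.float16` in this sketch.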
thanks a lot!!!