[BUG] Model size change

Open Mikivishy opened this issue 1 year ago • 5 comments

I was wondering why the gpt2-large model I downloaded from Hugging Face is 3.1 GB, but after running run_finetune_with_lora_save_aggregated_weights.sh the output model is only 1.5 GB. This may just be a gap in my knowledge; sorry to disturb you, and I hope you can help me find the answer. Here's how I ran it:

    ./scripts/run_finetune_with_lora_save_aggregated_weights.sh \
      --model_name_or_path gpt2-large \
      --dataset_path /data1/LMFlow/data/integrated-data1/dataset6 \
      --output_model_path output_models/gpt2-large-inte6
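
A quick way to check whether the two checkpoints differ in precision is to inspect the saved tensors directly. The sketch below assumes both checkpoints are stored as PyTorch pytorch_model.bin files, and the paths are illustrative:

    import torch

    # Illustrative paths: the original Hugging Face checkpoint vs. the merged output.
    paths = [
        "gpt2-large/pytorch_model.bin",
        "output_models/gpt2-large-inte6/pytorch_model.bin",
    ]

    for path in paths:
        state_dict = torch.load(path, map_location="cpu")
        tensors = [t for t in state_dict.values() if torch.is_tensor(t)]
        dtypes = {str(t.dtype) for t in tensors}
        size_gb = sum(t.numel() * t.element_size() for t in tensors) / 1e9
        print(f"{path}: dtypes={dtypes}, size={size_gb:.2f} GB")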

Mikivishy avatar Oct 20 '23 02:10 Mikivishy

Thanks for your interest in LMFlow! I am wondering if @hendrydong could help look into this problem? It may be caused by a transformers version upgrade. To the best of our knowledge, Hugging Face uses a different model card format for transformers >= 4.30.x, so the merging script may not work correctly with the latest versions of transformers.

Could you please provide your transformers versions so we could help you locate the problem? Thanks 😄
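
(For reference, a minimal way to print the installed version:)

    import transformers

    # Print the installed transformers version so it can be reported in the issue.
    print(transformers.__version__)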

research4pan avatar Oct 29 '23 08:10 research4pan

Thanks for your reply. The version of my transformers library is 4.32.1.

Mikivishy avatar Oct 29 '23 12:10 Mikivishy

Given your model size, I think the precision may play a role. FP32 vs FP16/BF16?
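
As a rough sanity check: gpt2-large has about 774M parameters, so 774M × 4 bytes ≈ 3.1 GB in FP32, while 774M × 2 bytes ≈ 1.5 GB in FP16/BF16, which matches the two sizes reported above.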

hendrydong avatar Oct 30 '23 02:10 hendrydong

Hi mkkk1112, I ran into a similar issue. I trained with LoRA and merged the adapter layers with my base model (fp16), and it turned out the merged model was stored in fp32.

It's solved by adding

    --torch_dtype float16

while executing examples/merge_lora.py
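
For context, here is a minimal sketch of what that merge step does, written against the PEFT library directly rather than LMFlow's examples/merge_lora.py itself (the adapter and output paths are illustrative). The key point is loading the base model in float16 before merging; otherwise the merged weights are written out in FP32 and the checkpoint roughly doubles in size.

    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM

    # Load the base model in half precision so the merged weights stay in fp16.
    base = AutoModelForCausalLM.from_pretrained("gpt2-large", torch_dtype=torch.float16)

    # Attach the LoRA adapter (illustrative path) and fold it into the base weights.
    model = PeftModel.from_pretrained(base, "output_models/gpt2-large-lora")
    merged = model.merge_and_unload()

    # The saved checkpoint should now be roughly half the size of an fp32 dump.
    merged.save_pretrained("output_models/gpt2-large-merged-fp16")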

gzliyu avatar Jan 26 '24 14:01 gzliyu

thanks a lot!!!

Mikivishy avatar Jan 26 '24 14:01 Mikivishy