
Model size doubles after .merge_and_unload() and .save_pretrained()


My System Info

peft==0.4.0, accelerate==0.18.0, transformers==4.28.0, Python 3.10

Reproduction

After training, I merge the PEFT weights into the base model using:

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model, attach the trained adapter, and fold the LoRA
# weights back into the base weights.
model_ft = PeftModel.from_pretrained(
    AutoModelForCausalLM.from_pretrained(
        base_model_path,
        return_dict=True,
        torch_dtype='auto',
        use_cache=True,
    ),
    peft_path,
    torch_dtype=torch.float16
).merge_and_unload()

Then, to use it as a standalone model for inference, I save it to disk using

model_ft.save_pretrained(destination_path)
tokenizer.save_pretrained(destination_path)

Later, I load it back whenever needed using

inference_model = AutoModelForCausalLM.from_pretrained(
    model_path,
    return_dict=True,
    torch_dtype=torch.float16,
    use_cache=True,
    device_map="auto",
)

Expected behavior

I am training StarCoder 7B, which initially has a size of around 15 GB. I began training with specific LoRA rank and alpha parameters. To experiment with different combinations of these parameters, I stopped training a few times and performed a merge_and_unload. Afterward, I restarted training with a new combination of LoRA rank and alpha values on top of the latest stored model. This approach worked well up to approximately 500-600 steps. After that point, however, I noticed an issue: when I saved my model after merging, its size on disk unexpectedly ballooned to 30 GB, even though my adapter .bin file is only around 400 MB. Why did the model size increase?
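For reference, a rough sanity check I can run on the merged model before saving, to see which dtypes the weights actually ended up in (this assumes the model_ft object from the snippet above and is not part of my training code):

from collections import Counter

# Count parameters per dtype and estimate the in-memory footprint;
# if everything shows up as torch.float32, the saved checkpoint will
# be roughly twice the float16 size.
dtype_counts = Counter(p.dtype for p in model_ft.parameters())
total_gb = sum(p.numel() * p.element_size() for p in model_ft.parameters()) / 1e9
print(dtype_counts, f"~{total_gb:.1f} GB")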

anudeep-peela avatar Sep 05 '23 19:09 anudeep-peela

I am having the same issue with Falcon 1B. The original model is about 2.3 GB on disk, while the adapter is about 40 MB. After merging, the saved model is about 4.5 GB on disk. I checked whether the number of parameters stays constant, and it does. Using safetensors did not reduce the model size after merging either.
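A rough back-of-the-envelope check on those numbers, assuming the checkpoint size is just parameter count × bytes per element:

# 2.3 GB at 2 bytes/weight (float16) implies roughly 1.15B parameters.
n_params = 2.3e9 / 2
print(n_params * 2 / 1e9)  # ≈ 2.3 GB if the checkpoint stays in float16
print(n_params * 4 / 1e9)  # ≈ 4.6 GB if it is saved in float32 -- close to the 4.5 GB I see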

I am using transformers 4.30 and PEFT 0.5.0.

SankhaSubhra avatar Sep 15 '23 08:09 SankhaSubhra

Same issue with Llama 2 models, both 7B and 13B.

kiamesdavies avatar Oct 22 '23 12:10 kiamesdavies

Try torch_dtype=torch.bfloat16 (i.e., when loading the base model for merging, assuming the original model and the LoRA are already in half precision); that solved the issue for me. I believe the model loads in torch.float32 by default, which explains the doubling in size.
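Something like this (a rough sketch, reusing the same placeholder paths as in the original report):

import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model explicitly in half precision so the merged weights
# (and therefore the saved checkpoint) stay at 2 bytes per parameter.
base = AutoModelForCausalLM.from_pretrained(
    base_model_path,
    torch_dtype=torch.bfloat16,
)
merged = PeftModel.from_pretrained(base, peft_path).merge_and_unload()
merged.save_pretrained(destination_path)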

SankhaSubhra avatar Oct 22 '23 13:10 SankhaSubhra

Thanks @SankhaSubhra, I also found this done in a merge script for the same purpose: https://github.com/georgian-io/LLM-Finetuning-Hub/blob/7c0413ebedba7ee96d0c17c02f2158c7d3c4c142/inference/text_generation/merge_script.py#L42C29-L42C29

kiamesdavies avatar Oct 22 '23 21:10 kiamesdavies