
LoRA finetuning checkpoint size

Open TianxingWu opened this issue 1 year ago • 3 comments

System Info / 系统信息

CogVideoX-2B SAT LoRA finetuning

Information / 问题信息

  • [ ] The official example scripts / 官方的示例脚本
  • [X] My own modified scripts / 我自己修改的脚本和任务

Reproduction / 复现过程

Finetune the 2B model with official LoRA setting

Expected behavior / 期待表现

I would expect the size of the saved checkpoint `mp_rank_00_model_states.pt` to be similar to that of the original transformer checkpoint. However, it increases significantly, from 3.5 GB to 26 GB. Inspecting the checkpoint, I found that the number of items in `module` increases from 557 to 1453: it contains not only the LoRA-related weights but also the weights of the `conditioner` (T5) and `first_stage_model` (VAE). In my opinion these should not be in the checkpoint, since T5 and the VAE are frozen during finetuning.

TianxingWu avatar Sep 06 '24 10:09 TianxingWu
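A minimal sketch for verifying which submodules dominate the file size. The checkpoint path is illustrative, and the `"module"` key is an assumption about how DeepSpeed-style SAT checkpoints store the model state dict; confirm both against your own run directory:

```python
from collections import defaultdict

import torch

# Assumed path and key layout -- verify against your checkpoint.
ckpt = torch.load("mp_rank_00_model_states.pt", map_location="cpu")
state = ckpt["module"]

# Group tensor sizes by top-level prefix (e.g. "model", "conditioner",
# "first_stage_model") to see which submodule dominates the file size.
sizes = defaultdict(int)
for name, value in state.items():
    if torch.is_tensor(value):
        sizes[name.split(".")[0]] += value.numel() * value.element_size()

for prefix, nbytes in sorted(sizes.items(), key=lambda kv: -kv[1]):
    print(f"{prefix}: {nbytes / 1e9:.2f} GB")
```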

The entire model is exported because `torch.save` was used when saving the checkpoint.

zRzRzRzRzRzRzR avatar Sep 06 '24 17:09 zRzRzRzRzRzRzR
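If you control the saving code, one workaround is to dump only the parameters that are actually being optimized rather than the full state dict. A sketch under that assumption; the helper name is ours, not part of SAT:

```python
import torch

def trainable_state_dict(model: torch.nn.Module) -> dict:
    # Keep only the parameters that are being trained (the LoRA weights);
    # frozen modules such as the T5 conditioner and the VAE are skipped.
    return {name: param.detach().cpu()
            for name, param in model.named_parameters()
            if param.requires_grad}

# Usage (hypothetical): torch.save(trainable_state_dict(model), "lora_only.pt")
```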

Thanks for the reply! It would still be nice to separate the LoRA weights from the rest of the model, though.

TianxingWu avatar Sep 12 '24 21:09 TianxingWu
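For checkpoints that were already saved with everything included, the LoRA weights could be split out after the fact. A sketch, again assuming the model state sits under `"module"`; the `"lora"` substring filter is an assumption about how the LoRA parameters are named in this codebase, so print a few keys from `state` first and adjust the predicate if needed:

```python
import torch

# Assumed path and key layout -- verify against your checkpoint.
ckpt = torch.load("mp_rank_00_model_states.pt", map_location="cpu")
state = ckpt["module"]

# Assumption: LoRA parameter names contain "lora"; adjust after inspecting keys.
lora_state = {k: v for k, v in state.items() if "lora" in k.lower()}
print(f"Kept {len(lora_state)} of {len(state)} entries")

torch.save({"module": lora_state}, "lora_weights_only.pt")
```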

Check here: https://github.com/THUDM/CogVideo/tree/main/sat#using-the-fine-tuned-model

zRzRzRzRzRzRzR avatar Sep 13 '24 03:09 zRzRzRzRzRzRzR