LoRA merging on CPU?
If you train a very big model in 4-bit, it fits in GPU memory. But during LoRA merging with
model.save_pretrained_merged("model", tokenizer, save_method = "merged_16bit",)
the weights need to be dequantized, which immediately gives an OOM. Can merging be done on the CPU, or with per-layer loading as in Transformers?
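For reference, one workaround (outside Unsloth's own save path) is to merge with plain Transformers + PEFT entirely on the CPU. The sketch below assumes the adapter was saved to a local `lora_model` directory and uses a placeholder base-model id; both names are assumptions, not part of this thread:

```python
# Minimal sketch: CPU-side LoRA merge with Transformers + PEFT.
# "your-base-model" and "lora_model" are placeholder names.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "your-base-model",          # the original 16-bit base weights
    torch_dtype=torch.float16,
    device_map={"": "cpu"},     # keep every layer on CPU to avoid GPU OOM
)
model = PeftModel.from_pretrained(base, "lora_model")  # attach the LoRA adapter
merged = model.merge_and_unload()                      # fold LoRA weights into the base

merged.save_pretrained("merged_model")
tokenizer = AutoTokenizer.from_pretrained("lora_model")
tokenizer.save_pretrained("merged_model")
```

This is slow (the whole merge runs on CPU) but never touches GPU memory.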
Sorry for the delay! Oh, it should dispatch to CPU on the fly!
Try doing model.save_pretrained_merged("model", tokenizer, save_method = "merged_16bit", maximum_memory_usage = 0.7)
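For context, here is a sketch of where that call sits after training. The model name and `max_seq_length` value are assumptions for illustration; per the reply above, `maximum_memory_usage` appears to cap the fraction of GPU memory the merge may use, so lowering it (e.g. to 0.7) leaves headroom for dequantization:

```python
# Sketch of the full flow; model name and settings are placeholders.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # assumed 4-bit base model
    max_seq_length=2048,
    load_in_4bit=True,
)
# ... attach LoRA via FastLanguageModel.get_peft_model(...) and train ...

# Cap the merge at ~70% of GPU memory to avoid OOM during dequantization.
model.save_pretrained_merged(
    "model",
    tokenizer,
    save_method="merged_16bit",
    maximum_memory_usage=0.7,
)
```

If 0.7 still OOMs, trying a smaller value should reduce GPU pressure further.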