LoRA merging on CPU?
If you train a very big model in 4-bit, it fits in GPU memory. But during LoRA merging with
model.save_pretrained_merged("model", tokenizer, save_method = "merged_16bit",)
the weights need to be dequantized, which immediately gives an OOM. Can merging be done on the CPU, or with per-layer loading as in Transformers?
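For reference, one workaround (outside Unsloth's own save path) is to merge with plain Transformers + PEFT entirely on the CPU. The sketch below assumes the adapter was saved to a local `lora_model` directory and uses a placeholder base-model id; both names are assumptions, not part of this thread:

```python
# Minimal sketch: CPU-side LoRA merge with Transformers + PEFT.
# "your-base-model" and "lora_model" are placeholder names.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "your-base-model",          # the original 16-bit base weights
    torch_dtype=torch.float16,
    device_map={"": "cpu"},     # keep every layer on CPU to avoid GPU OOM
)
model = PeftModel.from_pretrained(base, "lora_model")  # attach the LoRA adapter
merged = model.merge_and_unload()                      # fold LoRA weights into the base

merged.save_pretrained("merged_model")
tokenizer = AutoTokenizer.from_pretrained("lora_model")
tokenizer.save_pretrained("merged_model")
```

This is slow (the whole merge runs on CPU) but never touches GPU memory.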
Sorry for the delay! Oh, it should dispatch to CPU on the fly!
Try doing model.save_pretrained_merged("model", tokenizer, save_method = "merged_16bit", maximum_memory_usage = 0.7)
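For context, here is a sketch of where that call sits after training. The model name and `max_seq_length` value are assumptions for illustration; per the reply above, `maximum_memory_usage` appears to cap the fraction of GPU memory the merge may use, so lowering it (e.g. to 0.7) leaves headroom for dequantization:

```python
# Sketch of the full flow; model name and settings are placeholders.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # assumed 4-bit base model
    max_seq_length=2048,
    load_in_4bit=True,
)
# ... attach LoRA via FastLanguageModel.get_peft_model(...) and train ...

# Cap the merge at ~70% of GPU memory to avoid OOM during dequantization.
model.save_pretrained_merged(
    "model",
    tokenizer,
    save_method="merged_16bit",
    maximum_memory_usage=0.7,
)
```

If 0.7 still OOMs, trying a smaller value should reduce GPU pressure further.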