
Is it possible to resume training of a lora checkpoint?


I tried saving a pipeline patched with a LoRA .pt checkpoint and then feeding it to the training script, but that errors out.

qunash avatar Dec 11 '22 12:12 qunash

@qunash Did you check this: https://github.com/cloneofsimo/lora/pull/11? It talks about resuming training; not sure if it helps, though.

amrrs avatar Dec 11 '22 15:12 amrrs

Hi @amrrs, thanks for the reply! Yes, I've seen it, but I still can't figure out how to resume training.

qunash avatar Dec 12 '22 14:12 qunash

Sorry, it's not possible right now. I'll add this as a feature as well. If you want to get notified, "watch" this repo!

cloneofsimo avatar Dec 12 '22 14:12 cloneofsimo

Awesome, thanks

qunash avatar Dec 12 '22 15:12 qunash

Note that it is implemented in the sd_dreambooth_extension.

Thomas-MMJ avatar Dec 12 '22 19:12 Thomas-MMJ

These two commits make the UNet and text encoder LoRAs resumable by providing the paths to the respective .pt files via two new arguments to train_lora_dreambooth.py.

lora.py is modified so that the inject_trainable_lora function accepts a path and loads the saved weights as the LoRA layers are initialized, in the same order they were saved.

One flaw in this implementation: you must change the output directory when resuming, or training will start to overwrite the old .pt files, since the global step is not restored.

https://github.com/cloneofsimo/lora/pull/48
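
For anyone reading along, here is a minimal sketch of the mechanism described above: an optional `loras` path argument on `inject_trainable_lora` whose tensors are consumed in save order as each layer is injected. The `LoraInjectedLinear` structure and the alternating up/down checkpoint layout are simplified assumptions modeled on lora.py, not the PR's exact code:

```python
import torch
import torch.nn as nn


class LoraInjectedLinear(nn.Module):
    """Simplified stand-in for the LoRA-injected linear layer in lora.py."""

    def __init__(self, in_features, out_features, bias=False, r=4):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features, bias=bias)
        self.lora_down = nn.Linear(in_features, r, bias=False)
        self.lora_up = nn.Linear(r, out_features, bias=False)
        nn.init.normal_(self.lora_down.weight, std=1 / r)
        nn.init.zeros_(self.lora_up.weight)

    def forward(self, x):
        return self.linear(x) + self.lora_up(self.lora_down(x))


def inject_trainable_lora(model, target_replace_module, r=4, loras=None):
    """Replace nn.Linear children of target modules with LoRA-injected ones.

    If `loras` is a path to a .pt checkpoint, restore the saved weights as
    each layer is injected. Assumes the checkpoint is a list of tensors in
    alternating up/down order, matching the order they were saved in.
    """
    saved = torch.load(loras) if loras is not None else None
    require_grad_params, names = [], []
    for module in model.modules():
        if module.__class__.__name__ in target_replace_module:
            for name, child in module.named_children():
                if isinstance(child, nn.Linear):
                    injected = LoraInjectedLinear(
                        child.in_features,
                        child.out_features,
                        bias=child.bias is not None,
                        r=r,
                    )
                    # Keep the original frozen weights.
                    injected.linear.weight = child.weight
                    if child.bias is not None:
                        injected.linear.bias = child.bias
                    if saved is not None:
                        # Resuming: consume tensors in the order they were saved.
                        injected.lora_up.weight = nn.Parameter(saved.pop(0))
                        injected.lora_down.weight = nn.Parameter(saved.pop(0))
                    setattr(module, name, injected)
                    require_grad_params.append(injected.lora_up.parameters())
                    require_grad_params.append(injected.lora_down.parameters())
                    names.append(name)
    return require_grad_params, names
```

Resuming would then look something like `inject_trainable_lora(unet, {"CrossAttention", "Attention", "GEGLU"}, loras="output/lora_weight.pt")`, where the target module names and file name are illustrative. Because layers are visited in a deterministic module-traversal order, popping tensors off the front of the list restores each weight to the same layer it came from.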

hdon96 avatar Dec 15 '22 20:12 hdon96