
Is it possible to resume training of a lora checkpoint?


I tried saving a pipeline patched with a LoRA .pt checkpoint and then feeding it to the training script, but that errors out.

qunash avatar Dec 11 '22 12:12 qunash

@qunash Did you check this: https://github.com/cloneofsimo/lora/pull/11? It talks about resuming training; not sure if it helps, though.

amrrs avatar Dec 11 '22 15:12 amrrs

Hi @amrrs, thanks for the reply! Yes, I've seen it, but I still can't figure out how to resume training.

qunash avatar Dec 12 '22 14:12 qunash

Sorry, it's not possible right now. I'll add this as a feature as well. If you want to get notified, "watch" this repo!

cloneofsimo avatar Dec 12 '22 14:12 cloneofsimo

Awesome, thanks

qunash avatar Dec 12 '22 15:12 qunash

Note that it is implemented in the sd_dreambooth_extension.

Thomas-MMJ avatar Dec 12 '22 19:12 Thomas-MMJ

These two commits make the UNet and text encoder LoRAs resumable by providing the paths to the respective .pt files via two new arguments to train_lora_dreambooth.py.

lora.py is modified so that the inject_trainable_lora function accepts a path and loads the saved weights as the LoRA layers are initialized, in the same order they were saved.

One flaw in this implementation: you must change the output directory when resuming, or training will start to overwrite the old .pt files, since the global step is not restored.

https://github.com/cloneofsimo/lora/pull/48
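
For anyone reading along, here is a minimal sketch of the mechanism described above: an optional `loras` path argument on `inject_trainable_lora` whose tensors are consumed in save order as each layer is injected. The `LoraInjectedLinear` structure and the alternating up/down checkpoint layout are simplified assumptions modeled on lora.py, not the PR's exact code:

```python
import torch
import torch.nn as nn


class LoraInjectedLinear(nn.Module):
    """Simplified stand-in for the LoRA-injected linear layer in lora.py."""

    def __init__(self, in_features, out_features, bias=False, r=4):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features, bias=bias)
        self.lora_down = nn.Linear(in_features, r, bias=False)
        self.lora_up = nn.Linear(r, out_features, bias=False)
        nn.init.normal_(self.lora_down.weight, std=1 / r)
        nn.init.zeros_(self.lora_up.weight)

    def forward(self, x):
        return self.linear(x) + self.lora_up(self.lora_down(x))


def inject_trainable_lora(model, target_replace_module, r=4, loras=None):
    """Replace nn.Linear children of target modules with LoRA-injected ones.

    If `loras` is a path to a .pt checkpoint, restore the saved weights as
    each layer is injected. Assumes the checkpoint is a list of tensors in
    alternating up/down order, matching the order they were saved in.
    """
    saved = torch.load(loras) if loras is not None else None
    require_grad_params, names = [], []
    for module in model.modules():
        if module.__class__.__name__ in target_replace_module:
            for name, child in module.named_children():
                if isinstance(child, nn.Linear):
                    injected = LoraInjectedLinear(
                        child.in_features,
                        child.out_features,
                        bias=child.bias is not None,
                        r=r,
                    )
                    # Keep the original frozen weights.
                    injected.linear.weight = child.weight
                    if child.bias is not None:
                        injected.linear.bias = child.bias
                    if saved is not None:
                        # Resuming: consume tensors in the order they were saved.
                        injected.lora_up.weight = nn.Parameter(saved.pop(0))
                        injected.lora_down.weight = nn.Parameter(saved.pop(0))
                    setattr(module, name, injected)
                    require_grad_params.append(injected.lora_up.parameters())
                    require_grad_params.append(injected.lora_down.parameters())
                    names.append(name)
    return require_grad_params, names
```

Resuming would then look something like `inject_trainable_lora(unet, {"CrossAttention", "Attention", "GEGLU"}, loras="output/lora_weight.pt")`, where the target module names and file name are illustrative. Because layers are visited in a deterministic module-traversal order, popping tensors off the front of the list restores each weight to the same layer it came from.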

hdon96 avatar Dec 15 '22 20:12 hdon96