sd_dreambooth_extension
LoRA training step time increases >2x after checkpoint
Python 3.9.13 (tags/v3.9.13:6de2ca5, May 17 2022, 16:36:42) [MSC v.1929 64 bit (AMD64)]
Commit hash: 685f9631b56ff8bd43bce24ff5ce0f9a0e9af490
Installing requirements for Web UI
[auto-sd-paint-ext] Attempting auto-update...
[auto-sd-paint-ext] Fetch upstream.
[auto-sd-paint-ext] Pull upstream.
Installing requirements for scikit_learn
#######################################################################################################
Initializing Dreambooth
If submitting an issue on github, please provide the below text for debugging purposes:
Python revision: 3.9.13 (tags/v3.9.13:6de2ca5, May 17 2022, 16:36:42) [MSC v.1929 64 bit (AMD64)]
Dreambooth revision: f2ba1dbdd0d8eaeba9502d69355d74c3044fe432
SD-WebUI revision: 685f9631b56ff8bd43bce24ff5ce0f9a0e9af490
Checking Dreambooth requirements...
[+] bitsandbytes version 0.35.0 installed.
[+] diffusers version 0.10.2 installed.
[+] transformers version 4.25.1 installed.
[+] xformers version 0.0.15.dev0+c101579.d20221117 installed.
[+] torch version 1.12.1+cu116 installed.
[+] torchvision version 0.13.1+cu116 installed.
#######################################################################################################
Have you read the Readme? yes
Have you completely restarted the stable-diffusion-webUI, not just reloaded the UI? yes
Have you updated Dreambooth to the latest revision? yes
Have you updated the Stable-Diffusion-WebUI to the latest version? yes
No, really. Please save us both some trouble and update the SD-WebUI and Extension and restart before posting this. Reply 'OK' Below to acknowledge that you did this. OK
Describe the bug
LoRA training starts taking twice as long per step after the first training checkpoint, where the model is saved and sampled. Up to the save/sample at step 500 it ran at ~8 s/it; after saving, it jumps to ~17 s/it for no apparent reason.
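To put numbers on this pattern, one option is to record each step's wall-clock duration and compare the average s/it just before and just after each save. The following is a minimal Python sketch under that assumption; `window_rates`, its parameters, and the synthetic timings (chosen only to mirror the ~8 s/it and ~17 s/it reported above, not measured data) are hypothetical and not part of the extension.

```python
from statistics import mean


def window_rates(step_times, save_steps, window=50):
    """Compare mean seconds/iteration just before and just after each save step.

    step_times: step_times[i] is the wall-clock duration (seconds) of training
                step i + 1 (hypothetical data you would collect yourself).
    save_steps: steps at which a checkpoint/sample was written (e.g. 500).
    Returns a list of (save_step, s_per_it_before, s_per_it_after, ratio).
    """
    results = []
    for s in save_steps:
        before = step_times[max(0, s - window):s]  # steps s-window+1 .. s
        after = step_times[s:s + window]           # steps s+1 .. s+window
        if not before or not after:
            continue  # not enough data on one side of this save; skip it
        b, a = mean(before), mean(after)
        results.append((s, b, a, a / b))
    return results


if __name__ == "__main__":
    # Synthetic timings: 500 steps at 8 s/it, then 250 steps at 17 s/it.
    times = [8.0] * 500 + [17.0] * 250
    for step, b, a, ratio in window_rates(times, [500]):
        print(f"step {step}: {b:.1f} s/it -> {a:.1f} s/it ({ratio:.2f}x slower)")
```

If you time steps yourself, keep in mind that CUDA work is asynchronous, so calling `torch.cuda.synchronize()` before reading the clock gives more faithful per-step durations.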
Provide logs
Compiling checkpoint for
Compiling checkpoint for
Environment
What OS? Windows
If Windows - WSL or native? native Windows
What GPU are you using? 1080 Ti (11 GB)
Screenshots/Config
If the issue is specific to an error while training, please provide a screenshot of training parameters or the db_config.json file from /models/dreambooth/MODELNAME/db_config.json
{
"adam_beta1": 0.9,
"adam_beta2": 0.999,
"adam_epsilon": 1e-08,
"adam_weight_decay": 0.01,
"attention": "xformers",
"center_crop": false,
"concepts_path": "",
"epoch_pause_frequency": 0.0,
"epoch_pause_time": 60.0,
"gradient_accumulation_steps": 1,
"gradient_checkpointing": true,
"half_model": false,
"hflip": false,
"learning_rate": 1e-06,
"lr_scheduler": "constant",
"lr_warmup_steps": 500,
"max_grad_norm": 1,
"max_token_length": 75,
"max_train_steps": 0,
"mixed_precision": "fp16",
"model_dir": "C:\Users\
Somewhere after step 750 the speed went back up. Not sure why; nothing else changed on the computer.
I think a background program was causing the slowdown.
It's not a background program: I observed the same pattern of speedup and slowdown between checkpoints with no background processes running.
I also gave up on a training session last night because I kept ending up at 7 s/it or 3 s/it, when I usually train at ~1.6 s/it.
Is this what you are referring to?
Yes, this exactly.
Same here: from 1-2 it/s to 1-2 s/it after updating the WebUI and Dreambooth.
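Note that "it/s" and "s/it" are reciprocals, so a change from it/s to s/it is a bigger slowdown than the similar-looking numbers suggest: for example, 2 it/s is 0.5 s/it, so dropping to 2 s/it makes each step 4x slower. A minimal Python sketch of the conversion; the function names are hypothetical and not part of the extension or the WebUI.

```python
def to_seconds_per_it(value, unit):
    """Convert a progress-bar rate to seconds per iteration.

    unit is "it/s" (iterations per second) or "s/it" (seconds per iteration);
    the two are reciprocals of each other.
    """
    if value <= 0:
        raise ValueError("rate must be positive")
    if unit == "s/it":
        return float(value)
    if unit == "it/s":
        return 1.0 / value
    raise ValueError(f"unknown unit: {unit}")


def slowdown(before, after):
    """Factor by which each step got slower; before/after are (value, unit) pairs."""
    return to_seconds_per_it(*after) / to_seconds_per_it(*before)


if __name__ == "__main__":
    # 2 it/s -> 2 s/it, and 1.6 s/it -> 7 s/it, as mentioned in this thread.
    print(slowdown((2, "it/s"), (2, "s/it")))
    print(slowdown((1.6, "s/it"), (7, "s/it")))
```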
I may have made progress figuring this one out. Are you both using "Apply Horizontal Flip" by chance? If so, try without it. That seems to have made the difference for me; I will look into it more after this training run.
Edit: never mind, the issue is cropping up again after saving a preview image.
The speed is back to normal after the update.
Resolved in new updates! Thanks!