sd_dreambooth_extension

Lora training step time >2x increase after checkpoint

Open hdon96 opened this issue 2 years ago • 4 comments

Python 3.9.13 (tags/v3.9.13:6de2ca5, May 17 2022, 16:36:42) [MSC v.1929 64 bit (AMD64)]
Commit hash: 685f9631b56ff8bd43bce24ff5ce0f9a0e9af490
Installing requirements for Web UI
[auto-sd-paint-ext] Attempting auto-update...
[auto-sd-paint-ext] Fetch upstream.
[auto-sd-paint-ext] Pull upstream.

Installing requirements for scikit_learn

#######################################################################################################
Initializing Dreambooth
If submitting an issue on github, please provide the below text for debugging purposes:

Python revision: 3.9.13 (tags/v3.9.13:6de2ca5, May 17 2022, 16:36:42) [MSC v.1929 64 bit (AMD64)]
Dreambooth revision: f2ba1dbdd0d8eaeba9502d69355d74c3044fe432
SD-WebUI revision: 685f9631b56ff8bd43bce24ff5ce0f9a0e9af490

Checking Dreambooth requirements...
[+] bitsandbytes version 0.35.0 installed.
[+] diffusers version 0.10.2 installed.
[+] transformers version 4.25.1 installed.
[+] xformers version 0.0.15.dev0+c101579.d20221117 installed.
[+] torch version 1.12.1+cu116 installed.
[+] torchvision version 0.13.1+cu116 installed.
#######################################################################################################

Have you read the Readme? yes
Have you completely restarted the stable-diffusion-webUI, not just reloaded the UI? yes
Have you updated Dreambooth to the latest revision? yes
Have you updated the Stable-Diffusion-WebUI to the latest version? yes
No, really. Please save us both some trouble and update the SD-WebUI and Extension and restart before posting this. Reply 'OK' Below to acknowledge that you did this. OK

Describe the bug

Lora training begins taking twice as long per step after the first training checkpoint where the model is saved and sampled. Up to the checkpoint at step 500 it was running at ~8 s/it; after the save it jumps to ~17 s/it for no apparent reason.
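For pinning down exactly where the jump happens, a minimal diagnostic sketch (a hypothetical helper, not part of the extension) that logs wall-clock time per optimizer step could be dropped into the training loop:

import time

class StepTimer:
    """Log wall-clock time per training step so a before/after-checkpoint jump shows up clearly."""
    def __init__(self):
        self.last = None

    def tick(self, step, note=""):
        now = time.perf_counter()
        if self.last is not None:
            print(f"step {step}: {now - self.last:.2f} s/it {note}".rstrip())
        self.last = now

# Usage (hypothetical placement inside the training loop, once per optimizer step):
#     timer = StepTimer()
#     timer.tick(global_step)
# and again right after a checkpoint save / sample generation:
#     timer.tick(global_step, note="(post-checkpoint)")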

Provide logs

Compiling checkpoint for ...
Applying lora model...
Saving checkpoint to F:\SD_MODELS\<model>_250_lora.ckpt...
Generating samples: 100%|████████████████████████████████████████████████████████████████| 1/1 [00:21<00:00, 21.57s/it]
[*] Weights saved at C:\Users\<user>\stable-diffusion-webui\models\dreambooth\<model>
Steps:   1%|▎ | 500/42400 [1:01:59<91:42:28, 7.88s/it, loss=0.846, lr=1e-6, vram=5.9/9.6GB]
Saving lora weights at step 500
Allocated 7.5/9.3GB Reserved: 8.5/9.8GB

Compiling checkpoint for ...
Applying lora model...
Saving checkpoint to F:\SD_MODELS\<model>_500_lora.ckpt...
Generating samples: 100%|████████████████████████████████████████████████████████████████| 1/1 [00:22<00:00, 22.39s/it]
[*] Weights saved at C:\Users\<user>\stable-diffusion-webui\models\dreambooth\<model>
Steps:   1%|▍ | 552/42400 [1:17:41<198:33:31, 17.08s/it, loss=1.1, lr=1e-6, vram=5.9/10.2GB]
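One thing visible in these logs is that the VRAM figure grows across the checkpoint (vram=5.9/9.6GB before, 5.9/10.2GB after, with Reserved: 8.5/9.8GB reported at the save). A minimal sketch, using standard PyTorch CUDA APIs rather than the extension's own code, for checking whether sampling/saving leaves extra memory cached by the allocator and for releasing it afterwards:

import gc
import torch

def report_vram(tag):
    # Standard PyTorch counters; "reserved" is what the caching allocator is holding.
    alloc = torch.cuda.memory_allocated() / 1024 ** 3
    reserved = torch.cuda.memory_reserved() / 1024 ** 3
    print(f"[{tag}] allocated {alloc:.1f} GB, reserved {reserved:.1f} GB")

report_vram("before checkpoint")
# ... checkpoint save / sample generation would happen here ...
report_vram("after checkpoint")

# Drop stale Python references and return cached blocks to the driver; this does not
# free tensors still in use, but it can shrink the reserved figure after sampling.
gc.collect()
torch.cuda.empty_cache()
report_vram("after empty_cache")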

Environment

What OS? Windows
If Windows - WSL or native? native Windows
What GPU are you using? 1080 Ti (11gb)

Screenshots/Config
If the issue is specific to an error while training, please provide a screenshot of training parameters or the db_config.json file from /models/dreambooth/MODELNAME/db_config.json

{ "adam_beta1": 0.9, "adam_beta2": 0.999, "adam_epsilon": 1e-08, "adam_weight_decay": 0.01, "attention": "xformers", "center_crop": false, "concepts_path": "", "epoch_pause_frequency": 0.0, "epoch_pause_time": 60.0, "gradient_accumulation_steps": 1, "gradient_checkpointing": true, "half_model": false, "hflip": false, "learning_rate": 1e-06, "lr_scheduler": "constant", "lr_warmup_steps": 500, "max_grad_norm": 1, "max_token_length": 75, "max_train_steps": 0, "mixed_precision": "fp16", "model_dir": "C:\Users\\stable-diffusion-webui\models\dreambooth\", "model_name": "", "not_cache_latents": false, "num_train_epochs": 100, "pad_tokens": true, "pretrained_model_name_or_path": "C:\Users\\stable-diffusion-webui\models\dreambooth\\working", "pretrained_vae_name_or_path": null, "prior_loss_weight": 1, "resolution": 768, "revision": 500, "sample_batch_size": 1, "save_class_txt": true, "save_embedding_every": 250, "save_preview_every": 250, "save_use_global_counts": true, "save_use_epochs": true, "scale_lr": true, "src": "F:SD_MODELS\v2-1_768-ema-pruned.ckpt", "shuffle_tags": false, "train_batch_size": 1, "train_text_encoder": true, "use_8bit_adam": true, "use_concepts": false, "use_cpu": false, "use_ema": false, "use_lora": true, "scheduler": "ddim", "v2": true, "has_ema": "False", "concepts_list": [ { "max_steps": -1, "instance_data_dir": "<instance_path", "class_data_dir": "<class_path>", "instance_prompt": "<instance_prompt>", "class_prompt": "<class_prompt>", "save_sample_prompt": "", "save_sample_template": "", "instance_token": "", "class_token": "", "num_class_images": 2034, "class_negative_prompt": "", "class_guidance_scale": 9, "class_infer_steps": 30, "save_sample_negative_prompt": "", "n_save_sample": 1, "sample_seed": -1, "save_guidance_scale": 9, "save_infer_steps": 30 } ], "lifetime_revision": 500 }

hdon96 · Dec 12 '22 01:12

Somewhere after step 750 the speed went back to normal. Not sure why; there was no different activity on the computer.

hdon96 · Dec 12 '22 03:12

I think it was a background program causing the slowdown.

hdon96 · Dec 12 '22 04:12

Not a background program after all; I observed the same pattern of slowdown and recovery between checkpoints with no background processes running.

hdon96 · Dec 12 '22 17:12

I also gave up on a training session last night because I kept ending up at 7 s/it or 3 s/it when I usually train at ~1.6 s/it.

Is this what you are referring to?

james-things · Dec 12 '22 17:12

I also gave up on a training session last night because I kept ending up at 7 s/it or 3 s/it when I usually train at ~1.6 s/it.

Is this what you are referring to?

Yes, this exactly.

hdon96 · Dec 12 '22 18:12

Same here: went from 1-2 it/s down to 1-2 s/it after updating the webui and Dreambooth.

AndyLonly · Dec 12 '22 21:12

I may have made progress figuring this one out. Are you both using "Apply Horizontal Flip" by chance? Try without it if you are. It seems to have made the difference for me; I will look at it more after this training run.

Edit: never mind, the issue is cropping up again after saving a preview image.
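For what it's worth, a horizontal flip on its own is normally a cheap CPU-side transform, so the augmentation itself is an unlikely culprit. A quick sanity-check sketch using plain torchvision (not the extension's actual augmentation code) to time it at the 768px resolution from the config above:

import time

from PIL import Image
from torchvision import transforms

flip = transforms.RandomHorizontalFlip(p=1.0)  # p=1.0 so every call actually flips
img = Image.new("RGB", (768, 768))             # same resolution as the config above

start = time.perf_counter()
for _ in range(1000):
    flip(img)
elapsed_ms = (time.perf_counter() - start)  # total seconds for 1000 flips
print(f"~{elapsed_ms:.3f} ms per flip")      # seconds/1000 flips == ms per flip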

james-things · Dec 12 '22 22:12

[screenshot] The speed is correct again after the update.

AndyLonly · Dec 13 '22 02:12

Resolved in the new updates! Thanks!

hdon96 · Dec 13 '22 03:12