sd_dreambooth_extension

CPU only option returning ValueError

Open Xelgardi opened this issue 2 years ago • 7 comments

`Python revision: 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Dreambooth revision: 88cc50ace470580842b519115545b9a08ebe115e
SD-WebUI revision:

Checking Dreambooth requirements...
[+] bitsandbytes version 0.35.0 installed.
[+] diffusers[torch] version 0.10.0.dev0 installed.
[+] transformers version 4.21.0 installed.
[!] xformers NOT installed.
[ ] torch version 1.12.1+cu113 installed.
[ ] torchvision version 0.13.1+cu113 installed.`

Have you read the Readme? Yes.

Have you completely restarted the stable-diffusion-webUI, not just reloaded the UI? Yes.

Have you updated Dreambooth to the latest revision? Yes.

Have you updated the Stable-Diffusion-WebUI to the latest version? Yes.

No, really. Please save us both some trouble and update the SD-WebUI and Extension and restart before posting this. Reply 'OK' Below to acknowledge that you did this. Ok.

Describe the bug

Training with 6 512x512 images is apparently too much for an 8GB VRAM card, so I wanted to try using my CPU, since a total of 32GB RAM should be enough, even if slow. However, if I check the "CPU only" box, I only get an "Exception training model: argument of type 'ValueError' is not iterable" error shortly after starting.

Provide logs

`Concept 0 class dir is D:\Programme\stable-diffusion-2\models\dreambooth\Staketenzaun\classifiers_0
Starting Dreambooth training...
 Cleanup completed.
 Allocated: 0.0GB
 Reserved: 0.0GB

 Allocated 0.0/2.4GB
 Reserved: 0.0/2.5GB

Initializing dreambooth training...
Replace CrossAttention.forward to use default
 Training completed, reloading SD Model.
 Allocated: 0.0GB
 Reserved: 0.0GB

Memory output: {}
 Restored system models.
 Allocated: 2.4GB
 Reserved: 2.5GB

Returning result: Exception training model: argument of type 'ValueError' is not iterable`

Environment

What OS? Win10

If Windows - WSL or native? native

What GPU are you using? RTX3070 8GB

Screenshots/Config Screenshot of db_config.json attached.

Xelgardi, Dec 07 '22 09:12

> since a total of 32GB RAM should be enough

You need about 32GB free.

Just did a test on mine with 64GB, and the process memory peaked at just over 30GB. This was without classifier images, text encoder training, or EMA, using a batch size of 4 on a V1 model with 512px images.

With a batch size of 1, the process peaks around 25GB.

In both cases it seems to get stuck when it tries to generate a sample image (killed process first time, took 5 minutes the 2nd time). But this is after the checkpoint is saved.

leppie, Dec 07 '22 10:12

> You need about 32GB free.

That... is a lot more than I thought. But while this might be an issue later on, I'm reasonably sure it is not the cause of my error here, since my RAM usage doesn't even budge between starting the training and getting the error message. I'd assume RAM would fill up completely before the software ran into issues, just as it does with VRAM.

Xelgardi, Dec 07 '22 10:12

I am also having this error, but I'm training on GPU. Settings are unchanged since my last model, which worked, and I'm using an RTX 3060. Memory usage does not increase at all before the error is reported. Edit: Never mind, looks like restarting the UI fixed it. Weird, because I don't think I've updated anything.

Slug-Cat, Dec 08 '22 05:12

For me, replacing `e` with `str(e)` in line 409 of sd_dreambooth_extension/dreambooth/train_dreambooth.py seems to fix it.
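The code at line 409 is not quoted in this thread, so the following is only a minimal sketch of the kind of check that would produce this message: a substring test applied to the caught exception object rather than to its text. The function names and the "out of memory" keyword are hypothetical and are not taken from train_dreambooth.py.

```python
def check_unsafe(e: Exception) -> bool:
    # Hypothetical check: `in` needs a string or container on the right-hand
    # side. An exception object is neither, so this raises TypeError.
    return "out of memory" in e


def check_safe(e: Exception) -> bool:
    # Converting the exception to its message first makes `in` a plain
    # substring test.
    return "out of memory" in str(e)


def demo() -> tuple[str, bool]:
    err = ValueError("some training failure")
    try:
        check_unsafe(err)
        unsafe_result = "no error"
    except TypeError as te:
        # The original ValueError is hidden behind this secondary TypeError.
        unsafe_result = f"{type(te).__name__}: {te}"
    return unsafe_result, check_safe(err)


if __name__ == "__main__":
    print(demo())
```

If the handler does something along these lines, the TypeError raised inside it would replace the original ValueError in the reported message. That would explain why the log names 'ValueError' without showing what the ValueError actually said. The exact wording of the TypeError message can differ between Python versions.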

McLP2, Dec 10 '22 02:12

> For me, replacing `e` with `str(e)` in line 409 of sd_dreambooth_extension/dreambooth/train_dreambooth.py seems to fix it.

Can you pull the latest version and lmk where to apply this, if it's still an issue?

d8ahazard, Dec 11 '22 21:12

The bug is now in line 399.

McLP2, Dec 11 '22 23:12

This issue is stale because it has been open 5 days with no activity. Remove stale label or comment or this will be closed in 5 days

github-actions[bot], Dec 17 '22 00:12