stable-diffusion-webui
[Feature Request]: Dreambooth on 8GB VRam GPU (holy grail)
Is there an existing issue for this?
- [X] I have searched the existing issues and checked the recent builds/commits
What would your feature do?
Dreambooth training on an 8 GB VRAM GPU (holy grail). By using DeepSpeed it's possible to offload some tensors from VRAM to either CPU or NVMe, allowing training with less VRAM.
DeepSpeed needs to be enabled with `accelerate config`. During configuration, answer yes to "Do you want to use DeepSpeed?". With DeepSpeed stage 2, fp16 mixed precision, and offloading both parameters and optimizer state to CPU, it's possible to train in under 8 GB of VRAM, with the drawback of requiring significantly more system RAM (about 25 GB). See the DeepSpeed documentation for more configuration options.
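For reference, here is a minimal sketch of the same setup done programmatically through `accelerate`'s `DeepSpeedPlugin`, rather than the interactive `accelerate config` prompts. The variable names are illustrative, not taken from any training script:

```python
# Minimal sketch: ZeRO stage 2 with CPU offload and fp16, configured via
# accelerate's Python API instead of the interactive `accelerate config`.
# Assumes `accelerate` and `deepspeed` are installed.
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

deepspeed_plugin = DeepSpeedPlugin(
    zero_stage=2,                    # ZeRO stage 2
    offload_optimizer_device="cpu",  # optimizer state lives in system RAM
    offload_param_device="cpu",      # parameters offloaded to system RAM
    gradient_accumulation_steps=1,
)

accelerator = Accelerator(
    mixed_precision="fp16",          # fp16 mixed precision
    deepspeed_plugin=deepspeed_plugin,
)
# The model, optimizer, and dataloaders are then passed through
# accelerator.prepare(...) as in any accelerate-based training loop.
```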
Changing the default Adam optimizer to DeepSpeed's special version of Adam, `deepspeed.ops.adam.DeepSpeedCPUAdam`, gives a substantial speedup, but enabling it requires a CUDA toolchain with the same version as PyTorch. The 8-bit optimizer does not seem to be compatible with DeepSpeed at the moment.
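As a rough illustration (not the exact code from the linked training script), the optimizer swap amounts to something like this, where `unet` is a placeholder for whatever model is being fine-tuned:

```python
import torch

try:
    # DeepSpeedCPUAdam runs the optimizer step on the CPU, which pairs well
    # with ZeRO's CPU offload. Its fused C++ extension is compiled on first
    # use, which is why the installed CUDA toolkit version must match the
    # one PyTorch was built against.
    from deepspeed.ops.adam import DeepSpeedCPUAdam
    optimizer_class = DeepSpeedCPUAdam
except ImportError:
    optimizer_class = torch.optim.AdamW  # fall back to the stock optimizer

# `unet` is a hypothetical model; the learning rate is illustrative only.
optimizer = optimizer_class(unet.parameters(), lr=5e-6)
```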
More Info: https://github.com/ShivamShrirao/diffusers/tree/main/examples/dreambooth
Proposed workflow
Additional information
No response
Some work is being done on this here, if you're interested: https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/2002
could the 24 GB of RAM be partially virtualized using an SSD?

> could the 24 GB of RAM be partially virtualized using an SSD?

A bigger page/swap file, but... that'll kill it so fast it's not even funny.

> A bigger page/swap file, but... that'll kill it so fast it's not even funny.

Kill the SSD?

> Kill the SSD?

Yes.
Potentially Related:
- https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/914
- https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/1429
- https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/1734
- https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/2002
- https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/2002#issuecomment-1296308600 ("Closing, opening new PR to squash commits and make it clean.")
- https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/3995
- https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/3995#issuecomment-1296308730 ("Please give this a look and merge.")
Tested this out myself: with DeepSpeed installed and everything else set to crunch the memory down, it does NOT work on 8 GB of VRAM. Others are reporting that as well on https://github.com/ShivamShrirao/diffusers.
Can't tell whether the OOM is bogus, as it is only 20 MiB short of allocating, but I assume it tries to allocate more after that.
Edit:
After some tinkering it will run, but it dies at random with a non-explanatory exception.
And after more tinkering, it turns out it can't reliably run on 8 GB of VRAM; it needs a very particular setup and a far-from-out-of-the-box configuration.
So are the docs wrong, then? The docs say it does work with 8 GB of VRAM, and I'm not able to get it to train on a 12 GB card.
https://github.com/d8ahazard/sd_dreambooth_extension: models and training are supported via the extension, so I'm closing multiple DB issues, as no related changes could happen in this repo. It looks like the minimum requirement is 10 GB after all. Plus, it supports LoRA now, which supposedly should work on 8 GB.