stable-diffusion-webui
[Feature Request]: Dreambooth on 8GB VRam GPU (holy grail)
Is there an existing issue for this?
- [X] I have searched the existing issues and checked the recent builds/commits
What would your feature do?
Dreambooth training on an 8 GB VRAM GPU (holy grail). By using DeepSpeed it's possible to offload some tensors from VRAM to either CPU or NVMe, allowing training with less VRAM.
DeepSpeed needs to be enabled with `accelerate config`. During configuration, answer yes to "Do you want to use DeepSpeed?". With DeepSpeed stage 2, fp16 mixed precision, and offloading both parameters and optimizer state to CPU, it's possible to train in under 8 GB of VRAM, with the drawback of requiring significantly more system RAM (about 25 GB). See the DeepSpeed documentation for more configuration options.
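For reference, here is a minimal sketch of the same setup done programmatically through `accelerate`'s `DeepSpeedPlugin`, rather than the interactive `accelerate config` prompts. The variable names are illustrative, not taken from any training script:

```python
# Minimal sketch: ZeRO stage 2 with CPU offload and fp16, configured via
# accelerate's Python API instead of the interactive `accelerate config`.
# Assumes `accelerate` and `deepspeed` are installed.
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

deepspeed_plugin = DeepSpeedPlugin(
    zero_stage=2,                    # ZeRO stage 2
    offload_optimizer_device="cpu",  # optimizer state lives in system RAM
    offload_param_device="cpu",      # parameters offloaded to system RAM
    gradient_accumulation_steps=1,
)

accelerator = Accelerator(
    mixed_precision="fp16",          # fp16 mixed precision
    deepspeed_plugin=deepspeed_plugin,
)
# The model, optimizer, and dataloaders are then passed through
# accelerator.prepare(...) as in any accelerate-based training loop.
```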
Changing the default Adam optimizer to DeepSpeed's special version of Adam, `deepspeed.ops.adam.DeepSpeedCPUAdam`, gives a substantial speedup, but enabling it requires a CUDA toolchain with the same version as PyTorch. The 8-bit optimizer does not seem to be compatible with DeepSpeed at the moment.
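As a rough illustration (not the exact code from the linked training script), the optimizer swap amounts to something like this, where `unet` is a placeholder for whatever model is being fine-tuned:

```python
import torch

try:
    # DeepSpeedCPUAdam runs the optimizer step on the CPU, which pairs well
    # with ZeRO's CPU offload. Its fused C++ extension is compiled on first
    # use, which is why the installed CUDA toolkit version must match the
    # one PyTorch was built against.
    from deepspeed.ops.adam import DeepSpeedCPUAdam
    optimizer_class = DeepSpeedCPUAdam
except ImportError:
    optimizer_class = torch.optim.AdamW  # fall back to the stock optimizer

# `unet` is a hypothetical model; the learning rate is illustrative only.
optimizer = optimizer_class(unet.parameters(), lr=5e-6)
```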
More Info: https://github.com/ShivamShrirao/diffusers/tree/main/examples/dreambooth
Proposed workflow
Additional information
No response
Some work is being done on this here, if you're interested: https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/2002
could the 24 GB of RAM be partially virtualized using an SSD?

> could the 24 GB of RAM be partially virtualized using an SSD?

A bigger page/swap file, but... that'll kill it so fast it's not even funny.

> A bigger page/swap file, but... that'll kill it so fast it's not even funny.

Kill the SSD?

> Kill the SSD?

Yes.
Potentially Related:
- https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/914
- https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/1429
- https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/1734
- https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/2002
- https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/2002#issuecomment-1296308600 ("Closing, opening new PR to squash commits and make it clean.")
- https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/3995
- https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/3995#issuecomment-1296308730 ("Please give this a look and merge.")
Tested this out myself: with DeepSpeed installed and everything else set to crunch the memory down, it does NOT work on 8 GB of VRAM. Others are reporting that as well on https://github.com/ShivamShrirao/diffusers.
Can't tell whether the OOM is bogus, as it is only 20 MiB short of allocating, but I assume it tries to allocate more after that.
Edit:
After some tinkering it will run, but it dies at random with a non-explanatory exception.
And after more tinkering, it turns out it can't reliably run on 8 GB of VRAM; it needs a very particular setup and a far-from-out-of-the-box configuration.
So are the docs wrong, then? The docs say it does work with 8 GB of VRAM, and I'm not able to get it to train on a 12 GB card.
https://github.com/d8ahazard/sd_dreambooth_extension: models and training are supported via the extension, so I'm closing multiple DB issues, as no related changes could happen in this repo. It looks like the minimum requirement is 10 GB after all. Plus, it supports LoRA now, which supposedly should work on 8 GB.