stable-diffusion
Set n_samples to 1 by default
The defaults of n_samples=3 and n_iter=2 cause an out-of-memory error on an RTX 2080/3060 with 11/12 GB of VRAM respectively, which is completely unnecessary because the model fits fine with a batch size of 1.
This seems to be a constant question everywhere, and people are just running the model with defaults that have unnecessarily high memory consumption. The following command OOMs on a 12 GB RTX 3060:
python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms
but this works fine:
python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms --n_samples=1
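For intuition on why `--n_samples=1` is enough: activation memory in the model scales roughly linearly with batch size, so dropping n_samples from 3 to 1 cuts the per-step footprint to about a third. A back-of-envelope sketch (the 4 x H/8 x W/8 latent shape is SD v1's; actual peak VRAM is dominated by intermediate UNet activations, not the latents themselves, so treat the numbers as illustrative only):

```python
# Back-of-envelope sketch: how the latent batch grows with n_samples.
# SD v1 denoises in a latent space downsampled 8x per spatial dimension
# with 4 channels, so a batch of latents has
# n_samples * 4 * (H/8) * (W/8) elements.
def latent_elements(n_samples: int, height: int = 512, width: int = 512) -> int:
    channels = 4  # SD v1 latent channels
    return n_samples * channels * (height // 8) * (width // 8)

# Scaling is linear in n_samples: the default of 3 needs three times
# the activation memory of a single sample.
print(latent_elements(1))  # 16384 elements at 512x512
print(latent_elements(3))  # 49152 elements
```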
Please just use this: https://huggingface.co/spaces/stabilityai/stable-diffusion. Not a lot of people have a fucking GPU with 11/12 GB of VRAM.
There's more options than that. This repo https://github.com/basujindal/stable-diffusion has an optimized version that uses less VRAM but takes longer. Apparently runs on 4GB, but I haven't tested it myself.
The diffusers repo (https://github.com/huggingface/diffusers) also has an option for half precision, if you run into issues with that you may want to try this patch that apparently fixes it https://huggingface.co/CompVis/stable-diffusion-v1-4/discussions/10
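The half-precision route in diffusers looks roughly like the sketch below. This is an untested outline, not a verified recipe: the model id and the `torch_dtype` keyword are the ones the diffusers documentation uses for v1-4, and if fp16 gives you problems, see the linked discussion.

```python
def load_fp16_pipeline():
    """Hedged sketch: load Stable Diffusion v1-4 in half precision via
    diffusers. Requires a CUDA GPU and the model weights; the function
    is only defined here, not called."""
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4",
        torch_dtype=torch.float16,  # half precision: roughly half the weight VRAM
    )
    return pipe.to("cuda")
```

Calling `load_fp16_pipeline()(prompt).images[0]` would then generate an image with roughly half the weight memory of full precision.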
I only have a 1 GB GPU, and I don't have $150+ to buy a new one.
I get that that's the case for many people, but I don't understand why you're posting this in this issue, where I'm actually trying to make this run on more GPUs than it does by default. I'm not affiliated with Stable Diffusion; I'm just a user who found a problem and made a patch to fix it.
Also, just because some people don't have a GPU doesn't mean that those who do shouldn't be able to run this. There are many options for running SD; besides the ones listed, you can also use Google Colab or the official Dream Studio.
@darthdeus Thanks for the n_samples idea, that got me past my OOM issues on a 3080 Ti w/ 12 GB! It would be great to get this option at least added to the README if nothing else, since searching for the CUDA max_split warning mostly ends up on threads with people saying "not enough RAM".
Exactly! I have 12 GB too and was immediately surprised: "holy crap, is this not enough?" The first thing people told me was to look at the optimized repo, which is totally unnecessary, since 512x512 works fine with a smaller batch size.
Just wanted to add a follow-up +1 here: on my M1 Mac Studio it can take 2 iterations but no more, and sometimes not even 2. My 3080 Ti can't do 2 and always gives that warning (and it seems like a very common issue as more people test this out on GPUs that 'only' (heh...) have 12 GB of VRAM).
@geerlingguy try the https://github.com/lstein/stable-diffusion fork. There's a lot of Mac users working on improving it.
Seconding this, I moved over to it and it's great.
You can reduce the output image size to 256x256 using the --H and --W arguments; the default is 512x512.
Unfortunately 256x256 yields much worse quality of images than 512x512, to the point where the results are basically unusable. There are many optimizations (such as in the lstein and automatic1111 repos) that allow running 512x512 on much lower VRAM than this repo.
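For what it's worth, the trade-off is at least quadratic: activation memory roughly tracks the latent area, so halving both H and W cuts it about 4x (and the self-attention layers, which scale with the square of the latent area, shrink even faster). That's why 256x256 saves so much VRAM, and also why it loses so much detail. A quick illustration:

```python
# Activation memory roughly tracks the latent area (H/8 * W/8),
# so halving both dimensions cuts it ~4x.
def latent_area(height: int, width: int) -> int:
    return (height // 8) * (width // 8)

print(latent_area(512, 512) // latent_area(256, 256))  # 4
```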
I have an RTX 3060 12 GB and the same problem: it uses 12 GB+ even with 760x760 pics. On older clients it was fine somehow.