
Set n_samples to 1 by default

Open darthdeus opened this issue 3 years ago • 12 comments

The defaults of n_samples=3 and n_iter=2 cause an out-of-memory error on an RTX 2080/3060 (11/12 GB of VRAM respectively), which is completely unnecessary because the model fits fine with a batch size of 1.

This seems to be a constant question everywhere, and people are just running the model with defaults that have unnecessarily high memory consumption. The following command OOMs on a 12 GB RTX 3060:

python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms 

but this works fine:

python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms --n_samples=1
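To see why n_samples matters so much, here is a back-of-the-envelope sketch: the model weights are a fixed cost, but activation memory during sampling grows roughly linearly with batch size, so batch 3 can push an 11-12 GB card over the edge while batch 1 fits comfortably. The specific gigabyte figures below are illustrative assumptions, not measurements.

```python
# Crude linear model of peak VRAM usage (illustrative numbers, not measured).
# The UNet weights are a fixed cost; activation memory scales roughly
# linearly with n_samples (the batch size).

WEIGHTS_GB = 4.0      # assumed fixed cost of weights + overhead
PER_SAMPLE_GB = 3.0   # assumed activation cost per 512x512 sample

def estimated_vram_gb(n_samples: int) -> float:
    """Rough estimate of peak VRAM for a given batch size."""
    return WEIGHTS_GB + PER_SAMPLE_GB * n_samples

for n in (1, 3):
    print(f"n_samples={n}: ~{estimated_vram_gb(n):.1f} GB")
```

Under these assumed numbers, batch 1 lands well under a 12 GB budget while batch 3 exceeds it, which matches the observed behavior.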

darthdeus avatar Aug 23 '22 15:08 darthdeus

please just use this https://huggingface.co/spaces/stabilityai/stable-diffusion. not a lot of people have a fucking 11/12GB VRAM GPU.

breadbrowser avatar Aug 23 '22 19:08 breadbrowser

There are more options than that. This repo https://github.com/basujindal/stable-diffusion has an optimized version that uses less VRAM but takes longer. It apparently runs on 4GB, but I haven't tested it myself.

The diffusers repo (https://github.com/huggingface/diffusers) also has an option for half precision; if you run into issues with that, you may want to try this patch, which apparently fixes it: https://huggingface.co/CompVis/stable-diffusion-v1-4/discussions/10
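Half precision helps because it halves the memory footprint of the weights. A quick sanity check of the arithmetic, assuming roughly 860M parameters for the SD v1 UNet (an approximation):

```python
# Halving precision halves the weight footprint: fp32 stores each
# parameter in 4 bytes, fp16 in 2 bytes.
N_PARAMS = 860_000_000  # approximate parameter count for the SD v1 UNet

def weight_gb(bytes_per_param: int) -> float:
    """Memory footprint of the weights alone, in GiB."""
    return N_PARAMS * bytes_per_param / 1024**3

fp32_gb = weight_gb(4)  # ~3.2 GiB
fp16_gb = weight_gb(2)  # ~1.6 GiB
print(f"fp32: {fp32_gb:.2f} GiB, fp16: {fp16_gb:.2f} GiB")
```

Activations and optimizer-free inference buffers shrink similarly, which is why fp16 alone is often the difference between fitting and OOMing on a mid-range card.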

darthdeus avatar Aug 23 '22 20:08 darthdeus

> There's more options than that. This repo https://github.com/basujindal/stable-diffusion has an optimized version that uses less VRAM but takes longer. Apparently runs on 4GB, but I haven't tested it myself.
>
> The diffusers repo (https://github.com/huggingface/diffusers) also has an option for half precision, if you run into issues with that you may want to try this patch that apparently fixes it https://huggingface.co/CompVis/stable-diffusion-v1-4/discussions/10

I only have a 1 GB GPU, and I don't have $150+ to buy a new one.

breadbrowser avatar Aug 23 '22 21:08 breadbrowser

I get that that's the case for many people, but I don't understand why you're posting this in this issue, where I'm actually trying to make this run on more GPUs than it does by default. I'm not affiliated with Stable Diffusion; I'm just a user who found a problem and made a patch to fix it.

Also, just because some people don't have a GPU doesn't mean that those who do have one shouldn't be able to run this. There are many options for running SD; besides the ones listed, you can also use Google Colab or just the official Dream Studio.

darthdeus avatar Aug 24 '22 11:08 darthdeus

@darthdeus Thanks for the n_samples idea, which got me past my OOM issues on a 3080 Ti w/ 12GB! It would be great to get this option at least added to the README if nothing else, as searching for the CUDA max_split warning mostly ends up on threads with people saying "not enough RAM".
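For what it's worth, the "max_split" warning refers to a knob on PyTorch's caching CUDA allocator, not to this repo specifically. On PyTorch versions that support it, a smaller max_split_size_mb can reduce fragmentation-related OOMs; a minimal sketch of setting it (must happen before torch initializes CUDA):

```python
# PyTorch reads PYTORCH_CUDA_ALLOC_CONF at CUDA initialization, so set it
# before importing torch (or export it in the shell instead).
# max_split_size_mb limits how large a cached block the allocator may
# split, which can reduce fragmentation-related OOMs on some workloads.
import os

os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"
```

This is a fragmentation mitigation, not a capacity increase; lowering n_samples remains the more direct fix here.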

qdot avatar Aug 31 '22 16:08 qdot

Exactly! I have 12GB too and was immediately surprised that it wasn't enough. The first thing people told me was to look at the optimized repo, which is totally unnecessary; 512x512 works fine with a smaller batch size.

darthdeus avatar Sep 01 '22 14:09 darthdeus

Just wanted to add a follow-up +1 here. On my M1 Mac Studio it can handle 2 iterations but no more, and sometimes not even 2. My 3080 Ti can't do 2 and always gives that warning (and it seems like a very common issue as more people test this out on GPUs that only ('only', heh...) have 12 GB of VRAM).

geerlingguy avatar Sep 08 '22 19:09 geerlingguy

@geerlingguy try the https://github.com/lstein/stable-diffusion fork. There are a lot of Mac users working on improving it.

magnusviri avatar Sep 08 '22 19:09 magnusviri

> @geerlingguy try the https://github.com/lstein/stable-diffusion fork. There's a lot of Mac users working on improving it.

Seconding this, I moved over to it and it's great.

qdot avatar Sep 08 '22 21:09 qdot

You can reduce the output image size to 256x256 using the --H and --W arguments; the default is 512x512.

edmundhong avatar Oct 22 '22 16:10 edmundhong

Unfortunately, 256x256 yields much worse image quality than 512x512, to the point where the results are basically unusable. There are many optimizations (such as in the lstein and automatic1111 repos) that allow running 512x512 on much less VRAM than this repo requires.
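A note on why resolution changes memory so sharply: SD v1 diffuses in a latent space downsampled 8x from pixel space, and the self-attention layers cost memory proportional to the square of the number of latent positions. A small sketch of that scaling:

```python
# SD v1's VAE downsamples images 8x, so a 512x512 image becomes a
# 64x64 latent grid and a 256x256 image a 32x32 grid. Self-attention
# builds a score matrix over all positions, so its memory grows with
# the SQUARE of the position count.

def latent_positions(h: int, w: int, downsample: int = 8) -> int:
    """Number of latent grid positions for an h x w image."""
    return (h // downsample) * (w // downsample)

p512 = latent_positions(512, 512)          # 64 * 64 = 4096
p256 = latent_positions(256, 256)          # 32 * 32 = 1024
attn_ratio = (p512 ** 2) / (p256 ** 2)     # quadratic blow-up
print(f"512px vs 256px attention memory ratio: {attn_ratio:.0f}x")
```

So dropping to 256x256 cuts attention memory roughly 16x, but since the model was trained at 512x512, output quality degrades badly, which is why a smaller batch size at full resolution is the better trade.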

darthdeus avatar Oct 23 '22 00:10 darthdeus

I have an RTX 3060 12GB and the same problem: it uses 12GB+ even with 760x760 images, though on older clients it was somehow fine.

BateauSD avatar Jan 21 '23 23:01 BateauSD