Allow for using CPU if no CUDA device is detected
Code from https://github.com/CompVis/latent-diffusion/pull/123/ applied to Stable Diffusion and tested on CPU. It is slow, as expected, but it works.
Allows running on the CPU if no CUDA device is detected, instead of just raising a runtime error.
This should allow more people to experiment even without owning an NVIDIA GPU.
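The change amounts to the standard PyTorch device-selection pattern. A minimal, self-contained sketch of the idea (nn.Linear stands in for the actual Stable Diffusion model; this is not the exact diff in the PR):

```python
import torch
import torch.nn as nn

# Use the GPU when PyTorch can see one, otherwise fall back to the CPU.
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

model = nn.Linear(4, 4)   # stand-in for the loaded Stable Diffusion model
model = model.to(device)  # instead of an unconditional model.cuda()

x = torch.randn(1, 4, device=device)
print(model(x).shape, "on", device)
```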
I have a pull request that looks like it changes the exact same lines as you, except my changes are for Apple Silicon GPU support. The code in my PR tries cuda, then mps (for Apple), then cpu. https://github.com/CompVis/stable-diffusion/pull/47
Thanks for the heads up! I would recommend your PR over mine, since it has more functionality. Stable Diffusion for all!
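For reference, a sketch of that cuda → mps → cpu fallback order (an illustration of the approach described in #47, not the exact code from that PR; the mps check requires a fairly recent PyTorch build):

```python
import torch

def pick_device() -> torch.device:
    """Prefer CUDA, then Apple's mps backend, then the plain CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    mps = getattr(torch.backends, "mps", None)  # attribute only exists on newer PyTorch
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

print(pick_device())
```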
This pull request will fix https://github.com/CompVis/stable-diffusion/issues/62
Are there any instructions on exactly how to accomplish this? Like a tutorial or something to follow?
just use this https://huggingface.co/spaces/stabilityai/stable-diffusion
Hi, yeah, that's a really great option, albeit a limited one. You can't control the size of the images, and you also can't use any adult terms because it is censored.
https://huggingface.co/spaces/Shuang59/Composable-Diffusion
If you download my fork, you can just run the commands as-is in the README, and it should detect whether or not you have a GPU. If not, it will switch to CPU and run the inference.
YOU DA MAN/WOMAN!!! Absolutely incredible work that you all have done here. Mind-boggling for sure.
Quick question: I see that you guys/gals have coded a moderator to catch explicit outputs (the safety checker module)? Is there any way to disable that?
My hat's off to you guys/gals.
I haven't got it running yet but fingers crossed! ;-)
Verified this patch works. Painfully slow, but it works!
Hi, I am having one helluva time and am hoping you might be able to take a look to see what I am doing wrong. Here is what I have done and am doing:
- I open Ubuntu.
- I grab the forked repository: git clone https://github.com/MojoJojo43/stable-diffusion-cpu.git
- I change directory: cd stable-diffusion/
- I create the environment: conda env create -f environment.yaml
- I activate ldm: conda activate ldm
- I pull the ckpt model: curl https://www.googleapis.com/storage/v1/b/aai-blog-files/o/sd-v1-4.ckpt?alt=media > sd-v1-4.ckpt
- I run a generic prompt: python scripts/txt2img.py --prompt "a photorealistic image of a lizard riding a snowboard through space" --plms --ckpt sd-v1-4.ckpt --skip_grid --n_samples 1
And after that it seems to get going but ultimately stops with an error that reads:
RuntimeError: No CUDA GPUs are available
I have been troubleshooting since 8am and it's now almost 12:00 am my time and I am still no closer to figuring this out lol.
Thanks in advance!!!
Can you post the full output of the error you get after you run the inference script?
There's a missed instance of model.cuda() in img2img that also needs to be updated to: if torch.cuda.is_available(): model.cuda()
Hi there... So, I have never used Ubuntu... I don't even really know anything about the Windows cmd prompt other than that it can do some nifty and nasty things to your PC :-) HAHA!
I brute-forced my way through all the other possibilities and then started thinking about the env folder structures and whether or not I was in the right folders while executing all the commands. So I ended up removing EVERYTHING and starting from scratch...
AND IT WORKS!!!!!! WHOOOOOOHOOOOOO!
It's about as slow as molasses but hey, just means I need to invest in a good GPU if I want to do this for really realz ;-)
I have one final question. My machine's GPU cannot be upgraded and is too old for Stable Diffusion. Would it be possible to use an eGPU? If so, what type should I be looking at to ensure compatibility? I'm not looking to spend over $500 on it either... the more affordable the better ;-)
Thanks for all the hard work and for being there to help out. Much appreciated!
Hi, is it possible to run this on an Oracle Cloud Ampere A1 with Ubuntu as the OS? And if so, does having 4 cores speed it up in any way?
Hello, when running the img2img script I get this error: https://blazebin.io/ishkwmjgmngo. But my txt2img script works great! So thank you very much for this fork and the time you spent on it!
I just merged an updated img2img.py, so it should work now. I forgot to modify that script's code originally. Thanks to everyone who pointed it out, and to SpandexWizard for applying the fix. Someone please test and let me know. The updated file is here: https://github.com/ModeratePrawn/stable-diffusion-cpu/blob/main/scripts/img2img.py, and I think this pull request should have been updated with the new file as well.
My graphics card has CUDA, but doesn't have enough memory to run. I have plenty of RAM though. Would there be a way to choose to run on the CPU then?
@Zylann Set the env variable CUDA_VISIBLE_DEVICES="". To do that:
- run export CUDA_VISIBLE_DEVICES="" and then run the commands you need (the setting is lost once the terminal is closed), or
- just set CUDA_VISIBLE_DEVICES="" before each command you run, e.g. CUDA_VISIBLE_DEVICES="" python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms
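Why this works, roughly: with the variable empty, PyTorch sees no CUDA devices, so the patched scripts take the CPU path. A quick sanity check (a sketch; run it inside the ldm environment after exporting the variable as above):

```python
# With CUDA_VISIBLE_DEVICES="" set, PyTorch should report no usable GPUs.
import torch

print(torch.cuda.is_available())  # expected: False
print(torch.cuda.device_count())  # expected: 0
```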
@ModeratePrawn thanks for this. Works great on my system (AMD GPU, not Nvidia GPU). I just enter prompts like vanilla stable-diffusion and it defaults to CPU.
@agajdosi Oh ok. I just thought that instead of hardcoding a particular device or behavior in multiple places in the code, the choice could be made up-front in just one place, which would in turn make switching devices much easier. Thanks for the tip!
This PR also helped me tweak the code; it works well!
I'm wondering if switching to the diffusers lib would be better than using the original repo, though? (It looks easier to choose a device, and it allows choosing fp16.)
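For comparison, a rough sketch of what device and precision selection looks like with the diffusers library (assuming the CompVis/stable-diffusion-v1-4 weights on the Hugging Face Hub, which may require accepting the model license and logging in; exact arguments can vary between diffusers versions):

```python
import torch
from diffusers import StableDiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32  # fp16 is generally a GPU-only option

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=dtype
).to(device)

image = pipe("a photograph of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```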
It was painful, but I've verified that this PR at commit d68cd0dcb17d9f962868ce665e9f68793493e910 merged into CompVis/main at 69ae4b35e0a0f6ee1af8bb9a5d0016ccb27e36dc works.
My laptop:
macOS Catalina 10.15.7
MacBook Pro (Retina, 15-inch, Early 2013) <-- yes, almost 10 year old computer
Processor 2.7 GHz Quad-Core Intel Core i7
Memory 16 GB 1600 MHz DDR3
Graphics Intel HD Graphics 4000 1536 MB <-- no GPU, old graphics card
And running through the tutorial:
$ python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms
...
/opt/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/autocast_mode.py:162: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
warnings.warn('User provided device_type of \'cuda\', but CUDA is not available. Disabling')
Sampling: 0%| | 0/2 [00:00<?, ?it/sData shape for PLMS sampling is (3, 4, 64, 64) | 0/1 [00:00<?, ?it/s]
Running PLMS Sampling with 50 timesteps
PLMS Sampler: 100%|█████████████████████████████████████████████████| 50/50 [1:54:07<00:00, 136.95s/it]
data: 100%|██████████████████████████████████████████████████████████| 1/1 [1:57:25<00:00, 7045.09s/it]
Sampling: 50%|██████████████████████████ | 1/2 [1:57:25<1:57:25, 7045.09s/itData shape for PLMS sampling is (3, 4, 64, 64) | 0/1 [00:00<?, ?it/s]
Running PLMS Sampling with 50 timesteps
^[[B
PLMS Sampler: 0%| | 0/50 [00:00<?, ?it/s]
PLMS Sampler: 100%|█████████████████████████████████████████████████| 50/50 [1:52:48<00:00, 135.38s/it]
data: 100%|██████████████████████████████████████████████████████████| 1/1 [1:55:53<00:00, 6953.26s/it]
Sampling: 100%|██████████████████████████████████████████████████████| 2/2 [3:53:18<00:00, 6999.18s/it]
Your samples are ready and waiting for you here:
outputs/txt2img-samples
Enjoy.
I ran this late last night, went to bed, and saw some astronauts on horses. QA'ed, it works!
Edit: added OS version
@bfung will you try the https://github.com/lstein/stable-diffusion version and see if it works? We added a lot of macOS arm support and I'm curious if it works on your Intel also. What is your OS?
@magnusviri I'll take a look and give it a shot in the next few days. I was inspecting my laptop's hardware and noticed that it has an NVIDIA GeForce GT 650M 1GB in addition to Intel integrated graphics; curious to see whether the torch mps device would use the NVIDIA card to speed things up over the 6-hour CPU run 😆 (it should...). I did a small test with the Metal framework and the default MTLDevice says NVIDIA, so in theory it should.
The OS is macOS Catalina 10.15.7, and I also learned today that CUDA is off the table unless I downgrade to macOS High Sierra 10.13.
🤞 for mps.
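If it helps anyone checking the same thing, a small sketch for probing whether the mps backend is usable at all (requires a fairly recent PyTorch; on older versions the attribute simply isn't there):

```python
import torch

mps_backend = getattr(torch.backends, "mps", None)
if mps_backend is None:
    print("This PyTorch build has no mps backend at all")
else:
    print("mps built:", torch.backends.mps.is_built())
    print("mps available:", torch.backends.mps.is_available())  # needs macOS 12.3+
```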
@ModeratePrawn is there any way to utilize more CPU cores? On Windows 11 21H2 I'm getting an average of 10s/it, which isn't too bad, but stable-diffusion only uses about ~50% of my CPU (12c/24t):
warnings.warn('User provided device_type of \'cuda\', but CUDA is not available. Disabling')
Sampling: 0%| | 0/1 [00:00<?, ?it/s]Data shape for PLMS sampling is (1, 4, 64, 64) | 0/1 [00:00<?, ?it/s]
Running PLMS Sampling with 50 timesteps
PLMS Sampler: 100%|████████████████████████████████████████████████████████████████████| 50/50 [08:08<00:00, 9.76s/it]
data: 100%|█████████████████████████████████████████████████████████████████████████████| 1/1 [08:13<00:00, 493.67s/it]
Sampling: 100%|█████████████████████████████████████████████████████████████████████████| 1/1 [08:13<00:00, 493.67s/it]
Your samples are ready and waiting for you here:
outputs/txt2img-samples
Enjoy.

If I can increase CPU utilization by about 50% (to about 75% total), I could get a roughly proportional decrease in run time.
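One knob worth trying (purely a suggestion, not something from this PR): PyTorch's intra-op thread pool can be sized explicitly, and on some machines the default doesn't cover every logical core. Whether it actually helps depends on which ops dominate the sampling loop.

```python
# Hypothetical tweak near the top of scripts/txt2img.py, before any heavy torch work.
import os
import torch

print("default intra-op threads:", torch.get_num_threads())
torch.set_num_threads(os.cpu_count() or 1)  # let PyTorch use every logical core
print("now using:", torch.get_num_threads())
```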
@bfung From what I learned on my PC, your GT 650M only supports CUDA compute capability 3.0, and PyTorch dropped support for compute capability 3.0 years ago, around PyTorch 0.3.1. I'm curious whether you will actually get your GPU to run Stable Diffusion.
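If anyone wants to check their own card, the compute capability is easy to query (a sketch; it only says something when a CUDA build of PyTorch actually sees the device):

```python
import torch

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print(torch.cuda.get_device_name(0), f"compute capability {major}.{minor}")
else:
    print("No usable CUDA device visible to this PyTorch build")
```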
Following up on my comment and to recap the information:
This PR works on CPU on a 10-year-old Mac.
I have a ~10-year-old Mac laptop with the following specs:
macOS Catalina
Version 10.15.7
MacBook Pro (Retina, 15-inch, Early 2013)
Processor 2.7 GHz Quad-Core Intel Core i7
Memory 16 GB 1600 MHz DDR3
Graphics NVIDIA GeForce GT 650M 1 GB
Intel HD Graphics 4000 1536 MB
GPU on a 10-year-old Mac?
will you try the https://github.com/lstein/stable-diffusion version and see if it works?
@magnusviri @MojoJojo43
With my old laptop hardware and OS version, the https://github.com/lstein/stable-diffusion version didn't work for me when using python3 scripts/dream.py --device mps. A couple of error messages appeared, but it looks like the errors are thrown from the underlying libraries. It's probably not worth the effort of trying to support my very old laptop. I found several things:
- pytorch backend support: with a couple of different scripts (and the notes at https://pytorch.org/docs/stable/notes/mps.html), I found the torch.backends.mps backend is only available on macOS 12.3+, while I'm stuck on 10.15.
- I tried using https://github.com/geohot/tinygrad to swap out the pytorch backend for OpenCL, but again, my hardware/OS version is so old that some of the OpenCL stuff doesn't really work and would need a lot of coding to get it working.
On the bright side, it looks like other people have gotten the mps backend to work with lstein/stable-diffusion. For example:
- https://news.ycombinator.com/item?id=32804695 (https://github.com/divamgupta/diffusionbee-stable-diffusion-ui, a 1-click installer for Mac M1)
Hope this info helps - cheers all.
Can this be merged? support for MPS would be great too. thanks.
MPS support is in A1111, InvokeAI, ComfyUI, and others I'm sure. This CompVis repo is abandoned.
Oh, my bad, I just Googled "stable diffusion github", this repo was the first result, so I tried to use it. I'll check those out - thanks!