Allow for using CPU if no CUDA device is detected
Code from https://github.com/CompVis/latent-diffusion/pull/123/ applied to Stable Diffusion and tested on CPU. It is slow, as expected, but it works.
Allows running on the CPU if no CUDA device is detected, instead of just raising a runtime error.
This should allow more people to experiment even without owning an NVIDIA GPU.
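The change amounts to the standard PyTorch device-selection pattern. A minimal, self-contained sketch of the idea (nn.Linear stands in for the actual Stable Diffusion model; this is not the exact diff in the PR):

```python
import torch
import torch.nn as nn

# Use the GPU when PyTorch can see one, otherwise fall back to the CPU.
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

model = nn.Linear(4, 4)   # stand-in for the loaded Stable Diffusion model
model = model.to(device)  # instead of an unconditional model.cuda()

x = torch.randn(1, 4, device=device)
print(model(x).shape, "on", device)
```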
I have a pull request that looks like it changes the exact same lines as you, except my changes are for Apple Silicon GPU support. The code in my PR tries cuda, then mps (for Apple), then cpu. https://github.com/CompVis/stable-diffusion/pull/47
Thanks for the heads up! I would recommend your PR over mine, since it has more functionality. Stable Diffusion for all!
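For reference, a sketch of that cuda → mps → cpu fallback order (an illustration of the approach described in #47, not the exact code from that PR; the mps check requires a fairly recent PyTorch build):

```python
import torch

def pick_device() -> torch.device:
    """Prefer CUDA, then Apple's mps backend, then the plain CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    mps = getattr(torch.backends, "mps", None)  # attribute only exists on newer PyTorch
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

print(pick_device())
```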
This pull request will fix https://github.com/CompVis/stable-diffusion/issues/62
Are there any instructions on exactly how to accomplish this? Like a tutorial or something to follow?
just use this https://huggingface.co/spaces/stabilityai/stable-diffusion
Hi, yeah, that's a really great option, albeit a limited one. You can't control the size of the images, and you also can't use any adult terms because it is censored.
https://huggingface.co/spaces/Shuang59/Composable-Diffusion
If you download my fork, you can just run the commands as-is in the README, and it should detect whether or not you have a GPU. If not, it will switch to CPU and run the inference.
YOU DA MAN/WOMAN!!! Absolutely incredible work that you all have done here. Mind-boggling for sure.
Quick question: I see that you guys/gals have coded a moderator to catch explicit outputs (the safety checker module)? Is there any way to disable that?
My hat's off to you guys/gals.
I haven't got it running yet but fingers crossed! ;-)
Verified this patch works. Painfully slow, but it works!
Hi, I am having one helluva time and am hoping you might be able to take a look to see what I am doing wrong. Here is what I have done and am doing:
- I open Ubuntu.
- I grab the forked repository: git clone https://github.com/MojoJojo43/stable-diffusion-cpu.git
- I change directory: cd stable-diffusion/
- I create the environment: conda env create -f environment.yaml
- I activate ldm: conda activate ldm
- I pull the ckpt model: curl https://www.googleapis.com/storage/v1/b/aai-blog-files/o/sd-v1-4.ckpt?alt=media > sd-v1-4.ckpt
- I run a generic prompt: python scripts/txt2img.py --prompt "a photorealistic image of a lizard riding a snowboard through space" --plms --ckpt sd-v1-4.ckpt --skip_grid --n_samples 1
And after that it seems to get going but ultimately stops with an error that reads:
RuntimeError: No CUDA GPUs are available
I have been troubleshooting since 8am and it's now almost 12:00 am my time and I am still no closer to figuring this out lol.
Thanks in advance!!!
Can you post the full output of the error you get after you run the inference script?
There's a missed instance of model.cuda() in img2img that also needs to be updated to: if torch.cuda.is_available(): model.cuda()
Hi there... So, I have never used Ubuntu... I don't even really know anything about the Windows cmd prompt other than that it can do some nifty and nasty things to your PC :-) HAHA!
I brute-forced my way through all the other possibilities and then started thinking about the env folder structures and whether or not I was in the right folders while executing all the commands. So I ended up removing EVERYTHING and starting from scratch...
AND IT WORKS!!!!!! WHOOOOOOHOOOOOO!
It's about as slow as molasses but hey, just means I need to invest in a good GPU if I want to do this for really realz ;-)
I have one final question. My machine's GPU cannot be upgraded and is too old for Stable Diffusion. Would it be possible to use an eGPU? If so, what type should I be looking at to ensure compatibility? I'm not looking to spend over $500 on it either... the more affordable the better ;-)
Thanks for all the hard work and for being there to help out. Much appreciated!
Hi, is it possible to run this on an Oracle Cloud Ampere A1 with Ubuntu as the OS? And if so, does having 4 cores speed it up in any way?
Hello, when running the img2img script I get this error: https://blazebin.io/ishkwmjgmngo. But my txt2img script works great! So thank you very much for this fork and the time you spent on it!
I just merged an updated img2img.py, so it should work now. I forgot to modify that script's code originally. Thanks to everyone who pointed it out, and to SpandexWizard for applying the fix. Someone please test and let me know. The updated file is here: https://github.com/ModeratePrawn/stable-diffusion-cpu/blob/main/scripts/img2img.py, and I think this pull request should have been updated with the new file as well.
My graphics card has CUDA, but doesn't have enough memory to run. I have plenty of RAM though. Would there be a way to choose to run on the CPU then?
@Zylann Set the env variable CUDA_VISIBLE_DEVICES="". To do that:
- run export CUDA_VISIBLE_DEVICES="" and then run the commands you need (the setting is lost once the terminal is closed), or
- just set CUDA_VISIBLE_DEVICES="" before each command you run, e.g. CUDA_VISIBLE_DEVICES="" python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms
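Why this works, roughly: with the variable empty, PyTorch sees no CUDA devices, so the patched scripts take the CPU path. A quick sanity check (a sketch; run it inside the ldm environment after exporting the variable as above):

```python
# With CUDA_VISIBLE_DEVICES="" set, PyTorch should report no usable GPUs.
import torch

print(torch.cuda.is_available())  # expected: False
print(torch.cuda.device_count())  # expected: 0
```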
@ModeratePrawn thanks for this. Works great on my system (AMD GPU, not Nvidia GPU). I just enter prompts like vanilla stable-diffusion and it defaults to CPU.
@agajdosi Oh ok. I just thought that instead of hardcoding a particular device or behavior in multiple places in the code, the choice could be made up-front in just one place, which would in turn make switching devices much easier. Thanks for the tip!
This PR also helped me tweak the code; it works well!
I'm wondering if switching to the diffusers lib would be better than using the original repo, though? (It looks easier to choose a device, and it allows choosing fp16.)
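For comparison, a rough sketch of what device and precision selection looks like with the diffusers library (assuming the CompVis/stable-diffusion-v1-4 weights on the Hugging Face Hub, which may require accepting the model license and logging in; exact arguments can vary between diffusers versions):

```python
import torch
from diffusers import StableDiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32  # fp16 is generally a GPU-only option

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=dtype
).to(device)

image = pipe("a photograph of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```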
It was painful, but I've verified that this PR at commit d68cd0dcb17d9f962868ce665e9f68793493e910 merged into CompVis/main at 69ae4b35e0a0f6ee1af8bb9a5d0016ccb27e36dc works.
My laptop:
macOS Catalina 10.15.7
MacBook Pro (Retina, 15-inch, Early 2013) <-- yes, almost 10 year old computer
Processor 2.7 GHz Quad-Core Intel Core i7
Memory 16 GB 1600 MHz DDR3
Graphics Intel HD Graphics 4000 1536 MB <-- no GPU, old graphics card
And running through the tutorial:
$ python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms
...
/opt/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/autocast_mode.py:162: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
warnings.warn('User provided device_type of \'cuda\', but CUDA is not available. Disabling')
Sampling: 0%| | 0/2 [00:00<?, ?it/sData shape for PLMS sampling is (3, 4, 64, 64) | 0/1 [00:00<?, ?it/s]
Running PLMS Sampling with 50 timesteps
PLMS Sampler: 100%|█████████████████████████████████████████████████| 50/50 [1:54:07<00:00, 136.95s/it]
data: 100%|██████████████████████████████████████████████████████████| 1/1 [1:57:25<00:00, 7045.09s/it]
Sampling: 50%|██████████████████████████ | 1/2 [1:57:25<1:57:25, 7045.09s/itData shape for PLMS sampling is (3, 4, 64, 64) | 0/1 [00:00<?, ?it/s]
Running PLMS Sampling with 50 timesteps
^[[B
PLMS Sampler: 0%| | 0/50 [00:00<?, ?it/s]
PLMS Sampler: 100%|█████████████████████████████████████████████████| 50/50 [1:52:48<00:00, 135.38s/it]
data: 100%|██████████████████████████████████████████████████████████| 1/1 [1:55:53<00:00, 6953.26s/it]
Sampling: 100%|██████████████████████████████████████████████████████| 2/2 [3:53:18<00:00, 6999.18s/it]
Your samples are ready and waiting for you here:
outputs/txt2img-samples
Enjoy.
I ran this late last night, went to bed, and saw some astronauts on horses. QA'ed, it works!
Edit: added OS version
@bfung will you try the https://github.com/lstein/stable-diffusion version and see if it works? We added a lot of macOS arm support and I'm curious if it works on your Intel also. What is your OS?
@magnusviri I'll take a look and give it a shot in the next few days. I was inspecting my laptop's hardware and noticed that it has an NVIDIA GeForce GT 650M 1GB in addition to Intel integrated graphics; curious to see whether the torch mps device would use the NVIDIA card to speed things up over the 6-hour CPU run 😆 (it should...). I did a small test with the Metal framework and the default MTLDevice says NVIDIA, so in theory it should.
The OS is macOS Catalina 10.15.7, and I also learned today that CUDA is off the table unless I downgrade to macOS High Sierra 10.13.
🤞 for mps.
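If it helps anyone checking the same thing, a small sketch for probing whether the mps backend is usable at all (requires a fairly recent PyTorch; on older versions the attribute simply isn't there):

```python
import torch

mps_backend = getattr(torch.backends, "mps", None)
if mps_backend is None:
    print("This PyTorch build has no mps backend at all")
else:
    print("mps built:", torch.backends.mps.is_built())
    print("mps available:", torch.backends.mps.is_available())  # needs macOS 12.3+
```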
@ModeratePrawn is there any way to utilize more CPU cores? On Windows 11 21H2 I'm getting an average of 10s/it, which isn't too bad, but stable-diffusion only uses about ~50% of my CPU (12c/24t):
warnings.warn('User provided device_type of \'cuda\', but CUDA is not available. Disabling')
Sampling: 0%| | 0/1 [00:00<?, ?it/s]Data shape for PLMS sampling is (1, 4, 64, 64) | 0/1 [00:00<?, ?it/s]
Running PLMS Sampling with 50 timesteps
PLMS Sampler: 100%|████████████████████████████████████████████████████████████████████| 50/50 [08:08<00:00, 9.76s/it]
data: 100%|█████████████████████████████████████████████████████████████████████████████| 1/1 [08:13<00:00, 493.67s/it]
Sampling: 100%|█████████████████████████████████████████████████████████████████████████| 1/1 [08:13<00:00, 493.67s/it]
Your samples are ready and waiting for you here:
outputs/txt2img-samples
Enjoy.

If I can increase CPU utilization by about 50% (to about 75% total), I could get a roughly proportional decrease in run time.
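One knob worth trying (purely a suggestion, not something from this PR): PyTorch's intra-op thread pool can be sized explicitly, and on some machines the default doesn't cover every logical core. Whether it actually helps depends on which ops dominate the sampling loop.

```python
# Hypothetical tweak near the top of scripts/txt2img.py, before any heavy torch work.
import os
import torch

print("default intra-op threads:", torch.get_num_threads())
torch.set_num_threads(os.cpu_count() or 1)  # let PyTorch use every logical core
print("now using:", torch.get_num_threads())
```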
@bfung From what I learned on my PC, your GT 650M only supports CUDA compute capability 3.0, and PyTorch dropped support for compute capability 3.0 years ago, around PyTorch 0.3.1. I'm curious whether you will actually get your GPU to run Stable Diffusion.
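If anyone wants to check their own card, the compute capability is easy to query (a sketch; it only says something when a CUDA build of PyTorch actually sees the device):

```python
import torch

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print(torch.cuda.get_device_name(0), f"compute capability {major}.{minor}")
else:
    print("No usable CUDA device visible to this PyTorch build")
```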
Following up on my comment and to recap the information:
This PR works on CPU on a 10-year-old Mac.
I have a ~10-year-old Mac laptop with the following specs:
macOS Catalina
Version 10.15.7
MacBook Pro (Retina, 15-inch, Early 2013)
Processor 2.7 GHz Quad-Core Intel Core i7
Memory 16 GB 1600 MHz DDR3
Graphics NVIDIA GeForce GT 650M 1 GB
Intel HD Graphics 4000 1536 MB
GPU on a 10-year-old Mac?
will you try the https://github.com/lstein/stable-diffusion version and see if it works?
@magnusviri @MojoJojo43
With my old laptop hardware and OS version, the https://github.com/lstein/stable-diffusion version didn't work for me when using python3 scripts/dream.py --device mps. A couple of error messages appeared, but it looks like the errors are thrown from the underlying libraries. It's probably not worth the effort of trying to support my very old laptop. I found several things:
- pytorch backend support: with a couple of different scripts (and the notes at https://pytorch.org/docs/stable/notes/mps.html), I found the torch.backends.mps backend is only available on macOS 12.3+, while I'm stuck on 10.15.
- I tried using https://github.com/geohot/tinygrad to swap out the pytorch backend for OpenCL, but again, my hardware/OS version is so old that some of the OpenCL stuff doesn't really work and would need a lot of coding to get it working.
On the bright side, it looks like other people have gotten the mps backend to work with lstein/stable-diffusion. For example:
- https://news.ycombinator.com/item?id=32804695 (https://github.com/divamgupta/diffusionbee-stable-diffusion-ui, a 1-click installer for Mac M1)
Hope this info helps - cheers all.
Can this be merged? support for MPS would be great too. thanks.
MPS support is in A1111, InvokeAI, ComfyUI, and others I'm sure. This CompVis repo is abandoned.
Oh, my bad, I just Googled "stable diffusion github", this repo was the first result, so I tried to use it. I'll check those out - thanks!