[Feature Request]: Support for Apple's Core ML Stable Diffusion
Is there an existing issue for this?
- [X] I have searched the existing issues and checked the recent builds/commits
What would your feature do ?
https://github.com/apple/ml-stable-diffusion
Apple very recently added support for converting Stable Diffusion models to the Core ML format to allow for faster generation times.
- It would be nice to support this conversion pipeline within the web UI, perhaps as an option in an extras tab or the checkpoint merger (it's not really a merge per se, but it could apply?)
- Allow the webUI to run the Core ML models instead of the regular PyTorch SD models.
Proposed workflow
- Go to the extras tab or checkpoint merger tab
- Select a script or similar to convert a .ckpt file in your models directory to the Core ML format (see the sketch below)
- Allow use of the Core ML model within the webUI for Apple users.
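For reference, a rough sketch of what the conversion step might wrap, based on the CLI documented in the apple/ml-stable-diffusion README. A local .ckpt would presumably first have to be converted to the Hugging Face diffusers layout, since the tool takes a model version rather than a raw checkpoint; the output path below is a placeholder.

```sh
# Sketch only: flags as documented in the apple/ml-stable-diffusion README;
# the output directory is a placeholder.
python -m python_coreml_stable_diffusion.torch2coreml \
    --model-version CompVis/stable-diffusion-v1-4 \
    --convert-unet --convert-text-encoder \
    --convert-vae-decoder --convert-safety-checker \
    -o ./models/coreml-sd
```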
Additional information
No response
https://github.com/apple/ml-stable-diffusion/issues/9
This please!
https://github.com/apple/ml-stable-diffusion
I've tried running and inspecting the sample from the repository above (still investigating). It looks like the Core ML format does not reduce image generation time; if anything, the PyTorch implementation is slightly faster (at least on my Mac Studio, M1 Ultra, 48-core GPU). It's fine to want Automatic1111 to support the Core ML format, but before that, everyone with an M1 Mac should do some benchmarking and carefully consider whether it's really an urgent thing to request.
Translated from Japanese to English by Google.
Yes I have also finally tried it on Mac M1 and it is indeed slower than current implementations.
Did you try the Python one or the Swift one?
> Yes I have also finally tried it on Mac M1 and it is indeed slower than current implementations.
Please publish all the relevant details, e.g., macOS version (latest 13.1 beta is needed), which Mac, which compute units, Swift or Python, and whether you have included the model loading time.
Their own benchmarks say that an M2 generates an image in 23 seconds, which is certainly much faster than PyTorch. I myself don't have macOS 13.1 and Xcode installed to test.
I can test on Macbook Air M2 with 24GB of RAM. But a little guidance on how, no crazy detail needed, would be nice.
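Roughly, based on the apple/ml-stable-diffusion README (hedging here, since I haven't run this on an M2 Air myself; paths are placeholders and assume the model has already been converted as above):

```sh
# Set up the repo and run the Python pipeline against a converted model.
git clone https://github.com/apple/ml-stable-diffusion
cd ml-stable-diffusion
pip install -e .

# Time generation with one compute unit, then repeat with
# CPU_AND_GPU and CPU_AND_NE and compare it/s and wall time
# against the webUI's progress output.
python -m python_coreml_stable_diffusion.pipeline \
    --prompt "a photo of an astronaut riding a horse on mars" \
    -i ./sdmodel -o ./out --compute-unit ALL --num-inference-steps 20
```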
Mac Studio (M1 Ultra, 128 GB RAM, 48-core GPU), macOS Ventura (13.1 beta), Xcode 14.1; image size: 512x512, steps: 20 (21 in coreml_python), model: CompVis/stable-diffusion-v1-4 (model load time not included).
- WebUI: 2.61 it/s (MPS)
- coreml (Python): 2.64 it/s (cpu, gpu, ane)
- coreml (Swift): 2.23 it/s (cpu, gpu, ane)
There doesn't seem to be a dramatic difference in speed.
Automatic1111 SD WebUI (sampling method: Euler a), using MPS:
Total progress: 100%|███████████████████████████| 20/20 [00:07<00:00, 2.61it/s]
Using CPU only (--use-cpu all):
Total progress: 100%|███████████████████████████| 20/20 [01:10<00:00, 3.51s/it]
(= 0.285 it/s)
coreml (Python, scheduler: default, maybe DDIM):
python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" -i ./sdmodel -o ./out --compute-unit ALL --seed 93 --num-inference-steps 20
100%|███████████████████████████████████████████| 21/21 [00:07<00:00, 2.64it/s]
python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" -i ./sdmodel -o ./out --compute-unit CPU_AND_GPU --seed 93 --num-inference-steps 20
100%|███████████████████████████████████████████| 21/21 [00:12<00:00, 1.73it/s]
python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" -i ./sdmodel -o ./out --compute-unit CPU_AND_NE --seed 93 --num-inference-steps 20
100%|███████████████████████████████████████████| 21/21 [00:17<00:00, 1.22it/s]
coreml (Swift; scheduler/sampling method unknown, since I don't know much about Swift):
swift run StableDiffusionSample "a photo of an astronaut riding a horse on mars" --resource-path ./sdmodel/Resources/ --seed 93 --output-path ./out --step-count 20 --compute-units all
Step 20 of 20 [mean: 2.23, median: 2.50, last 2.46] step/sec
swift run StableDiffusionSample "a photo of an astronaut riding a horse on mars" --resource-path ./sdmodel/Resources/ --seed 93 --output-path ./out --step-count 20 --compute-units cpuAndNeuralEngine
Step 20 of 20 [mean: 1.16, median: 1.17, last 1.16] step/sec
swift run StableDiffusionSample "a photo of an astronaut riding a horse on mars" --resource-path ./sdmodel/Resources/ --seed 93 --output-path ./out --step-count 20 --compute-units cpuAndGPU
Step 20 of 20 [mean: 1.86, median: 2.96, last 2.95] step/sec
swift run StableDiffusionSample "a photo of an astronaut riding a horse on mars" --resource-path ./sdmodel/Resources/ --seed 93 --output-path ./out --step-count 20 --compute-units cpuOnly
Step 20 of 20 [mean: 0.12, median: 0.12, last 0.12] step/sec
@autumnmotor ~~Aren't your results much better than Torch MPS? 1.22it/s vs 2.61it/s.~~
> use MPS
> Total progress: 100%|███████████████████████████| 20/20 [00:07<00:00, 2.61it/s]
> python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" -i ./sdmodel -o ./out --compute-unit CPU_AND_NE --seed 93 --num-inference-steps 20
> 100%|███████████████████████████████████████████| 21/21 [00:17<00:00, 1.22it/s]
INFO:python_coreml_stable_diffusion.coreml_model:Loading a CoreML model through coremltools triggers compilation every time. The Swift package we provide uses precompiled Core ML models (.mlmodelc) to avoid compile-on-load.
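For context, the Swift path sidesteps that recompile because the converter can emit precompiled .mlmodelc resources. A sketch, assuming the --bundle-resources-for-swift-cli flag from the repo README is the relevant knob:

```sh
# Produces <output>/Resources with precompiled .mlmodelc files, which the
# Swift CLI (swift run StableDiffusionSample --resource-path ...) can load
# without compiling the model on every run.
python -m python_coreml_stable_diffusion.torch2coreml \
    --model-version CompVis/stable-diffusion-v1-4 \
    --convert-unet --convert-text-encoder --convert-vae-decoder \
    --bundle-resources-for-swift-cli -o ./sdmodel
```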
... and rumors tell us, we should use beta4 ventura 13.1
What exact macOS build are you using, sir?
@NightMachinery Hmmm... I think the higher the "it/s" (iterations per second) value, the better the performance... Also check the total processing time on the left (model load not included): webui (MPS): 7 sec, coreml_python: 17 sec.
@sascha1337
> ... and rumors tell us, we should use beta4 ventura 13.1
My macOS is 13.1 Beta (22C5033e), which means beta "1". Thanks for the very useful information. Luckily I'm in the Apple Developer Program, so I'll try it later.
macOS 13.1 beta1 -> beta4:
- WebUI (MPS): 2.61 it/s -> 2.66 it/s
- coreml (Python, cpu+gpu+ane): 2.64 it/s -> 2.65 it/s
- coreml (Swift, cpu+gpu+ane): 2.23 it/s -> 2.23 it/s
I still need to look into it more carefully, but my current conclusion is that the differences are within the margin of error.
@autumnmotor sir, which sampler? DDIM?
@autumnmotor Could you also report coreml (Python, cpu+gpu), please?
There is a basic implementation now:
https://github.com/godly-devotion/mochi-diffusion
Would be great if you guys somehow teamed up!
How would we go about testing the CoreML versions already converted? I assume I can't just drop all of the files into the models directory?
> use MPS
How do we use MPS with the webUI? I thought it was CPU only.
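For what it's worth, on Apple Silicon the webUI should pick up PyTorch's MPS backend automatically; the --use-cpu all run quoted earlier is what forces CPU. A quick check that the backend is actually available in your PyTorch install:

```sh
# Prints True if the installed PyTorch build can use the MPS (Metal) backend.
python -c "import torch; print(torch.backends.mps.is_available())"
```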
Please do it
I'm a Mac user and I tried the Draw Things app, which supports Core ML. On a Mac mini M1, with Anything v3 and 30 steps, it takes about 2 min 40 s to generate an image using the webui and 45 s using DT, so I think supporting Core ML is still necessary. On another Mac, an M1 Pro, it's 20 s for DT and 35 s for the webui. (Aside from the teasing, DT's interface design is also worth referencing.)
please.
bump
I just compiled the HF Diffusers app on my M2 Max and can whip out a 45-step SD 2.1 image in about 18.5 s, vs 43 s with A1111 and the pruned model.
There are apps that use Apple's Core ML Stable Diffusion. The best one I could find is here: https://github.com/godly-devotion/MochiDiffusion
However, if you've ever tried using Apple's Core ML implementation, you might have noticed that it takes a LONG time to initialize the model every time it first runs. Using the CLI from Apple's examples, it takes about a minute on my M1 before the model even starts running. I think Mochi caches the Core ML model, making it more useful. On a MacBook Air M1, I'm only seeing a 20% increase in diffusion speed at most, and the startup time for loading any model makes it not worth it. An M1 Pro/Max or M2 Pro/Max might see much more significant gains than the M1 base model.
Dude, this depends on the RAM: with 8 GB you get bad times; 128 GB is the way to go, so you can keep the UNet chunks in cache.
Please do it!
DrawThings has an HTTP API. Maybe requests for the things it can handle could be sent over to it, while keeping A1111 as the front-end?
> There are apps that use Apple's Core ML Stable Diffusion. The best one I could find is here: https://github.com/godly-devotion/MochiDiffusion
> However, if you've ever tried using Apple's Core ML implementation, you might have noticed that it takes a LONG time to initialize the model every time it first runs. Using the CLI from Apple's examples, it takes about a minute on my M1 before the model even starts running. I think Mochi caches the Core ML model, making it more useful. On a MacBook Air M1, I'm only seeing a 20% increase in diffusion speed at most, and the startup time for loading any model makes it not worth it. An M1 Pro/Max or M2 Pro/Max might see much more significant gains than the M1 base model.
Check out Draw Things... it's not open source but it is free and it beats everything else in performance, I think.
> Dude, this depends on the RAM: with 8 GB you get bad times; 128 GB is the way to go, so you can keep the UNet chunks in cache.
128 GB of Mac memory, lol. Apple's golden money earner. Apple Silicon currently only goes up to 96 GB, I believe.