
[Feature Request]: Support for Apple's Core ML Stable Diffusion

Open jkcarney opened this issue 2 years ago • 20 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues and checked the recent builds/commits

What would your feature do?

https://github.com/apple/ml-stable-diffusion

Apple recently added support for converting Stable Diffusion models to the Core ML format, which can reduce generation time.

  1. It would be nice to support this conversion pipeline within the web UI, perhaps as an option in the Extras tab or the Checkpoint Merger tab (it's not really a merge per se, but it could fit there).
  2. Allow the webUI to run the Core ML models instead of the regular SD PyTorch models.

Proposed workflow

  1. Go to the Extras tab or the Checkpoint Merger tab.
  2. Select a script (or similar) to convert a .ckpt file in your models directory to the Core ML format (a rough sketch of what that conversion could call is shown below this list).
  3. Allow use of the Core ML model within the webUI for Apple users.
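
For what it's worth, here is a minimal sketch of what step 2 could call under the hood, using the torch2coreml converter from Apple's repo linked above. The flag names are taken from its README at the time of writing and may change; the helper function and paths are placeholders, not webui code:

```python
# Hypothetical helper for a webui script: shell out to Apple's converter to
# turn a Hugging Face Stable Diffusion model into Core ML .mlpackage files.
import subprocess
from pathlib import Path

def convert_to_coreml(model_version: str, output_dir: Path) -> None:
    """Convert a Stable Diffusion model to Core ML (rough sketch, not webui code)."""
    output_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        [
            "python", "-m", "python_coreml_stable_diffusion.torch2coreml",
            "--convert-unet",
            "--convert-text-encoder",
            "--convert-vae-decoder",
            "--model-version", model_version,  # Hugging Face model ID
            "-o", str(output_dir),
        ],
        check=True,
    )

convert_to_coreml("CompVis/stable-diffusion-v1-4", Path("./models/coreml"))
```

Note that, as far as I can tell, the converter works from a Hugging Face model ID (diffusers format) rather than a local .ckpt, so a .ckpt from the models directory would first need to be converted to the diffusers layout.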

Additional information

No response

jkcarney avatar Dec 01 '22 20:12 jkcarney

https://github.com/apple/ml-stable-diffusion/issues/9

NightMachinery avatar Dec 02 '22 05:12 NightMachinery

This please!

ioma8 avatar Dec 02 '22 07:12 ioma8

https://github.com/apple/ml-stable-diffusion

I've tried running and inspecting the sample from the repository above (still investigating). It looks like the Core ML format does not reduce image generation time; if anything, the PyTorch implementation is slightly faster (at least on my Mac Studio, M1 Ultra, 48-core GPU). Wanting Automatic1111 to support the Core ML format is fine, but before that, everyone with an M1 Mac should do some benchmarking and carefully consider whether it is really an urgent request.

Translated from Japanese to English by Google.
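
For anyone who wants to reproduce this kind of comparison, here is a rough sketch of timing the PyTorch/MPS side with diffusers (assuming diffusers and a recent PyTorch are installed; the Core ML side can be timed with the python_coreml_stable_diffusion.pipeline CLI shown further down this thread). This is not how the webui measures it/s, just a quick comparable number:

```python
# Rough MPS benchmark with diffusers, excluding model load time.
import time
import torch
from diffusers import StableDiffusionPipeline

assert torch.backends.mps.is_available(), "MPS backend not available"

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4").to("mps")
prompt = "a photo of an astronaut riding a horse on mars"
steps = 20

pipe(prompt, num_inference_steps=1)  # warm-up; the first MPS pass is always slow

start = time.time()
pipe(prompt, num_inference_steps=steps)
elapsed = time.time() - start
print(f"{steps / elapsed:.2f} it/s ({elapsed:.1f}s total, excluding model load)")
```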

autumnmotor avatar Dec 02 '22 14:12 autumnmotor

Yes, I have also finally tried it on a Mac M1 and it is indeed slower than the current implementation.

ioma8 avatar Dec 02 '22 16:12 ioma8

Did you try the Python one or the Swift one?

sascha1337 avatar Dec 02 '22 17:12 sascha1337

Yes, I have also finally tried it on a Mac M1 and it is indeed slower than the current implementation.

Please publish all the relevant details, e.g., macOS version (latest 13.1 beta is needed), which Mac, which compute units, Swift or Python, and whether you have included the model loading time.

Their own benchmarks say that an M2 generates an image in 23 seconds, which is certainly much faster than PyTorch. I myself don't have macOS 13.1 and Xcode installed to test.

NightMachinery avatar Dec 02 '22 19:12 NightMachinery

I can test on a MacBook Air M2 with 24 GB of RAM, but a little guidance on how (no crazy detail needed) would be nice.

ronytomen avatar Dec 02 '22 21:12 ronytomen

Mac Studio (M1 Ultra, 128 GB RAM, 48-core GPU), macOS Ventura (13.1 beta), Xcode 14.1. Image size: 512x512, steps: 20 (21 in coreml_python), model: CompVis/stable-diffusion-v1-4. Times do not include model load time.

WebUI (MPS): 2.61 it/s
Core ML (Python; cpu, gpu, ane): 2.64 it/s
Core ML (Swift; cpu, gpu, ane): 2.23 it/s

There doesn't seem to be a dramatic difference in speed.

Automatic1111 SD WebUI (sampling method: Euler a), using MPS:

Total progress: 100%|███████████████████████████| 20/20 [00:07<00:00, 2.61it/s]

Using CPU only (--use-cpu all):

Total progress: 100%|███████████████████████████| 20/20 [01:10<00:00, 3.51s/it]

= 0.285 it/s

Core ML (Python; scheduler: default, probably DDIM), compute unit ALL:

python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" -i ./sdmodel -o ./out --compute-unit ALL --seed 93 --num-inference-steps 20

100%|███████████████████████████████████████████| 21/21 [00:07<00:00, 2.64it/s]

python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" -i ./sdmodel -o ./out --compute-unit CPU_AND_GPU --seed 93 --num-inference-steps 20

100%|███████████████████████████████████████████| 21/21 [00:12<00:00, 1.73it/s]

python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" -i ./sdmodel -o ./out --compute-unit CPU_AND_NE --seed 93 --num-inference-steps 20

100%|███████████████████████████████████████████| 21/21 [00:17<00:00, 1.22it/s]

Core ML (Swift; scheduler/sampling method unknown, since I don't know much about Swift), compute units all:

swift run StableDiffusionSample "a photo of an astronaut riding a horse on mars" --resource-path ./sdmodel/Resources/ --seed 93 --output-path ./out --step-count 20 --compute-units all

Step 20 of 20 [mean: 2.23, median: 2.50, last 2.46] step/sec

swift run StableDiffusionSample "a photo of an astronaut riding a horse on mars" --resource-path ./sdmodel/Resources/ --seed 93 --output-path ./out --step-count 20 --compute-units cpuAndNeuralEngine

Step 20 of 20 [mean: 1.16, median: 1.17, last 1.16] step/sec

swift run StableDiffusionSample "a photo of an astronaut riding a horse on mars" --resource-path ./sdmodel/Resources/ --seed 93 --output-path ./out --step-count 20 --compute-units cpuAndGPU

Step 20 of 20 [mean: 1.86, median: 2.96, last 2.95] step/sec

swift run StableDiffusionSample "a photo of an astronaut riding a horse on mars" --resource-path ./sdmodel/Resources/ --seed 93 --output-path ./out --step-count 20 --compute-units cpuOnly

Step 20 of 20 [mean: 0.12, median: 0.12, last 0.12] step/sec

autumnmotor avatar Dec 02 '22 23:12 autumnmotor

@autumnmotor ~~Aren't your results much better than Torch MPS? 1.22it/s vs 2.61it/s.~~

use MPS

Total progress: 100%|███████████████████████████| 20/20 [00:07<00:00, 2.61it/s]
python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" -i ./sdmodel -o ./out --compute-unit CPU_AND_NE --seed 93 --num-inference-steps 20

100%|███████████████████████████████████████████| 21/21 [00:17<00:00, 1.22it/s]

NightMachinery avatar Dec 02 '22 23:12 NightMachinery

INFO:python_coreml_stable_diffusion.coreml_model:Loading a CoreML model through coremltools triggers compilation every time. The Swift package we provide uses precompiled Core ML models (.mlmodelc) to avoid compile-on-load.

...and rumors tell us we should use Ventura 13.1 beta 4.

What exact macOS build are you using, sir?

sascha1337 avatar Dec 03 '22 00:12 sascha1337

@NightMachinery Hmmm... I think the higher the it/s value (iterations per second), the better the performance. Also check the total processing time on the left (not including model load): webui (MPS): 7 sec, coreml_python: 17 sec.

@sascha1337

...and rumors tell us we should use Ventura 13.1 beta 4.

My macOS is 13.1 beta (22C5033e), which means beta 1. Thanks for the very useful information. Luckily I'm in the Apple Developer Program, so I'll try beta 4 later.

autumnmotor avatar Dec 03 '22 00:12 autumnmotor

macOS 13.1 beta 1 -> beta 4:

WebUI (MPS): 2.61 it/s -> 2.66 it/s
Core ML (Python; cpu, gpu, ane): 2.64 it/s -> 2.65 it/s
Core ML (Swift; cpu, gpu, ane): 2.23 it/s -> 2.23 it/s

I still need to look into it more carefully, but for now I'd say the differences are within the margin of error.

autumnmotor avatar Dec 03 '22 01:12 autumnmotor

@autumnmotor Sir, what sampler did you use, DDIM?

sascha1337 avatar Dec 04 '22 00:12 sascha1337

@autumnmotor Could you also report Core ML (Python; cpu, gpu), please?

atiorh avatar Dec 17 '22 19:12 atiorh

There is a basic implementation now:

https://github.com/godly-devotion/mochi-diffusion

Would be great if you guys somehow teamed up!

juan9999 avatar Dec 19 '22 16:12 juan9999

How would we go about testing the CoreML versions already converted? I assume I can't just drop all of the files into the models directory?

rjp23 avatar Feb 03 '23 14:02 rjp23

use MPS

How do we use MPS with the webUI? I thought it only ran on the CPU.
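
As far as I know, the webui picks up PyTorch's MPS backend automatically on Apple silicon when it is available; a quick way to check from Python (a generic PyTorch check, not webui-specific code):

```python
import torch

# True on Apple silicon with a recent PyTorch build; if this is False the
# webui will end up on the CPU (or you are forcing it with --use-cpu).
print(torch.backends.mps.is_available())  # can MPS be used right now?
print(torch.backends.mps.is_built())      # was PyTorch compiled with MPS support?
```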

RnbWd avatar Feb 09 '23 04:02 RnbWd

Please do it

namnhfreelancer avatar Mar 25 '23 17:03 namnhfreelancer

I'm a Mac user and I tried the Draw Things app, which supports Core ML. On a Mac mini M1, with the same Anything v3 model at 30 steps, it takes about 2 min 40 s to generate an image with the webui versus 45 s with Draw Things, so I think supporting Core ML is still worthwhile. On another Mac, an M1 Pro, it's 20 s for Draw Things versus 35 s for the webui. (As an aside, Draw Things' interface design is also worth referencing.)

GrinZero avatar Apr 29 '23 11:04 GrinZero

please.

9Somboon avatar Jul 02 '23 06:07 9Somboon

bump

I just compiled the HF Diffusers app on my M2 Max and can whip out a 45-step SD 2.1 image in about 18.5 s, versus 43 s with A1111 and the pruned model.

genevera avatar Aug 04 '23 18:08 genevera

There are apps that use Apple's Core ML Stable Diffusion. The best one I could find is here: https://github.com/godly-devotion/MochiDiffusion

However, if you've ever tried using Apple's Core ML implementation, you might have noticed that it takes a LONG time to initialize the model the first time it runs. Using the CLI from Apple's examples, it takes about a minute on my M1 before the model even starts running. I think Mochi caches the compiled Core ML model, which makes it more usable. On a MacBook Air M1, I'm only seeing a 20% increase in diffusion speed at most, and the startup time for loading any model makes it not worth it. An M1 Pro/Max or M2 Pro/Max might see much more significant gains than the base M1.
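
For context on that startup cost: loading a converted model through coremltools recompiles it on every load (that is the warning quoted earlier in this thread), whereas the Swift package ships precompiled .mlmodelc bundles. Here is a minimal sketch of loading one of the converted packages and picking the compute units with coremltools; the file path is just a placeholder for whatever the converter produced:

```python
import coremltools as ct

# Loading an .mlpackage via coremltools triggers a compile step each time,
# which accounts for much of the slow startup people are reporting.
unet = ct.models.MLModel(
    "models/coreml/unet.mlpackage",  # placeholder path to the converted UNet
    compute_units=ct.ComputeUnit.CPU_AND_NE,  # or ALL / CPU_AND_GPU / CPU_ONLY
)
print(unet.input_description)  # quick sanity check that the model loaded
```

Presumably Mochi Diffusion keeps the compiled model around between runs, which is why it avoids paying that cost on every launch.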

RnbWd avatar Aug 08 '23 02:08 RnbWd

Dude, this depends on the RAM. With 8 GB you get bad times; 128 GB is the way to go, so you can keep the UNet chunks in cache.

sascha1337 avatar Aug 08 '23 03:08 sascha1337

Please do it!

foolyoghurt avatar Sep 03 '23 13:09 foolyoghurt

Draw Things has an HTTP API. Maybe requests for things it can handle could be sent over to it while keeping A1111 as the front end?
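
If someone wanted to prototype that, something like the following could forward a generation request from Python; note that the port, endpoint path, and payload keys here are hypothetical placeholders, since I haven't checked what Draw Things' API actually expects:

```python
import base64
import requests

# Hypothetical bridge: forward a txt2img job to a locally running Draw Things
# HTTP API and save the returned image. Endpoint, port, and payload fields are
# placeholders -- adjust to whatever the real API expects.
payload = {
    "prompt": "a photo of an astronaut riding a horse on mars",
    "steps": 20,
    "width": 512,
    "height": 512,
}
resp = requests.post("http://127.0.0.1:7860/txt2img", json=payload, timeout=600)
resp.raise_for_status()
image_b64 = resp.json()["images"][0]  # assumed response shape
with open("out.png", "wb") as f:
    f.write(base64.b64decode(image_b64))
```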

genevera avatar Oct 27 '23 01:10 genevera

There are apps that use Apple's Core ML Stable Diffusion. The best one I could find is here: https://github.com/godly-devotion/MochiDiffusion

However, if you've ever tried using Apple's Core ML implementation, you might have noticed that it takes a LONG time to initialize the model the first time it runs. Using the CLI from Apple's examples, it takes about a minute on my M1 before the model even starts running. I think Mochi caches the compiled Core ML model, which makes it more usable. On a MacBook Air M1, I'm only seeing a 20% increase in diffusion speed at most, and the startup time for loading any model makes it not worth it. An M1 Pro/Max or M2 Pro/Max might see much more significant gains than the base M1.

Check out Draw Things... it's not open source but it is free and it beats everything else in performance, I think.

genevera avatar Oct 27 '23 01:10 genevera

Dude, this depends on the RAM. With 8 GB you get bad times; 128 GB is the way to go, so you can keep the UNet chunks in cache.

128 GB of Mac memory, lol. Apple's golden money earner. Apple silicon currently only goes up to 96 GB, I believe.

marshalleq avatar Dec 30 '23 00:12 marshalleq