ml-stable-diffusion

Prevent compilation every launch in python

Open rbourgeat opened this issue 2 years ago • 3 comments

I'm looking to optimize my use of the ml-stable-diffusion Python pipeline. When I use python_coreml_stable_diffusion.pipeline, the model is compiled on every launch and the precompiled .mlmodelc files are not used:

INFO:python_coreml_stable_diffusion.coreml_model:Loading a CoreML model through coremltools triggers compilation every time. The Swift package we provide uses precompiled Core ML models (.mlmodelc) to avoid compile-on-load.

Why can only the Swift package use the precompiled models? Is there a solution for Python?

rbourgeat avatar Jul 07 '23 22:07 rbourgeat

Hello @rbourgeat, we have a radar to improve this behavior for Core ML models run through coremltools in Python, but it is currently pending prioritization, so it won't be available immediately. In the meantime, I recommend:

  • Initializing the Python pipeline once in a long-running script, so the compile-on-load overhead is amortized across the whole session.
  • Building on top of the Swift package, which is more feature-rich (image-to-image, ControlNet, etc.) than the Python pipeline (which only offers text-to-image today).
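The first suggestion can be sketched as follows: a minimal pattern for a long-running script, where `get_pipeline` is a hypothetical stand-in for whatever builds the Core ML pipeline (nothing below is this repo's actual API):

```python
import functools

@functools.lru_cache(maxsize=1)
def get_pipeline(model_dir: str):
    """Build the pipeline once per process; later calls reuse it.

    The body here is a placeholder -- in a real script it would
    construct the Core ML Stable Diffusion pipeline, which is where
    the minutes-long compile-on-load cost is paid.
    """
    print(f"loading pipeline from {model_dir} (slow, happens once)")
    return object()  # stand-in for the real pipeline object

def generate(prompt: str, model_dir: str = "./coreml-models"):
    pipe = get_pipeline(model_dir)  # cached after the first call
    # ... run `pipe` on `prompt` and return the image ...
    return pipe

# One slow load, then every prompt in the session reuses it:
a = generate("an astronaut riding a horse")
b = generate("a watercolor fox")
assert a is b  # same cached pipeline object
```

The point is simply that the expensive load happens once per process, so an interactive loop or a small local server keeps generation fast after the first call.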

atiorh avatar Jul 08 '23 22:07 atiorh

On a separate note, the term "compile" is overloaded in this context. .mlpackage files are compiled into .mlmodelc files for deployment, which generally takes no more than a few seconds. The main overhead comes when the Core ML model is loaded for a particular compute engine: it currently takes ~1-2 minutes for the model to be compiled for the Neural Engine, and there is another radar to improve this overall compile time. When loading for GPU or CPU, this overhead should not be more than a few seconds.
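The two costs can be kept apart in code. If I understand correctly, recent coremltools versions (6.2+, I believe) can load a precompiled .mlmodelc directly via `CompiledMLModel`, skipping the source-compile step, though the per-compute-engine specialization still happens on first load. A hedged sketch — the paths and the `is_precompiled` helper are illustrative, not part of this repo:

```python
from pathlib import Path

def is_precompiled(path: str) -> bool:
    """True for a compiled Core ML bundle (.mlmodelc)."""
    return Path(path).suffix == ".mlmodelc"

def load_coreml(path: str, want_neural_engine: bool = False):
    """Load a Core ML model, avoiding the .mlpackage -> .mlmodelc
    compile step when a precompiled bundle is given.

    The slow part on first load is the per-engine specialization
    (minutes for the Neural Engine), which happens either way.
    """
    import coremltools as ct  # lazy import; assumes coremltools >= 6.2

    units = (ct.ComputeUnit.ALL if want_neural_engine
             else ct.ComputeUnit.CPU_AND_GPU)
    if is_precompiled(path):
        # Precompiled bundle: no source compile on load.
        return ct.models.CompiledMLModel(path, compute_units=units)
    # .mlpackage: coremltools compiles it on load.
    return ct.models.MLModel(path, compute_units=units)
```

Check the coremltools version you have installed before relying on `CompiledMLModel`; older releases only expose `MLModel`, which always compiles on load.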

atiorh avatar Jul 08 '23 22:07 atiorh

Hello @atiorh ! Thank you for your answer.

I tried initializing the pipeline directly in my code: the compilation takes 7 minutes at every launch, and after that, generating an image is almost instantaneous (around 2-3 s). But I don't want to wait 7 minutes each time...

As for the Swift package, my app is written in Python because I'm aiming for cross-platform support, so it's not an option for me...

In the meantime I found a workaround: using the PyTorch MPS device with Diffusers from Hugging Face:

https://huggingface.co/docs/diffusers/optimization/mps

It doesn't use quantized models, unfortunately, but for now it is the best solution in Python on M1.
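That workaround looks roughly like the following sketch, in the spirit of the linked docs (the model id, prompt, and `pick_device` helper are illustrative):

```python
def pick_device() -> str:
    """Prefer Apple's Metal backend (MPS) when available, else CPU."""
    try:
        import torch
        if torch.backends.mps.is_available():
            return "mps"
    except ImportError:
        pass
    return "cpu"

def main():
    # Heavy imports kept inside main() so importing this file is cheap.
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
    pipe = pipe.to(pick_device())
    # Recommended on MPS to reduce memory pressure:
    pipe.enable_attention_slicing()
    image = pipe("a photo of an astronaut riding a horse").images[0]
    image.save("out.png")

if __name__ == "__main__":
    main()
```

The first run still downloads and initializes the model, but there is no per-launch Core ML compile step, which matches the behavior described above.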

And thank you for all you do, can't wait to see the next updates!

rbourgeat avatar Jul 08 '23 22:07 rbourgeat