apple: support for MLX quantized linear in diffusers
Is your feature request related to a problem? Please describe.
As Apple MPS users, it often feels like we're second-class citizens with respect to the latest and greatest optimisations, which only land on other platforms. The biggest gap is probably xformers/bitsandbytes, which remain CUDA-only, but the outcome is more important than the codepath used to get there.
Describe the solution you'd like.
I've discovered Apple has some MLX examples for T2I inference on SDXL and other SD models that allow AoT quantization of the unet and text encoders.
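For concreteness, here's a minimal sketch of the kind of AoT quantization those examples rely on, using a toy module in place of the real unet/text encoders (the module name and sizes are made up for illustration; `mlx.nn.quantize` is the actual MLX call, and it swaps `nn.Linear` layers for `nn.QuantizedLinear` in place):

```python
import mlx.core as mx
import mlx.nn as nn

# Stand-in for a transformer block inside the unet/text encoder.
class TinyBlock(nn.Module):
    def __init__(self, dims: int = 1024):
        super().__init__()
        self.proj_in = nn.Linear(dims, dims * 4)
        self.proj_out = nn.Linear(dims * 4, dims)

    def __call__(self, x: mx.array) -> mx.array:
        return self.proj_out(nn.gelu(self.proj_in(x)))

model = TinyBlock()

# Quantize every Linear ahead of time: 4-bit weights, group size 64
# (MLX's defaults). This replaces the layers with nn.QuantizedLinear
# in place.
nn.quantize(model, group_size=64, bits=4)

y = model(mx.random.normal((1, 1024)))
print(y.shape)  # (1, 1024), now computed through quantized matmuls
```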
Describe alternatives you've considered.
There is metal-flash-attention, but it would require integrating custom Metal kernels, which feels out of scope for Diffusers.
We also have a couple of forks of bitsandbytes that aim to improve portability, but there's nothing actionable yet.
Additional context.
I haven't tried to implement it yet; it would probably require a bit of monkeying around, roughly along the lines of the sketch below.
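To make that concrete, here's a hypothetical bridge sketch of what the torch→MLX plumbing for a single layer might look like. Nothing here is an existing diffusers API; the helper name, layer, and sizes are invented for illustration, though `QuantizedLinear.from_linear` is real MLX:

```python
import torch
import mlx.core as mx
import mlx.nn as nn

def torch_linear_to_mlx_quantized(t_linear: torch.nn.Linear,
                                  group_size: int = 64,
                                  bits: int = 4) -> nn.QuantizedLinear:
    """Hypothetical helper: copy a torch Linear's weights into an MLX
    Linear, then quantize it. Both libraries store weight as
    (out_features, in_features), so the copy is direct."""
    m_linear = nn.Linear(t_linear.in_features, t_linear.out_features,
                         bias=t_linear.bias is not None)
    m_linear.weight = mx.array(t_linear.weight.detach().cpu().numpy())
    if t_linear.bias is not None:
        m_linear.bias = mx.array(t_linear.bias.detach().cpu().numpy())
    return nn.QuantizedLinear.from_linear(m_linear, group_size=group_size,
                                          bits=bits)

# Illustrative sizes only; a real integration would have to walk the
# whole diffusers module tree and swap layers systematically.
t_layer = torch.nn.Linear(768, 3072)
q_layer = torch_linear_to_mlx_quantized(t_layer)
out = q_layer(mx.random.normal((1, 768)))
print(out.shape)  # (1, 3072)
```

Presumably the hard part wouldn't be the per-layer conversion so much as keeping the rest of the pipeline (schedulers, attention, VAE) coherent across two array frameworks.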
cc @pcuenca here
Diffusers MLX back-end? 👀
That would be very cool
Yes, especially with larger models, and with Apple themselves kinda dropping the ball repeatedly over multiple years on PyTorch support. MLX is their bread and butter; it's so good that it should probably even be used by default on MPS instead of PyTorch.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.