MLX backend
Can ollama be converted to use Apple's MLX as the backend for the models?
This, please!
What do you hope to gain from this? I don't think MLX is faster for inference, at least not yet.
Found these benchmarks: https://medium.com/@andreask_75652/benchmarking-apples-mlx-vs-llama-cpp-bbbebdc18416
Seems like MLX is indeed slower than the llama.cpp masterpiece, at least for now. I didn't verify it myself, though.
This would be very nice! And not only for text generation: image/multimodal models would get a boost too.
Someone already made something along these lines: https://github.com/kspviswa/PyOMlx
Ollama is awesome and does so many things, and some of us want to play with MLX models.
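For anyone who wants to experiment while waiting on official support, here is a minimal sketch of running a model through MLX directly, assuming the `mlx-lm` Python package (`pip install mlx-lm`) on an Apple Silicon Mac; the model name below is just an example of a quantized repo from the mlx-community Hugging Face organization:

```python
# Minimal sketch: text generation with MLX via the mlx-lm package.
# Assumes Apple Silicon and `pip install mlx-lm`; the model repo is an example.
from mlx_lm import load, generate

# Downloads (if needed) and loads the MLX-converted model and its tokenizer.
model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.2-4bit")

# Generate a response; verbose=True also prints tokens-per-second stats,
# which is handy for comparing against llama.cpp-based ollama.
response = generate(
    model,
    tokenizer,
    prompt="Why is the sky blue?",
    max_tokens=256,
    verbose=True,
)
print(response)
```

This is roughly what a wrapper like PyOMlx builds on; it is not how ollama itself would integrate MLX.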
bump
Commenting here to say we're aware of MLX. I've been working on a prototype, but I can't give an ETA for MLX support at this time.
Related to this: Apple CoreML support to utilize the Apple Neural Engine (ANE) alongside the GPU & CPU: https://github.com/ollama/ollama/issues/3898