MLX backend
Can Ollama be converted to use Apple's MLX as the backend for the models?
This, please!
What do you hope to gain from this? I don't think MLX is faster for inference, at least not yet.
Found these benchmarks: https://medium.com/@andreask_75652/benchmarking-apples-mlx-vs-llama-cpp-bbbebdc18416
Seems like MLX is indeed slower than the llama.cpp masterpiece, at least for now. I haven't verified this myself, though.
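If anyone wants to verify on their own machine, here is a rough sketch of how you might measure raw generation throughput on the MLX side, assuming the `mlx-lm` package is installed (`pip install mlx-lm`); the model name is just an example from the mlx-community Hugging Face org:

```python
# Rough tokens/sec check for MLX text generation.
# Assumes: pip install mlx-lm; the model name below is only an example.
import time

from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.2-4bit")

prompt = "Explain the difference between a process and a thread."

start = time.perf_counter()
text = generate(model, tokenizer, prompt=prompt, max_tokens=256)
elapsed = time.perf_counter() - start

# Approximate the token count by re-tokenizing the output.
n_tokens = len(tokenizer.encode(text))
print(f"{n_tokens} tokens in {elapsed:.1f}s -> {n_tokens / elapsed:.1f} tok/s")
```

Comparing the number against `llama-bench` (or llama.cpp's reported eval speed) on the same quantization level would give a fairer picture than cross-article benchmarks.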
This would be very nice! And not only for text generation; image/multimodal models would be boosted too.
Someone made this: https://github.com/kspviswa/PyOMlx
Ollama is awesome and does so many things, and some of us want to play with MLX models.
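In the meantime, wrapping MLX models yourself is not much code. Below is a minimal sketch of what an Ollama-style local endpoint backed by MLX could look like, assuming `mlx-lm` is installed; the model name and the `/api/generate` route are illustrative, not Ollama's or PyOMlx's actual implementation:

```python
# Minimal sketch of a local HTTP endpoint backed by mlx-lm.
# Assumes: pip install mlx-lm; model name and route are illustrative only.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.2-4bit")

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/api/generate":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length))
        # Run a single (blocking) generation per request.
        text = generate(model, tokenizer,
                        prompt=body.get("prompt", ""),
                        max_tokens=body.get("max_tokens", 256))
        payload = json.dumps({"response": text}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

# 11434 is Ollama's default port, reused here just for familiarity.
HTTPServer(("127.0.0.1", 11434), Handler).serve_forever()
```

You can poke it with `curl http://127.0.0.1:11434/api/generate -d '{"prompt": "hello"}'`. Obviously this skips everything that makes Ollama nice (model management, streaming, concurrency), but it shows the shape of what an MLX backend would be doing.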