mlx-llm-server
Benchmarks?
Do you have some benchmarks against llama.cpp?
It will be slower than llama.cpp, since MLX is a general machine-learning framework rather than one specialized for LLM inference. That said, the MLX team is actively working on performance, and I expect inference speed to improve significantly over time.
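
If you want rough numbers on your own machine, here is a minimal sketch for measuring generation throughput. It assumes the server exposes an OpenAI-compatible `/v1/chat/completions` endpoint on `localhost:8080` and reports token usage in the response; both the URL and the model behavior are assumptions, so adjust them to your setup. Pointing the same script at a llama.cpp server gives a like-for-like comparison.

```python
import time
import requests  # pip install requests

# Hypothetical endpoint; change host/port to match your server.
URL = "http://localhost:8080/v1/chat/completions"
PAYLOAD = {
    "messages": [{"role": "user", "content": "Write a short essay about autumn."}],
    "max_tokens": 256,
    "stream": False,
}

start = time.time()
resp = requests.post(URL, json=PAYLOAD, timeout=300)
resp.raise_for_status()
elapsed = time.time() - start

# Assumes the server returns OpenAI-style usage accounting.
completion_tokens = resp.json()["usage"]["completion_tokens"]
print(f"{completion_tokens} tokens in {elapsed:.2f}s "
      f"-> {completion_tokens / elapsed:.1f} tok/s")
```

Note this measures end-to-end wall time, so it folds prompt processing into the number; for a fairer comparison, run several iterations and discard the first to exclude model load and warmup.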