Comparison with llama.cpp and GGML
I wonder how this compares to llama.cpp, for example, in terms of performance under the same settings?
In my tests, GGML's GEMM is slower: GGML reaches ~1.5 TFLOPS, while MLX (quite close to PyTorch) reaches ~3.5 TFLOPS on an M1 Pro (32 GB). Of course, llama.cpp is not only GEMM, but you can use this as a rough estimate.
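For reference, here is a minimal sketch of how a GEMM throughput figure like this could be measured with MLX. The matrix size, iteration count, and timing loop are illustrative assumptions, not the exact benchmark used above:

```python
import time
import mlx.core as mx

N = 4096                       # assumed square matrix size
a = mx.random.normal((N, N))
b = mx.random.normal((N, N))
mx.eval(a, b)                  # materialize inputs before timing

c = a @ b                      # warm-up run to exclude one-time setup cost
mx.eval(c)

iters = 20                     # assumed iteration count
start = time.perf_counter()
for _ in range(iters):
    c = a @ b
    mx.eval(c)                 # MLX is lazy; force each matmul to actually run
elapsed = time.perf_counter() - start

# One N x N GEMM costs roughly 2 * N^3 floating-point operations
tflops = 2 * N**3 * iters / elapsed / 1e12
print(f"~{tflops:.2f} TFLOPS")
```

Note the `mx.eval` calls: because MLX builds computations lazily, timing without forcing evaluation would measure graph construction rather than the actual matmuls.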