Comparison with llama.cpp and GGML
I wonder how this compares to llama.cpp, for example, in terms of performance under the same settings?
In my tests, GGML's GEMM is slower: GGML reaches ~1.5 TFLOPS, while MLX (quite close to PyTorch) reaches ~3.5 TFLOPS on an M1 Pro (32 GB). Of course, llama.cpp is not only GEMM, but you can use this as a rough estimate.
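For reference, here is a minimal sketch of how a GEMM throughput figure like this could be measured with MLX. The matrix size, iteration count, and timing loop are illustrative assumptions, not the exact benchmark used above:

```python
import time
import mlx.core as mx

N = 4096                       # assumed square matrix size
a = mx.random.normal((N, N))
b = mx.random.normal((N, N))
mx.eval(a, b)                  # materialize inputs before timing

c = a @ b                      # warm-up run to exclude one-time setup cost
mx.eval(c)

iters = 20                     # assumed iteration count
start = time.perf_counter()
for _ in range(iters):
    c = a @ b
    mx.eval(c)                 # MLX is lazy; force each matmul to actually run
elapsed = time.perf_counter() - start

# One N x N GEMM costs roughly 2 * N^3 floating-point operations
tflops = 2 * N**3 * iters / elapsed / 1e12
print(f"~{tflops:.2f} TFLOPS")
```

Note the `mx.eval` calls: because MLX builds computations lazily, timing without forcing evaluation would measure graph construction rather than the actual matmuls.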