
Speed benchmark comparison with llama.cpp

Open · luohao123 opened this issue 2 years ago · 1 comment

Hello, is there any speed/throughput benchmark comparing MLC LLM with llama.cpp?

luohao123 avatar May 04 '23 02:05 luohao123
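For anyone who wants to collect numbers themselves, a minimal, backend-agnostic way to measure decode throughput is to time a single generation call and divide by the number of tokens produced. The sketch below is an assumption-laden illustration, not an official benchmark harness: `generate` stands for whichever runtime you wrap (mlc-llm, llama.cpp bindings, etc.), and `dummy_generate` is only a stand-in so the snippet runs on its own.

```python
import time

def measure_tokens_per_second(generate, prompt, max_new_tokens=128):
    """Time one generation call and report decode throughput in tokens/sec.

    `generate` is any callable returning the generated tokens; wrap whichever
    runtime you are benchmarking (mlc-llm, llama.cpp bindings, ...) here.
    """
    start = time.perf_counter()
    tokens = generate(prompt, max_new_tokens)
    elapsed = time.perf_counter() - start
    return len(tokens) / elapsed

# Stand-in generator so this sketch runs on its own; replace with a real backend.
def dummy_generate(prompt, max_new_tokens):
    return ["tok"] * max_new_tokens

print(f"{measure_tokens_per_second(dummy_generate, 'hello'):.1f} tok/s")
```

For a fair comparison you would also want to fix the prompt, quantization scheme, and batch size, and average over several runs after a warm-up pass.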

The technical path we are taking is quite different from llama.cpp's. MLC LLM primarily uses a compiler to generate efficient code targeting multiple CPU/GPU vendors, while llama.cpp focuses on handcrafted kernels. It is certainly possible to compare performance, but I personally consider it a lower-priority item for us, because GPUs are expected to be far faster than CPUs for deep learning workloads.

junrushao avatar May 08 '23 22:05 junrushao
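As an illustration of the compiler-based path, here is a generic TVM sketch (not MLC LLM's actual build pipeline) in which the same tensor expression is lowered to two different backends simply by changing the target and schedule:

```python
import tvm
from tvm import te

# Toy element-wise kernel: one high-level description, two backends.
n = 1024
A = te.placeholder((n,), name="A", dtype="float32")
B = te.compute((n,), lambda i: A[i] * 2.0, name="B")

# CPU build: default schedule lowered to LLVM.
s_cpu = te.create_schedule(B.op)
cpu_mod = tvm.build(s_cpu, [A, B], target="llvm")

# GPU build: bind the loop to CUDA blocks/threads before lowering.
s_gpu = te.create_schedule(B.op)
bx, tx = s_gpu[B].split(B.op.axis[0], factor=64)
s_gpu[B].bind(bx, te.thread_axis("blockIdx.x"))
s_gpu[B].bind(tx, te.thread_axis("threadIdx.x"))
gpu_mod = tvm.build(s_gpu, [A, B], target="cuda")  # needs a CUDA-enabled TVM build
```

The point is only that retargeting happens in the compiler rather than in handwritten per-backend kernels, which is the contrast being drawn with llama.cpp above.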

Great question. @junrushao What about DSP support on mobile devices? I heard it's possible to hook a DSP backend up to TVM.

@luohao123 do you know if llama.cpp plans to add support for the DSP chips in high-end cellphones?

escorciav avatar Jun 13 '23 15:06 escorciav
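For what it's worth, TVM does expose a Hexagon (Qualcomm DSP) target, so a compiler-based stack has at least a plausible route to mobile DSPs. The snippet below only constructs the target object as an illustration; actually compiling and running kernels additionally requires the Hexagon SDK, toolchain, and a device runtime.

```python
import tvm

# Illustrative only: construct TVM's Hexagon (Qualcomm DSP) target description.
# Building and executing real kernels needs the Hexagon SDK and device runtime.
dsp_target = tvm.target.hexagon("v68")
print(dsp_target)
```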

When GPTQ support is merged, it would be interesting to compare with https://github.com/turboderp/exllama.

taowen avatar Jun 18 '23 16:06 taowen