mlc-llm
Speed benchmark comparison with llama.cpp
Hello, are there any speed/throughput benchmarks comparing MLC LLM with llama.cpp?
The technical path we are taking is quite different from llama.cpp's. MLC LLM primarily uses a compiler to generate efficient code targeting multiple CPU/GPU vendors, while llama.cpp focuses on hand-crafted kernels. It is certainly possible to compare performance, but I personally consider it a lower-priority item for us, because GPUs are expected to be far faster than CPUs for deep learning workloads.
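For anyone who wants to run such a comparison themselves, a minimal backend-agnostic throughput harness can look like the sketch below. Note that `generate_fn` and `dummy_generate` are hypothetical placeholders, not real MLC LLM or llama.cpp APIs; you would swap in the actual generate call of whichever binding you are benchmarking.

```python
import time
from typing import Callable, List

def measure_decode_throughput(
    generate_fn: Callable[[str, int], List[str]],
    prompt: str,
    max_new_tokens: int,
) -> float:
    """Return decode throughput in tokens per second.

    `generate_fn` is a hypothetical stand-in for a backend's generate
    call; it should return the list of generated tokens.
    """
    start = time.perf_counter()
    tokens = generate_fn(prompt, max_new_tokens)
    elapsed = time.perf_counter() - start
    return len(tokens) / elapsed

# Toy stand-in backend so the harness runs without loading any model.
def dummy_generate(prompt: str, n: int) -> List[str]:
    return ["tok"] * n

tps = measure_decode_throughput(dummy_generate, "hello", 128)
print(f"{tps:.1f} tokens/s")
```

For a fair comparison you would also want to fix the prompt, quantization scheme, and sampling settings across backends, and report prefill and decode throughput separately.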
Great question. @junrushao What about DSP support on mobile devices? I heard it's possible to hook a DSP up to TVM.
@luohao123 do you know if llama.cpp plans to add support for the DSP chips in high-end cellphones?
When GPTQ support is merged, it would be interesting to compare with https://github.com/turboderp/exllama.