Manu Maheshwari
## ❓ General Questions

We have a similar script for llama.cpp: ./llama-bench. It is really useful if one wants to suggest optimizations and review performance across different devices.
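For reference, here is a minimal sketch of driving ./llama-bench from Python so results can be collected and compared across devices. The model path is a placeholder, and the flags used (`-p` for prompt tokens, `-n` for generated tokens, `-o` for output format) are standard llama-bench options as I understand them; double-check against your llama.cpp build.

```python
# Minimal sketch: invoke llama-bench and capture its output for
# cross-device comparison. The model path below is a placeholder.
import subprocess

result = subprocess.run(
    [
        "./llama-bench",
        "-m", "models/model.gguf",  # placeholder GGUF model path
        "-p", "512",                # prompt (prefill) tokens to benchmark
        "-n", "128",                # generated tokens to benchmark
        "-o", "json",               # machine-readable output for aggregation
    ],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)
```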
## ❓ General Questions

Why is the context-phase performance of mlc-llm so bad? It takes around 0.54s for the context phase on an AMD 7900 XTX, whereas llama.cpp doing the same...
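One way to make such comparisons concrete is to time the two phases separately. Below is an engine-agnostic sketch; `engine.prefill` and `engine.decode_one` are hypothetical stand-ins for whatever API the runtime under test actually exposes, not MLC-LLM or llama.cpp calls.

```python
# Engine-agnostic sketch: separate context-phase (prefill) latency from
# decode throughput. The engine methods here are hypothetical placeholders.
import time

def benchmark(engine, prompt_tokens, n_decode=128):
    t0 = time.perf_counter()
    state = engine.prefill(prompt_tokens)   # hypothetical: process the full prompt
    t1 = time.perf_counter()
    for _ in range(n_decode):
        state = engine.decode_one(state)    # hypothetical: generate one token
    t2 = time.perf_counter()
    print(f"prefill: {t1 - t0:.3f}s for {len(prompt_tokens)} tokens "
          f"({len(prompt_tokens) / (t1 - t0):.1f} tok/s)")
    print(f"decode:  {n_decode / (t2 - t1):.1f} tok/s")
```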
Has anybody benchmarked the attention performance of MLC-LLM on AMD hardware against the best that is currently available out there?
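For a baseline to compare against, one option is a standalone attention microbenchmark. The sketch below uses PyTorch's scaled_dot_product_attention, which dispatches to ROCm kernels on AMD; the shapes are illustrative and nothing here comes from an actual MLC-LLM measurement.

```python
# Microbenchmark sketch: time a fused attention kernel as a reference point.
# ROCm builds of PyTorch expose AMD GPUs through the "cuda" device name.
import torch
import torch.nn.functional as F

device = "cuda"
B, H, S, D = 1, 32, 2048, 128  # batch, heads, sequence length, head dim
q = torch.randn(B, H, S, D, device=device, dtype=torch.float16)
k = torch.randn(B, H, S, D, device=device, dtype=torch.float16)
v = torch.randn(B, H, S, D, device=device, dtype=torch.float16)

for _ in range(10):  # warm-up so compilation/caching is excluded
    F.scaled_dot_product_attention(q, k, v, is_causal=True)
torch.cuda.synchronize()

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
for _ in range(100):
    F.scaled_dot_product_attention(q, k, v, is_causal=True)
end.record()
torch.cuda.synchronize()
print(f"mean attention latency: {start.elapsed_time(end) / 100:.3f} ms")
```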