mlc-llm
[Question] Inference latency
❓ General Questions
After building dolly-v2-3b successfully, I ran chat.py with the model, but the inference latency is on the order of tens of minutes per response. Is that normal, or is it just because I set --device cpu?
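For reference, this is roughly the command I used (exact flags and paths may differ slightly on my machine):

```bash
# Run the chat script on CPU with the locally built dolly-v2-3b model
python3 chat.py --model dolly-v2-3b --device cpu
```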
Also, the quality of the answers is actually not very good.