
[Question] inference Latency

Open sgSillage opened this issue 2 years ago • 1 comment

❓ General Questions

After building dolly-v2-3b successfully, I ran chat.py with the model, but the inference latency is on the order of tens of minutes. Is that normal, or is it just because I set --device cpu?

sgSillage avatar May 30 '23 07:05 sgSillage

Also, the quality of the answers is actually not good.

sgSillage avatar May 30 '23 08:05 sgSillage