mlc-llm
[Question] Inference latency
❓ General Questions
After building dolly-v2-3b successfully, I ran chat.py with the model, but the inference latency is on the order of tens of minutes per response. Is that normal, or is it just because I set --device cpu?
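For reference, this is roughly the command I used (exact flags and paths may differ slightly on my machine):

```bash
# Run the chat script on CPU with the locally built dolly-v2-3b model
python3 chat.py --model dolly-v2-3b --device cpu
```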
Also, the quality of the answers is actually not very good.