mlc-llm
Works like a charm!
Just wanted to report that this works perfectly on my GTX 1060 (6 GB) in my old i5-7200 with 16 GB RAM under Windows 10. I have never reached such speed with any of the other existing solutions (oobabooga, textsynth, llama.cpp). Not a single issue during install. I can't tell exactly, but it's surely a couple of tokens/sec during inference. I need a deeper dive to get a feeling for the quality, as it seems to be a model quantized to int3?
Now we want more: more models, 13B sizes, parameter access (temp, top_p, etc.), and an API. Anyway, I think this is great work already!

Thank you for sharing the information! We are currently gathering data points on runnable devices and their speed. Would you be willing to assist us in this effort by sharing the tokens/sec data on your GTX 1060?
I have no tools to measure it, so I did it by hand. I prompted 'please produce 500 tokens story of starwars', then copy-pasted the generated text into https://platform.openai.com/tokenizer to count the tokens while timing with my stopwatch. I got the following results:
673 tokens in 50 sec: 13.46 tok/s
318 tokens in 25 sec: 12.72 tok/s
256 tokens in 23 sec: 11.13 tok/s
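For anyone who wants to redo this back-of-the-envelope measurement, the rate is just tokens divided by elapsed seconds; here is a quick shell check of the figures above (the token counts and timings are the ones reported in this message, and bc is assumed to be available):
# decode rate = tokens counted on the OpenAI tokenizer page / stopwatch seconds
echo "scale=2; 673 / 50" | bc   # 13.46 tok/s
echo "scale=2; 318 / 25" | bc   # 12.72 tok/s
echo "scale=2; 256 / 23" | bc   # 11.13 tok/s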
By the way, I noticed:
Usage: mlc_chat [--help] [--version] [--device-name VAR] [--artifact-path VAR] [--model VAR] [--dtype VAR] [--params VAR] [--evaluate]
Optional arguments:
  -h, --help        shows help message and exits
  -v, --version     prints version information and exits
  --device-name     [default: "auto"]
  --artifact-path   [default: "dist"]
  --model           [default: "vicuna-v1-7b"]
  --dtype           [default: "auto"]
  --params          [default: "auto"]
  --evaluate
Is it just a matter of documentation, i.e. can we already play with these arguments?
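For illustration only: assuming the flags behave the way the help text above suggests, overriding the defaults would look something like the line below. The binary name is the mlc_chat_cli mentioned further down in this thread, and the device name "vulkan" is just a guessed example value, not something confirmed here.
# hypothetical invocation assembled from the flags listed above; the flag values are assumptions
mlc_chat_cli --device-name vulkan --artifact-path dist --model vicuna-v1-7b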
Hey thanks for the data! This is super valuable to us!
We updated mlc_chat_cli this morning to include a /stats command. Would you mind updating the conda environment to include this change?
To include this update, you will have to remove the package and install it again (conda update doesn't work for some reason):
conda remove mlc-chat-nightly
conda install -c conda-forge -c mlc-ai mlc-chat-nightly
Then the help message will show up when the program starts, and you can use /stats to get some details:

Thanks a bunch!
OK, thanks! Now with /stats:
USER: /stats
encode: 35.4 tok/s, decode: 16.7 tok/s
USER: continue
ASSISTANT: In this epic (... removed ...)
USER: /stats
encode: 11.1 tok/s, decode: 14.0 tok/s
USER: continue
ASSISTANT: Sure, here's the continuation: (... removed ...)
USER: /stats
encode: 37.7 tok/s, decode: 17.0 tok/s
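To summarize those readings (the three decode figures are copied verbatim from the transcript above), the average decode rate works out as:
# average of the decode rates reported by /stats
printf '16.7\n14.0\n17.0\n' | awk '{ s += $1; n++ } END { printf "average decode: %.1f tok/s\n", s/n }'
which prints roughly 15.9 tok/s decode on the GTX 1060.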
Thanks a lot for your swift response! The data is super valuable to us!