mlc-llm
Works like a charm!
Just wanted to report that this works perfectly on my GTX 1060 (6 GB) in my old i5-7200 with 16 GB RAM under Windows 10. I have never reached such speed with any of the other existing solutions (oobabooga, textsynth, llama.cpp). Not a single issue during install. I can't tell exactly, but it's surely a couple of tokens/sec during inference. I need a deeper dive to get a feeling for the quality, as it seems to be a model quantized to int3?
Now we want more: more models, 13B sizes, parameter access (temp, top_p, etc.), and an API. Anyway, I think this is great work already!

Thank you for sharing the information! We are currently gathering data points on runnable devices and their speed. Would you be willing to assist us in this effort by sharing the tokens/sec data on your GTX 1060?
I have no tools to measure it, so I did it by hand. I prompted 'please produce 500 tokens story of starwars', then copy-pasted the generated text into https://platform.openai.com/tokenizer to count the tokens while timing with my stopwatch. I got the following results:
673 tokens in 50 sec: 13.46 tok/s
318 tokens in 25 sec: 12.72 tok/s
256 tokens in 23 sec: 11.13 tok/s
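For anyone who wants to redo this back-of-the-envelope measurement, the rate is just tokens divided by elapsed seconds; here is a quick shell check of the figures above (the token counts and timings are the ones reported in this message, and bc is assumed to be available):
# decode rate = tokens counted on the OpenAI tokenizer page / stopwatch seconds
echo "scale=2; 673 / 50" | bc   # 13.46 tok/s
echo "scale=2; 318 / 25" | bc   # 12.72 tok/s
echo "scale=2; 256 / 23" | bc   # 11.13 tok/s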
By the way, I noticed:
Usage: mlc_chat [--help] [--version] [--device-name VAR] [--artifact-path VAR] [--model VAR] [--dtype VAR] [--params VAR] [--evaluate]
Optional arguments:
  -h, --help        shows help message and exits
  -v, --version     prints version information and exits
  --device-name     [default: "auto"]
  --artifact-path   [default: "dist"]
  --model           [default: "vicuna-v1-7b"]
  --dtype           [default: "auto"]
  --params          [default: "auto"]
  --evaluate
Is it just a matter of documentation, i.e. can we already play with these arguments?
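For illustration only: assuming the flags behave the way the help text above suggests, overriding the defaults would look something like the line below. The binary name is the mlc_chat_cli mentioned further down in this thread, and the device name "vulkan" is just a guessed example value, not something confirmed here.
# hypothetical invocation assembled from the flags listed above; the flag values are assumptions
mlc_chat_cli --device-name vulkan --artifact-path dist --model vicuna-v1-7b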
Hey thanks for the data! This is super valuable to us!
We updated mlc_chat_cli this morning to include a /stats command. Would you mind updating the conda environment to include this change?
To include this update, you will have to remove the package and install it again (conda update doesn't work for some reason):
conda remove mlc-chat-nightly
conda install -c conda-forge -c mlc-ai mlc-chat-nightly
Then the help message will show up when the program starts, and you can use /stats to get some details:

Thanks a bunch!
OK, thanks! Now with /stats:
USER: /stats
encode: 35.4 tok/s, decode: 16.7 tok/s
USER: continue
ASSISTANT: In this epic (... removed ...)
USER: /stats
encode: 11.1 tok/s, decode: 14.0 tok/s
USER: continue
ASSISTANT: Sure, here's the continuation: (... removed ...)
USER: /stats
encode: 37.7 tok/s, decode: 17.0 tok/s
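To summarize those readings (the three decode figures are copied verbatim from the transcript above), the average decode rate works out as:
# average of the decode rates reported by /stats
printf '16.7\n14.0\n17.0\n' | awk '{ s += $1; n++ } END { printf "average decode: %.1f tok/s\n", s/n }'
which prints roughly 15.9 tok/s decode on the GTX 1060.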
Thanks a lot for your swift response! The data is super valuable to us!